Counting exactly the number of distinct elements: sorted arrays vs. hash sets?

Suppose that you have an ever larger collection of 64-bit integers, and you want to quickly find out how many distinct integers it contains. So given {10, 12, 10, 16}, you want an algorithm to output 3, as there are three distinct integers in the set. I chose 64-bit integers, but strings would work as well.

There are sensible algorithms to estimate this number, but you want an exact count.

Though there are many good ways to solve this problem, most programmers would first attempt to use one of these two techniques:

  • Create a hash set. Throw all the values in the hash set (implemented with a hash table). Then check how many values the hash set contains at the end. In C++, you might implement it as follows:
    size_t distinct_count_hash(const uint64_t * values, size_t howmany) {
      std::unordered_set<uint64_t> hash(values, values + howmany);
      return hash.size();
    }
  • Put all the values in an array, sort the array, then run through it, deduplicating the values. In C++, you might implement it as follows:
    size_t distinct_count_sort(const uint64_t * values, size_t howmany) {
      std::vector<uint64_t> array(values, values + howmany);
      std::sort(array.begin(), array.end());
      return std::unique(array.begin(), array.end()) - array.begin();
    }

Which is best? Sorting has complexity O(n log n) whereas inserting a value into a hash set takes expected constant time, O(1). That would seem to predict that the hash set approach is always best.

However, as is typical, there are many hidden assumptions behind naive textbook big-O analysis. So we should be careful.

Simple engineering considerations ensure that, as long as the number of distinct elements is small (say, no larger than some fixed constant), the hash set approach has to be best. Indeed, sorting and copying a large array with lots of repeated elements is clearly wasteful. There is no need for fancy mathematics to understand that scenario.

But that’s not the difficult problem that will give you engineering nightmares. The nasty problem is the one where the number of distinct elements can grow large. In that case, both the array and the hash set can become large.

Which is best in that difficult case? I wrote a small C++ benchmark which you can run yourself.

N | hash set (cycles/value) | array sort (cycles/value)

So when there are many distinct values to be counted, sorting an array is an efficient approach whereas the hash table should be avoided.

How can we understand this problem? One issue is that as the hash table becomes large, it comes to reside in RAM (as it no longer fits in CPU cache). Because of how hash sets work, each operation risks incurring an expensive cache miss. A single retrieval from RAM can take dozens of CPU cycles. Meanwhile, sorting and scanning an array can be done while avoiding most cache misses. It may involve many more operations, but avoiding cache misses can be worth it.

What if I kept cranking up the data size (N)? Would the hash set ever catch up? It might not.

The problem lies in the underlying assumption that you can access all memory in constant time. That’s not even close to true.

Science and Technology links (May 18th, 2017)

Google has announced at its annual conference (I/O 2017) that it has computing pods capable of 11.5 petaflops. They are made of 64 customized TPUs (processors specialized for deep learning/AI), each generating 180 teraflops. The pods are going to be available to other companies via the Google cloud. Google has also announced a new website where it presents its work on AI. It seems likely that Google wants to sell AI as a service.

Interestingly, 11.5 petaflops is Ray Kurzweil‘s estimate of the computing power needed to simulate the human brain. There are supercomputers exceeding 10 petaflops right now, like Sunway TaihuLight, but it takes a whole room to host them, whereas Google’s computing pods look to be the size of a server rack. And, of course, you and I cannot have access to China’s Sunway TaihuLight whereas, for the right price, Google gives us access to its computing pods.

So is Google capable of emulating the human brain yet? Some less optimistic people have estimated that the computing power of the human brain is around 1 exaflop. So take 100 computing pods at 11.5 petaflops each and, according to some, you have the computing power of the human brain. Of course, we do not really know. Maybe 1 pod is enough to match a brain, or maybe we would need thousands. However, it looks like Google is within striking distance of matching the human brain in raw computing power with a single rack of computers. I should add that not all petaflops are comparable: Google has designed specialized hardware. Its computing pods may not be great at simulating the weather, for example.

What if you are not Google and want to build your own computing pods? Nvidia just announced its Nvidia Volta GPU. It has 120 teraflops. That’s a lot less than Google’s TPUs, but Nvidia GPUs are available to the public. With 85 Nvidia Volta GPUs, you are hitting 10 petaflops and you too, according to some, are within striking distance of the computing capabilities of the brain. I don’t know how much Nvidia will charge for its GPUs, but a reasonable estimate might be $1000 (the actual price might be less). So for $100,000, you can build your own computing pod.

How does that compare to the Intel chips that we all rely upon? A lot of Intel chips deliver about 100 gigaflops, or 0.1 teraflops. So to get to 10 petaflops, you’d need 100,000 chips. Not very practical.

I should qualify these numbers once more: counting flops is just a way to get a rough estimate of the possibilities of the hardware. It is quite possible to have 100 theoretical petaflops, but be unable to make good use of them.

By the way, next time you are in a bar, here is a great pick-up line: “You must have a lot of petaflops”. You read it here first.

Google rolled out its smart reply feature. Way ahead of you, Google! My students have been getting one of only three answers from me for the last few years: “That’s great”, “Can you rephrase that?”, “Don’t worry about it”.

Scientists have figured out how to create stem cells that can regenerate blood cells:

When the researchers injected these stem cells into mice that had been treated with radiation to kill most of their blood and immune cells, the animals recovered. The stem cells regenerated the blood, including immune cells, and the mice went on to live a full life—more than 1.5 years in the lab.

Half of all jobs will be replaced by artificial intelligence in 10 years, according to Kai-Fu Lee. He is a smart and important man. He could still be very wrong. He makes some great points, such as the fact that China rose tremendously in artificial intelligence, starting from nothing and reaching levels comparable to the US in a few years.

Facebook’s CEO wants to cure all diseases by the end of the century. To this end, they are giving grant money to researchers. There is a condition though:

The CZI (…) ask that all code developed in support of CZI-funded studies be published on public code repositories such as GitHub.

I have mixed feelings about this. Forcing people to publish their code does not necessarily have the expected effects. I have had colleagues who trumpeted that their research software was “open source”. And, sure enough, you could download the software online. As for building upon it or using it? Well. Good luck with that. You can’t mandate culture, you can only hack it.

Researchers describe in a Nature article how they were able to restore ovarian function in sterilized mice using 3D printing.

Tony Seba, an economist at Stanford University, predicts that fossil-fuel vehicles will disappear in 8 years, according to reports. They will be replaced by electrical self-driving vehicles.

Determining the truth is hard:

I show that fact-checkers rarely fact-check the same statement, and when they do, there is little agreement in their ratings.

Reminds me of science.

Macular degeneration is a terrible disease where older people become progressively blind. It looks like it has to do with eating too much sugar:

Changing the diet to a low-glycemic-index diet, even late in life, arrested the development of AMD, (…)

As an aside, it also looks like Type 2 diabetes and obesity are largely preventable through diet and exercise. Sadly, this is not enough to make these problems easy ones.

Scientists have bioengineered a synthetic pancreas that was transplanted into a patient. It cured the patient’s diabetes. Could it be that we are about to cure diabetes for good?

According to CNBC, Apple has engineers working on better ways to monitor blood sugar. Even though I am not diabetic (to my knowledge), I would love an Apple watch that monitors my blood glucose.

The initiative is far enough along that Apple has been conducting feasibility trials at clinical sites across the Bay Area and has hired consultants to help it figure out the regulatory pathways, the people said. One person said about 30 people were working in this group as of a year ago. But speculation has been flying around since the company snapped up about a dozen biomedical experts from companies like Vital Connect, Masimo, Sano, Medtronic and C8 Medisensors. Some of these people joined the secretive team dedicated to glucose, sources said, while others are on Apple Watch team. One of the people said that Apple is developing optical sensors, which involves shining a light through the skin to measure indications of glucose.

The world faced a major cyber attack in which attackers locked up computers and demanded a ransom to release them. The virus was called WannaCry and it affected solely the Windows operating system. English health organizations were hit hard. The attack was reportedly defused by a 22-year-old dropout who found a hidden kill switch in the ransomware virus.

I love the power of simple mathematics! On this note, Maurer provides us with a nice analysis of population growth. Currently, the world can be roughly divided into two. There are countries with low fertility but high longevity (e.g., Japan) and countries with high fertility and short lives (e.g., many countries in Africa). Which are more likely to exhibit overpopulation in the future? As an empirical observation, we should point out that many countries with high longevity, like Japan and Germany, are actually undergoing “de-population” in the sense that their population is falling. But what is the math telling us?

Assume an initial population of 1000 people. The fertility rate is 2, and the life expectancy is 80. Women give birth at 20. Now, let us consider two variations:

Case A: Death disappears. Nobody dies anymore!

Case B: The fertility rate slightly increases from 2 to 2.5.

Which of these two cases will lead to the greater population increase?

After 1000 years, the population will be 51,000 in case A, and at least 206,000,000 in case B: more than 4000 times case A! The gap will be enormous.

The conclusion is clear: if you are worried at all about overpopulation, you must be concerned about fertility. And we know, empirically, that fertility falls when women have well-paid jobs, education, contraceptives, and freedom. The solution is clear. We must opt for prevention of the diseases of old age (to diminish the burden of care, mostly affecting women) and ensure that women are well educated, free and well remunerated (so that they have low fertility). This simple strategy alone is very likely to prevent overpopulation and it has the side benefits of making people (starting with women) better off.

Educational backgrounds of the CEOs of the top corporations in the US

  • Apple is the most valuable company in the US. The CEO is Tim Cook who has a bachelor of science in industrial engineering from Auburn University. The chairman is Arthur D. Levinson who has a degree in molecular biology from the University of Washington.
  • Alphabet (Google’s parent company) is the second most valuable company. The CEO is Larry Page who has a bachelor of science in computer engineering from the University of Michigan. The CEO of Google itself is Sundar Pichai who has a degree in metallurgical engineering from Indian Institute of Technology Kharagpur. The chairman is Eric Schmidt who has a degree in electrical engineering from Princeton. Eric Schmidt wrote software that many of us rely upon to this day.
  • Microsoft’s CEO is Satya Nadella. He has a bachelor’s degree in electrical engineering from the Manipal Institute of Technology.
  • Amazon’s CEO is Jeff Bezos, who has bachelor of science degrees in electrical engineering and computer science from Princeton.
  • ExxonMobil’s CEO is Darren Woods, who has a bachelor’s degree in electrical engineering from Texas A&M University.
  • Johnson & Johnson’s CEO is Alex Gorsky who has a bachelor of science degree from the U.S. Military Academy at West Point.
  • Facebook’s CEO is Mark Zuckerberg, a college drop-out who was studying computer science at Harvard.

Of all degrees granted in 2014–2015 in the US, about 7% were in engineering or computer science, about as many as were granted in psychology. By far the most popular field of study is business (20% of all degrees).

Has the Internet killed real estate agents yet?

Back in 2002, when I was first interested in buying a house, I went on the Internet and found lots of houses for sale, directly from the sellers. I bought the house I own right now directly from the seller. At the time, I was convinced that the days of real estate agents were numbered. I remember telling a friend who wanted to go into real estate that the Internet would soon kill this industry.

It made sense. Idiots like myself could buy houses from other idiots, that is people without any training in real estate, without any other intermediary than the Internet. How long could the real estate agents last?

Real estate agents don’t inspect houses, they do not have power of attorney, they do not provide deeds, they do not provide the financing, they do not provide the insurance. Real estate agents may take the pictures and post them on the Internet, but iPhones take decent pictures.

A home inspection (not covered by the agent’s fees) might cost you $300. A lawyer will charge you a flat fee to represent you in the transaction (maybe $1000, not covered by the agent’s fees). The bulk of the transaction costs are taken up by the real estate agent.

Yet real estate agents are still with us, charging 5% in commission. That’s a sweet deal: sell a single home and you can pocket what many people make in half a year.

It hurts my ego to admit that I was badly wrong: the Internet has not affected real estate agents in the least.

You’d think people would be eager to keep the commission fee for themselves (it is tens of thousands of dollars!). The Washington Post tells us that nothing of the sort is happening:

And over the past decade, the Internet has disrupted almost every aspect of a transaction that sits at the core of the American Dream. Everyone now has free access to information that used to be impossible to find or required an agent’s help. But as a new home-buying season kicks off, one thing remains mostly unchanged: the traditional 5-to-6-percent commission paid to real estate agents when a home sells. While the Internet has pummeled the middlemen in many industries — decimating travel agents, stomping stock-trading fees, cracking open the heavily regulated taxi industry — the average commission paid to real estate agents has gone up slightly since 2005, according to Real Trends. In 2016, it stood at 5.12 percent. “There’s not a shred of evidence that the Internet is having an impact,” Murray said, sounding like he almost can’t believe it himself.

The article argues that the sale of a home is a complicated transaction. Oh! Come on! That’s a pathetic explanation: planning a trip abroad is complicated and, yet, we have no qualms doing away with travel agents and using the Internet instead. Of course, the transaction cost is higher, which makes it worthwhile to pay someone to help. But 5% of the transaction is a lot. In Canada, that’s about $25,000 to sell a single house (5% of $500,000): the price of a brand new car. And selling and buying houses is really not that complicated. It is not $25,000-complicated.

Recall that real estate agents do not provide home inspection, insurance, financing, legal titles… all of these things are separate expenses provided by separate people.

Whether real estate agents have expenses, and how much of the 5% they pocket is irrelevant. The fact is that this 5% has remained the same for decades. This means that, in real dollars, real estate agents cost the same today as they did decades ago.

To put it another way, the productivity of real estate agents has, if anything, decreased in recent decades despite all the technological progress. In comparison, across all industries, productivity grows by about 1% a year. That’s why Americans, on average, are much richer than they were decades ago.

On average, workers are at least 20% more productive today than they were 20 years ago. But not real estate agents.

Another way to describe a stagnation or decline in productivity is to say that real estate agents, despite all their new tools, are not getting any better over time, and are probably getting slightly worse since their cost is rising.

They have cheap mobile phones, the Internet, databases, fancy software… all of that has not, in the least, made them more productive.

How well do the real estate agents serve the interest of their clients? Maybe not so well:

Those selling without an estate agent were more satisfied and the gap between sales price and asking price was smaller than for those selling through a real estate broker. (Stamsø, 2015)

Our central finding is that, when listings are not tied to brokerage services, a seller’s use of a broker reduces the selling price of the typical home by 5.9% to 7.7%, which indicates that agency costs exceed the advantages of brokers’ knowledge and expertise by a wide margin. (Bernheim and Meer, 2012)

Many real estate agents recommend that sellers lower their prices (thus making the agents’ job much easier) in the belief that buyers are going to bid on the house. Yet this is a terrible strategy for their clients:

While the (…) recommendations of real estate agents (…) favor underpricing, alluding to a potential herding effect, our market data do not provide any support for this strategy. (Bucchianeri and Minson, 2013)

Can you do better with a cheap, flat-fee broker? It seems you can:

Brokers with a flat-fee structure who charge an up-front fee (which is substantially lower than the average fee of traditional brokers) and leave the viewings to the seller sell faster and at – on average – 2.7 percent higher prices. (Gautier et al. 2017)

So knowing all this… why hasn’t the Internet at least forced the real estate agents to lower their commission fees? If Uber was able to break the cab driver’s back, why can’t we come up with the equivalent for real estate?

I have nothing against real estate agents, I am just curious. And please, don’t tell me it is the “human element”. People don’t go around hugging their real estate agents, not any more than they hugged their travel agents.

Update: A comment by Panos Ipeirotis suggests that travel booking sites also charge a large percentage (15%-20%) on hotel reservations while AirBnB charges 6% to 12%. This would mean that real estate agents might not be such outliers. I went looking for signs that travel agents had disappeared and it seems that there are still many of them, though their work was transformed over time. This makes me question the belief that “the Internet has pummeled the middlemen in many industries” as stated in the Washington Post.

My review of Change Agent: A Novel (by Daniel Suarez)

Change Agent is a sci-fi novel still hot off the presses. It is set in our near future (2049).

The genre has been captured by writers who love dystopian futures. Suarez can’t quite distance himself from this trend. We are in for massive climate change and millions of climate refugees. We have gene therapies that are owned and operated by organized crime, complete with children being experimented on and discarded. It is not a pleasant world.

However, there was enough in this novel to keep me interested. Some interesting bits:

  • The novel is clearly inspired by George Church’s work. In particular, Church’s book Regenesis is cited. The general thesis is that genetic engineering will soon supplant computers and electronics.
  • The novel pretty much embraces everything from Church’s work, save for, apparently, the possibility that we might stop the aging process. That makes for some intriguing effects. For example, in the novel, you can change someone’s genes dynamically and this results in a new appearance. However, even after changing all of your genes, and having your organs reconfigured, you somehow remain the same age.
  • The novel adopts the idea that once we can manipulate genes, parents will be eager to tweak the genes of their embryos, going as far as dealing with organized crime to get the desired results. This seems very unlikely. Parents do not tend to be eager to take risks with their children.
  • It seems that intelligence can be greatly improved using genetic updates. This seems unlikely, at least by 2049.
  • The center of the world is no longer Silicon Valley but rather Singapore. American engineers were prevented from participating in the bio-engineering revolution, but the Singapore government had no qualms about embracing the new technology.
  • Self-driving cars are ubiquitous and you can easily rent one. However, it is easy for troublemakers to stand in front of your rented self-driving car to prevent you from moving.
  • Augmented reality is ubiquitous. Some interesting applications are discussed, such as the possibility that you might browse a pharmacy without having to read the fine print.
  • Artificial intelligence is everywhere.

I think it is a fairly realistic depiction of a possible near-term future.

Science and Technology links (May 12th, 2017)

The Apple watch can be used to diagnose heart disease automatically. This is not marketing talk, but hard research. And, of course, there is no reason for this kind of work to be limited to Apple products. In the near future, many of us, beyond a certain age, will wear devices monitoring our health. It only makes sense.

Nvidia released new processors based on the Volta architecture. The new V100 card is made of 21 billion transistors and has hundreds of cores dedicated to deep learning (called tensor cores). It seems that Nvidia has no problem bumping up the performance of its chips year after year.

Could the industrial revolution have arisen in France? Howes points out that French scientists were said to be rather more concerned with abstract theorising than with applying their knowledge.

Apple is the first company to be worth $800 billion. Its main product is the iPhone, a product that did not exist 10 years ago.

Amazon is selling a $230 device (Echo Show) that you can talk to, use as a touchscreen and use to make video calls. Yes. $230.

Published in Nature: A chronic low dose of tetrahydrocannabinol (THC) restores cognitive function in old mice. In other words, cannabis could rejuvenate your brain. Now that cannabis is being decriminalized in North America, we might get to learn a lot more about its medical uses.

Kinsey reflects on the fact that current machine learning techniques require large budgets:

 many of the papers deemed most significant relied on massive compute resource that is usually unavailable to academics.

I would add that access to the data is also limited outside large corporations like Google. The fact that progress depends on large corporations is hardly new of course. It is not like academics can design new computer chips to compete with Intel.

North American Robotics Market Surges 32 Percent.

According to Nature: A fully organic retinal prosthesis restores vision in a rat model of degenerative blindness.

Again in Nature: Generation of inner ear organoids containing functional hair cells from human pluripotent stem cells.

Some researchers claimed to have figured out what causes our hair to turn gray with age: our findings reveal the identities of hair matrix progenitors that regulate hair growth and pigmentation. The interesting part of the story is that we don’t know what causes hair to become gray, though it is assuredly the result of uncontrolled oxidation. Whether these researchers have cracked the problem is another story.

Signed integer division by a power of two can be expensive!

Remember when you learned the long division algorithm in school? It was painful, right?

It turns out that even on modern processors, divisions are expensive. So optimizing compilers try to avoid them whenever possible. An easy case is the division by a power of two. If a compiler sees x / 2 when x is an unsigned integer then it knows that it can simply “shift” the bits of the variable x by 1 because data is represented in binary. A shift is a relatively inexpensive operation: it completes in a single CPU cycle on most processors…

Some programmers cannot resist and will write x >> 1 instead of x / 2. That’s mostly wasted effort, however. Optimizing compilers can be counted on to be smart enough to figure it out.

What if x is a signed integer? Sadly, a simple shift is no longer sufficient and several instructions must be used. Most compilers seem to generate about 4 to 5 instructions. Some versions of the Intel compiler generate three separate shifts.

This means that, some of the time, if we use x >> 1 (or x >>> 1 in Java) instead of x / 2, we might get a different performance even if the actual value stored in x is positive.

We might think that it is no big deal. It is still going to be super fast, right?

Let us consider a generally useful algorithm: the binary search. It finds the location of a value in a sorted array. At the core of the algorithm is the need to divide a length by two repeatedly. Looking at Java’s source code, we find how the Java engineers implemented the binary search in the standard library:

public static int binarySearch(int[] array, int ikey) {
    int low = 0;
    int high = array.length - 1;
    while (low <= high) {
        final int middleIndex = (low + high) >>> 1;
        final int middleValue = array[middleIndex];

        if (middleValue < ikey)
            low = middleIndex + 1;
        else if (middleValue > ikey)
            high = middleIndex - 1;
        else
            return middleIndex;
    }
    return -(low + 1);
}

Notice the shift? Let us rewrite the function with an integer division instead:

public static int binarySearch(int[] array, int ikey) {
    int low = 0;
    int high = array.length - 1;
    while (low <= high) {
        final int middleIndex = (low + high) / 2;
        final int middleValue = array[middleIndex];

        if (middleValue < ikey)
            low = middleIndex + 1;
        else if (middleValue > ikey)
            high = middleIndex - 1;
        else
            return middleIndex;
    }
    return -(low + 1);
}

It is nearly the same function. You may expect that it will have nearly the same performance.

Maybe surprisingly, that’s not true at all.

I wrote a little Java benchmark. It shows that the version with integer division runs at 2/3 the speed of the version with a shift. That’s a severe performance penalty.

The result is not specific to Java, it also holds in C.

The lesson?

When working with signed integers, do not assume that the compiler will turn a division by a power of two into code that runs nearly as efficiently as a single shift.

Credit: The observation is based on work by Owen Kaser.

Science and Technology links (May 5th, 2017)

Lungs make blood cells:

In experiments involving mice, the team found that they produce more than 10 million platelets (tiny blood cells) per hour, equating to the majority of platelets in the animals’ circulation.

Scientists have edited the genes of monkey embryos.

The American government funds medical research through the National Institutes of Health (NIH). Funding agencies tend to fund a few people very well, while most researchers struggle for funding. That’s a problem because science benefits when diverse avenues are explored. Meanwhile, throwing more money at an already rich laboratory does not improve outputs. By introducing a cap whereby no researcher can receive more than three times the funding of a regular researcher, the NIH will be able to award 1600 new grants. Though the NIH did not comment on age, it is very likely that this is a generational transfer: the new grants will be handed out to younger researchers while the cap will tend to affect older researchers.

Google is accurately reading street names and business names from store fronts.

Applying these large models across our more than 80 billion Street View images requires a lot of computing power.

When exercising hard, you hit a “wall” where you have to stop due to exhaustion. This is caused by glucose depletion. Though your muscles can go on even after the glucose is depleted, burning fat instead, your brain cannot. Your brain must have glucose. Interestingly, this means that no matter how much of an athlete you are, at some point you will hit this wall because your muscles do use up glucose when it is available. Still, people in better shape go on for longer. How do they do it? By burning less glucose and more fat. You can get the desired result by training. Training is hard work. Fan et al. show that you can get the same greatly extended endurance without training at all, just by supplementing with something called PPARδ. It has been described as exercise in a pill. Speaking for myself, I am never running another mile in my life; I’ll wait for PPARδ pills.

You cannot drown in quicksand:

If you end up in quicksand, don’t panic. Quicksand is denser than a human, which means that, at the worst, you won’t sink in much further than your waist (…)

As you get older, you just do not learn as well as you used to. Eventually, your cognitive abilities decline. We used to think that this was caused by a depletion of brain cells in the cortex, or by some degradation of the neurons, but it is no longer so evident. Wu et al. propose a rather daring theory: that we could reshape our cognitive abilities as we age by imitating infants.

Although intellectual engagement is a significant factor associated with adult cognitive health, it is unclear what it includes, why and how it declines across the lifespan (…) This integrative review introduces a novel theoretical life course framework that synthesizes research on early childhood experiences and cognitive aging to address the following three points. First, we specify six critical factors of intellectual engagement for long-term, broad cognitive development: (a) open-minded input-driven learning, (b) individualized scaffolding, (c) growth mindset, (d) forgiving environment, (e) serious commitment to learning, and (f) learning multiple skills simultaneously. We show that these factors increase basic cognitive abilities (e.g., working memory, inhibition) and promote far transfer. Second, we trace the decline of the six factors from infancy to aging adulthood (broad learning to specialization). Finally, we propose that these six factors can be applied to expand cognitive functioning in aging adults beyond currently known limits.

The idea is that if you keep trying to expand your mind, keep learning in different directions, and allow yourself to remain open and to make mistakes… you might actually remain sharp. Or not. It is speculative.

It seems that many cancers are indeed the result of poor lifestyle choices:

unavoidable intrinsic risk factors [such as cell division] contribute only modestly, less than 10-30 percent, to the development of many common cancers

Arthritis is no fun. Lots of people suffer from joint pain and there is just about nothing that doctors can do. Many people close to me have problems with their knees. New research suggests that removing senescent cells can help heal the joints. Senescent cells are bad cells that should die but instead linger and accumulate with age. The good news is that we have commercial technology to safely remove senescent cells. Jeff Bezos (Amazon’s CEO) has invested in at least one company that is developing senolytic therapies. Meanwhile, some other people are working on an arthritis “vaccine” based on stem cells.

Air-free pneumatic tires are coming.

General Motors’ CEO thinks that autonomous cars are going to come sooner than we think.

NASA has received the mandate to send us to Mars in the 2030s. They have published somewhat vague slides about their plans (PDF).

Science and Technology links (April 28th, 2017)

It is estimated that our species, Homo sapiens, appeared in Africa as far back as 200,000 years ago, and that we left Africa about 60,000 years ago. Confusingly, scientists found a 130,000-year-old archaeological site in southern California. So, maybe, there were human beings in America tens of thousands of years before our species left Africa. Homo sapiens were hardly the first human beings outside of Africa. It is believed that there were human beings of various types in Asia as far back as 600,000 years ago. It seems that some of them made it to America. What happened to them? How do they relate to us? Is any of this even true?

Our bodies produce myostatin, a protein that suppresses muscle growth. Some of us are naturally more muscular because we produce less myostatin. Obese people produce more myostatin. What happens if you take obese mice and prevent them from producing myostatin? Turns out that they appear to be healthier.

We all imagine that our ancestors were big and muscular whereas our descendants will be nerdy. In fact, human beings could get quite a bit more muscular in the coming decades. If you are paying attention, the process is probably already under way. When I was a kid, popular actors had unremarkable muscles, except maybe for a few like Arnold Schwarzenegger… I don’t imagine that most retirees will ever pump iron, but future therapies could keep older people quite a bit more muscular.

Kevin Kelly has a nice post on the myth of artificial intelligence as a threat to human beings. His core argument, which he made in his book The Inevitable and elsewhere, is that the concept of “general intelligence” is bogus. The Google search engine has intelligence that no human being can match, your dog has a kind of intelligence that you cannot match… and so forth. So we are unlikely to emulate human intelligence in our technology, and much more likely to produce forms of intelligence that best complement our own.

According to Yahoo! Finance, U.S. soda sales have been declining for the last 12 years. The article hints that public policies might be to blame:

The consumption of added sugar in foods and beverages has been linked to obesity and type 2 diabetes. The World Health Organization, the U.S. Food and Drug Administration and the American Heart Association have all recommended reducing consumption of soda as a way to cut down on added sugars.

So sugar is bad. What about salt? Low-sodium diet might not lower blood pressure: Findings from large, 16-year study contradict sodium limits in Dietary Guidelines for Americans. In other words, while you should cut down on sugar, salt is probably fine.

The Canadian prime minister Trudeau relies on cupping, the practice of creating a void within a cup that lies on the skin “to suck out pain, disease, and tension from the body”. Apparently, it can “suck out flu”. The prime minister appears in pictures with his shirt pulled up to clearly show the cupping marks, and his office confirmed the practice.

Economists seem to agree that the share of income that goes to labor is lower than it was. It is somewhat of a problem because most of us derive most of our income from our labor. Bloomberg has an article on the question. What is happening?

(…) companies themselves aren’t substituting machines for workers, as we might expect them to do if robots were getting really cheap. Instead, the economy is simply shifting resources toward a few large companies that are very capital-intensive, and away from the more numerous, smaller companies that use more human labor.

It is certainly true that companies like Google, Amazon, and Apple use fewer employees than comparable companies would have used in the past (like General Motors). They also rely a lot more on software and automation.

As you grow older, your bones and joints go to hell. I see many people in their forties and even thirties who suffer from painful joints, have chronic knee problems and so forth. It is not uncommon for people in their sixties and seventies to fall and break their bones. In a recent Nature paper, scientists found evidence that senescent cells, those cells that ought to die but instead slowly accumulate in our bodies over time, might be a significant part of the problem. It seems that removing senescent cells could help halt bone and joint aging (speculative).

Senescent chondrocytes are found in cartilage tissue isolated from patients undergoing joint replacement surgery, yet their role in disease pathogenesis is unknown. (…) Selective removal of the senescent cells from in vitro cultures of chondrocytes isolated from patients undergoing total knee replacement decreased expression of senescent and inflammatory markers while also increasing expression of cartilage tissue extracellular matrix proteins.

Billionaire Jack Ma predicts that in 30 years…

the Time Magazine cover for the best CEO of the year very likely will be a robot. It remembers better than you, it counts faster than you, and it won’t be angry with competitors

The puzzling part of his prediction is that he expects that the Time magazine will still be around in 30 years, and that it will still have a cover. Odd that.

Google has, at great cost, indexed 25 million books. A great fraction of these books are simply not available commercially. In many cases, it is not even possible to determine who “owns” the rights to these books. Google could, at the flip of a button, make them available for free to the entire planet, to everyone rich and poor. Yet because of “copyright”, all these precious books will remain forever locked. You will have to go to Harvard or some other expensive school if you want to consult the originals that Google scanned. Though Google will not tell us about it, I bet that this large collection of books can prove invaluable for training machine learning software. In the novel Rainbows End, entrepreneurs are madly scanning books to train advanced artificial-intelligence software. Google has already done the scanning part: if the data is of any use for training artificial intelligence, they’d be silly not to use it. Of course, you won’t have access to the same data because the courts won’t allow it. (Further reading: Do we need copyright?)

IBM, after Google, is opening an artificial-intelligence research laboratory to collaborate with professor Bengio at the University of Montreal. So maybe it is a good time to do artificial intelligence in Canada, and in Montreal specifically. (Source: Claude Coulombe)

Crab blood is worth up to $14,000 per quart. Crabs, like lobsters, spiders, and snails, have blue blood because they rely on copper and not iron to carry oxygen. There are instances of all these animals that are very long lived, with lobsters exhibiting negligible senescence [meaning that their fitness does not diminish with age]. (Source: Gregor J. Rothfuss, P.D. Magnan)

Nice overview of the work accomplished by the late Hans Rosling:

Rosling’s first discovery was that many people are not aware of even the most basic facts about global health and global development. (…) He found that people’s worldviews often do not have much grounding in facts, even long before the “post-fact” era.

Rosling was criticized, rightly so, for offering a positive outlook on the world. However, what his critics often missed is that he offered all of us hard data… which is a lot better than whatever we usually go on.

Seth Godin asks What does “science” mean?

Science isn’t something to believe or not believe. It’s something to do.

This is a very important point. Science is a process. A scientist is someone who follows a rigorous process to arrive at the truth. There is faith involved, but it is faith in the process, not in the results.

Believing something because it is in a science textbook as opposed to the Bible is not what science is all about. The textbook is likely more reliable from a scientific point of view, but if you leave your doubts aside, you are missing the point of science.

I find that this point is regularly missed entirely, even by people who have a PhD in science. Science is not a body of knowledge. It is a process. The body of knowledge is useful, it is scholarship. But in that, a science like Physics is no different from Theology. Both of them benefit from the accumulated knowledge of the past. What makes Physics into a science is what physicists do, not what they know.

Byrne et al. in How Fast are Semiconductor Prices Falling? reassure us that the price of microprocessors is still falling fast if you adjust for quality:

The results from our preferred hedonic price index indicate that quality-adjusted microprocessor units prices have continued to fall rapidly, contrary to the picture from the Producer Price Index. Our results are consistent with other indicators of continued rapid technical progress in the semiconductor sector. Concerns that the semiconductor sector had faded as an engine of growth over the period covered by our analysis appear to be unwarranted.

Using stem cells, scientists have found a way to effectively rejuvenate the blood cells of old mice.

Immunotherapy is all the rage in oncology. The idea is to tweak your immune system into fighting cancer. When it works, it works better than just about anything. Sadly, it only works in 1 person out of 5, and we don’t know what sets these lucky people apart:

About 22 percent of melanoma patients that get a single round of treatment with Yervoy are alive 10 years later, (…)

What you should probably remember is that we are not winning the war against cancer. Not yet.

The Economist reminds us that the reason we have so many cars is that we effectively subsidize them through, among other things, free parking:

With such a surfeit of parking, most of it free, it is little wonder that most people get around Silicon Valley by car, or that the area has such appalling traffic jams.

Quickly pruning elements in SIMD vectors using the simdprune library

Modern processors have powerful vector instructions. However, some algorithms are tricky to implement using vector instructions.

I often need to prune selected values from a vector. On x64 processors, we can achieve this result using table lookups and an efficient shuffle instruction. Building up the table each time gets tiring, however.

Let us consider two of my recent blog posts: Removing duplicates from lists quickly and How quickly can you remove spaces from a string? They follow the same pattern. We take a vector, identify the values that we want to remove, build a corresponding bit mask, and then remove them. In one case, we want to remove repeated values; in the other, we want to remove spaces.

Building the bit mask is efficient and takes only a few instructions:

  • We can identify the values to remove using vectorized comparisons (e.g., using the intrinsics _mm_cmpeq_epi8 or _mm256_cmpeq_epi32). This is typically very inexpensive.
  • We then build the bit mask from the comparison vector using what Intel calls a movemask (e.g., using the intrinsics _mm_movemask_epi8 or _mm256_movemask_ps). The movemask is relatively cheap, though it can have a high latency.

The pruning itself is ugly, and it requires a table lookup. I decided to publish a software library called simdprune to make it easier.

The library is quite simple. If you need to remove every other value in a 256-bit vector, you can get this result with the function call prune256_epi32(x,0b10101010).