- We might soon be able to buy memory cards with speeds nearing 4 GB/s. For comparison, an expensive and recent MacBook currently has a disk with 2 GB/s of bandwidth. The PlayStation 5 should have a 5 GB/s bandwidth.
- Human industry has boosted the amount of CO2 in the atmosphere. This has two predictable outcomes: slightly higher global temperatures (CO2 has a mild greenhouse effect) and higher plant productivity (CO2 acts as a fertilizer). The CO2 fertilization effect is strong: a 30% gain in photosynthesis efficiency since 1900. Moreover, higher plant productivity translates into more CO2 capture, which tends to reduce the quantity of CO2 in the atmosphere. Haverd et al. report that we may have underestimated the carbon sink effect of CO2 fertilization.
- The successful e-commerce firm Shopify will allow most of its employees to work remotely in the future.
- Human beings may have hunted mammoths by chasing them into predetermined traps.
- There is a theory that sending your kids to a more selective school helps them because being exposed to high-achieving peers raises their level. But it seems that this peer effect is a myth. In other words, paying a lot of money to send your kids to an exclusive school is probably a waste. (This does not imply that sending your kids to a dysfunctional school is harmless.)
- We should apply with care the principle that extraordinary claims require extraordinary evidence. This principle can be used to reject results that violate the current consensus and thus slow the progress of science. Indeed, scientific progress is often characterized by a change in the consensus as we go from one realization to another.
- We can prevent age-related bone loss in mice by tweaking the content of their blood plasma.
- Many recent advances in artificial intelligence do not hold up to scrutiny. This is why you will often hear me dismiss novelty with the phrase “originality is overrated”.
Kolter says researchers are more motivated to produce a new algorithm and tweak it until it’s state-of-the-art than to tune an existing one. The latter can appear less novel, he notes, making it “much harder to get a paper from.”
The net result is that researchers tend to overrate novelty and originality. In practice, you often get better results by selecting time-tested approaches and ignoring the hype.
So, how should you read research, knowing that much of it won’t stand up to scrutiny?
- Do not dismiss older research merely because it is older. Do the opposite: focus your energies on older work still in use.
- Instead of picking up papers one by one, try to find the underlying themes. In effect, dismiss each individual paper and instead focus on the recurring themes and effects. If an idea only appears in one paper, it can probably be discarded. If it appears again and again and proves useful, it might be worth knowing.
Regarding neural networks: at my last job, we tried to find the best representation of algorithms to test their performance. There were several things I formed opinions about:
Recommendation systems are too secret at the moment. You can’t find a real open-source recommendation system that is big enough for you to care about. I think that’s the reason the 6/9 neural networks were better than regular approaches. It’s still possible that the real, hidden networks are worse than the regular ones, but I just can’t be sure.
Quantization works only when your SNR is high enough, at least without retraining, which makes sense. On classification networks, it’s fairly easy to quantize because lowering your SNR does almost nothing to the result. On GANs, I think it’s almost impossible to quantize without seeing worse results. Since neural networks started with classification, it was easy to jump on that train, but I don’t think it’s still that viable a solution. I believe that bfloat16 or tensorfloat32 is a better solution.
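To make the SNR argument concrete, here is a minimal numpy sketch (my illustration, not code from the comment) of plain symmetric per-tensor int8 quantization and the signal-to-noise ratio it leaves you with; the toy weight matrix and its scale are made up for the example.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def snr_db(original, reconstructed):
    """Signal-to-noise ratio of the reconstruction, in decibels."""
    noise = original - reconstructed
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)  # toy weight matrix
q, scale = quantize_int8(w)
print(f"int8 reconstruction SNR: {snr_db(w, scale * q.astype(np.float32)):.1f} dB")
```

Whether that SNR is “high enough” depends on the task, which is the comment’s point: classification outputs are largely insensitive to this noise, while image-generating networks show it directly.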
It’s fairly hard to get performance out of pruning. Just taking 90% of the values and setting them to zero means nothing for a SIMD matrix multiply. You need to chop out a bunch of channels to make it worthwhile, and even then, it’s not that easy to get meaningful performance out of it.
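As an illustration of that point (again my own sketch, with arbitrary shapes and an arbitrary “keep the heaviest rows” heuristic): zeroing 90% of the weights leaves the dense matrix the same shape, so a SIMD/BLAS matmul does exactly the same work, whereas dropping whole channels actually shrinks the multiply.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)
x = rng.normal(size=(512, 64)).astype(np.float32)

# Unstructured pruning: zero 90% of the weights. The matrix keeps its shape,
# so a dense (SIMD/BLAS) matmul performs exactly the same amount of work.
mask = rng.random(w.shape) < 0.9
w_unstructured = np.where(mask, 0.0, w)
y1 = w_unstructured @ x          # still a 512 x 512 x 64 multiply

# Structured pruning: drop whole output channels (rows). The matrix shrinks,
# so the dense matmul genuinely gets cheaper.
keep = np.argsort(np.abs(w).sum(axis=1))[-64:]   # keep the 64 "heaviest" rows
w_structured = w[keep, :]
y2 = w_structured @ x            # now a 64 x 512 x 64 multiply
```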
I don’t think that the LSTM is really that good. It’s old and it has passed the test of time, but you really need to change all of those sigmoids to a normal activation function like ReLU or something sufficiently cheap to compute.
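For reference, the sigmoids in question are the input, forget, and output gates of the standard LSTM cell. A bare-bones numpy step looks like the sketch below (swapping the gates for a cheaper activation, as suggested above, would be a departure from this standard formulation).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D), U: (4H, H), b: (4H,) stack the four gates row-wise.
    """
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # the three sigmoid gates
    g = np.tanh(g)                                  # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Tiny usage example with made-up sizes.
H, D = 8, 16
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```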
The article on neural networks is interesting.
I have dabbled with ANNs since the ’90s. There was no TensorFlow or Caffe; you wrote everything from scratch, and while the research was slowly advancing, it was all (the research) printed on dead trees (imagine). I don’t know where to begin to explain why I think it’s all wrong.
It starts with the usual guys, of course, but for me there was in particular Bogdan Wilamovski (http://www.eng.auburn.edu/~wilambm/); he’s not pretty, but he has something to say. He demonstrates that a fully connected cascade network is the most general shape of (feed-forward) ANNs (networks of any shape are captured by this architecture). This implies that optimizing a network comes down to giving it just enough nodes to have the plasticity to function (too few and it won’t converge; too many and you get over-learning and wasted computation).
Optimizing the size of the network is easy, because you only have to modify one variable, with no knock-on effects. Dealing with these kinds of networks in bulk can be implemented very efficiently using BLAS. I have trained such a network of 5 nodes (yes, five) using GE to play snake at breakneck speed, learning how to play while the snake grew and grew to quite a large size (it’s impressive what 5 cells can do).
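Here is how I read the fully connected cascade idea, as a small Python sketch (my own illustration, not code from the repo below): each neuron sees the bias, all external inputs, and every earlier neuron’s output, so the node count is the single knob to tune.

```python
import numpy as np

def fcc_forward(x, weights, activation=np.tanh):
    """Forward pass of a fully connected cascade network.

    Neuron k sees the bias, all external inputs, and the outputs of
    neurons 0..k-1, so weights[k] has length 1 + len(x) + k.
    """
    outputs = []
    for w in weights:
        inp = np.concatenate(([1.0], x, outputs))   # bias, inputs, previous neurons
        outputs.append(activation(np.dot(w, inp)))
    return outputs[-1]   # take the last neuron as the network output

# A toy 5-node cascade with 4 inputs (random weights; in practice they would
# come from training, e.g. the GE approach described above).
rng = np.random.default_rng(0)
n_inputs, n_nodes = 4, 5
weights = [rng.normal(size=1 + n_inputs + k) for k in range(n_nodes)]
print(fcc_forward(rng.normal(size=n_inputs), weights))
```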
The repo: https://github.com/degski/SimdNet . It’s called SimdNet because that’s how it started; it ended up being an ordinary BlasNet ;(. For output I use the new Windows 10 Unicode console functionality, which basically lets you write to the console at will, with no flicker or artifacts (and all ‘ASCII’, like a real snake game). The latter makes it non-portable; the core code is portable, of course.
To conclude, you see the difference: I use 5 nodes and a bit of BLAS on a moderate computer, and then there are ‘modern’ ANNs. One has to read the right books.
PS: Wilamovski has also published a very efficient second-order algorithm for ‘backprop’; notably, he decomposes the Jacobian in such a way that the update can be computed without first fully expanding the Jacobian, which would be prohibitive.
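If I understand the reference correctly, the underlying update is the standard Levenberg–Marquardt step, and the trick is that the quasi-Hessian and the gradient can be accumulated one training pattern (one Jacobian row) at a time, so the full Jacobian never has to be stored:

$$\Delta w = \left(J^\top J + \mu I\right)^{-1} J^\top e, \qquad J^\top J = \sum_p j_p^\top j_p, \qquad J^\top e = \sum_p j_p^\top e_p,$$

where $j_p$ is the Jacobian row for training pattern $p$ and $e_p$ is the corresponding error.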
PS: All the literature is on his website; chapters 11, 12 and 13 are the core of his work in this respect (otherwise, he seems very busy with the soldering iron).