Hardware Considerations in Image Recognition (2012)

With more computing power and better means of harnessing multiple processors, there have been several research projects exploring neural network simulations of the human brain. Some of the publicly known ones include the Blue Brain Project, headed by Henry Markram of École Polytechnique Fédérale de Lausanne, and the SyNAPSE Project, sponsored by the Defense Advanced Research Projects Agency and carried out by a consortium of universities – including my alma mater. In recent news, GoogleX and Stanford University also announced that they have been working on large-scale simulations of the human brain. Their initial goal: to better recognize and automatically classify YouTube videos. To get a high-level idea of what these research teams are confronting in simulating the human brain, this article follows the recent GoogleX/Stanford work.

The human brain is necessary for human behavior – including recognizing what is and is not a cat image. To accomplish this, the human brain contains approximately 100 billion neurons and up to 500 trillion synapses between those neurons. To put these numbers in perspective, many parents pay up to $1,000 for their children’s SAT preparation courses. If that course helps the college-bound hopeful grow 100,000 synapses to supplement their knowledge, it works out to about a penny per synapse. At that rate ($0.01 * 500 trillion), the cost to develop a single brain to adulthood is about 5 trillion dollars, or roughly a third of the US GDP. Clearly, simulating the human brain is potentially big business.
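
For readers who want to check the arithmetic, here is the back-of-envelope estimate spelled out as a short Python sketch. The GDP figure is my own assumption (roughly $16 trillion for the US around 2012) and is used only for the final comparison.

    # Back-of-envelope cost of "building" a brain at SAT-course prices.
    sat_course_cost = 1000.0          # dollars, per the paragraph above
    synapses_gained = 100_000         # the article's assumption
    cost_per_synapse = sat_course_cost / synapses_gained    # $0.01

    synapses_in_brain = 500e12        # up to 500 trillion synapses
    brain_cost = cost_per_synapse * synapses_in_brain       # ~$5 trillion

    us_gdp_2012 = 16e12               # assumed, roughly $16 trillion
    print(f"${brain_cost / 1e12:.0f} trillion, about {brain_cost / us_gdp_2012:.0%} of US GDP")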

The neural network the GoogleX/Stanford team reported on was a 9-layer model with 1 billion synapses (Le et al., 2012). Running this model – admittedly many, many times smaller in scale than the human brain – required 16,000 computing cores. A computing core is essentially a computer from 5-10 years ago. With a modern-day laptop containing about 2-4 cores, it would take roughly 4,000 to 8,000 laptops to replicate this level of computing power. It would also take an enormous amount of tricky algorithm design to break a single model with 1 billion moving parts across 16,000 different cores.
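
The laptop comparison is simple division, but it is worth seeing spelled out. A minimal sketch, assuming the article’s circa-2012 figure of 2-4 cores per laptop:

    # How many ordinary laptops match 16,000 computing cores?
    total_cores = 16_000
    for cores_per_laptop in (2, 4):            # the article's assumption
        laptops = total_cores // cores_per_laptop
        print(f"{cores_per_laptop} cores per laptop -> {laptops:,} laptops")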

The GoogleX/Stanford scientists let this machine loose on 10 million random video thumbnails from the YouTube database. Training took three days. The training was unsupervised, meaning that there was no explicit assistance from the scientists in terms of labeling features (“cats have pointy ears at the top of their horizontally oval face shape”) or pre-labeling examples (“this is a cat”). Using these 16,000 computing cores, the research tests showed a 70% improvement in accuracy compared to the prior state of the art. The Google scientists stated that the research is migrating to the business division for image search, targeted advertisements, recommended next videos, and the like. This means they may want to scale up to more processors, validate the results on business data, fine-tune the business model, and build operational knowledge, among the many complicated practical tasks that make up the body of the iceberg if the research is the tip. As for the titular question – “How much brain power does it take to recognize a cat?” – the tentative answer is 16,000 cores’ worth.
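
To make “unsupervised” concrete, here is a minimal sketch of the underlying idea: a network that learns features purely by trying to reconstruct its input, with no labels anywhere. This is a toy single-layer autoencoder on random data standing in for thumbnails – an illustration of the principle, not the team’s 9-layer architecture – and all sizes and the learning rate are arbitrary assumptions.

    # Toy unsupervised feature learning: a single-layer autoencoder that
    # learns to reconstruct its input. No labels are used anywhere.
    import numpy as np

    rng = np.random.default_rng(0)
    n_patches, n_pixels, n_features = 1000, 64, 16      # e.g. 8x8 grayscale patches
    X = rng.random((n_patches, n_pixels))                # stand-in for real thumbnails

    W = rng.normal(0.0, 0.1, (n_pixels, n_features))     # encoder weights
    V = rng.normal(0.0, 0.1, (n_features, n_pixels))     # decoder weights
    lr = 0.01

    for epoch in range(50):
        H = np.tanh(X @ W)        # hidden features learned from the data itself
        X_hat = H @ V             # reconstruction of the input
        err = X_hat - X           # the input is its own "teacher"
        grad_V = H.T @ err / n_patches                   # gradients of mean squared error
        grad_H = (err @ V.T) * (1.0 - H ** 2)
        grad_W = X.T @ grad_H / n_patches
        W -= lr * grad_W
        V -= lr * grad_V

    print("final reconstruction error:", float(np.mean(err ** 2)))

In the actual work the model was far deeper and was trained in parallel across the 16,000 cores; the point of the sketch is only that the learning signal comes from the data itself rather than from human labels.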

Now for the caveats. “It’d be fantastic if it turns out that all we need to do is take current algorithms and run them bigger, but my gut feeling is that we still don’t quite have the right algorithm yet,” said Dr. Ng. That is number one. There are many neural network models and families of neural network models, each with different characteristics and behaviors. Most, if not all, of the core models neglect to tie their mechanisms to the underlying brain function except at a superficial level. What type of biological memory does the model simulate – short term, long term, episodic, implicit, working or executive, or other? Most, if not all, of the model derivatives and scaled-up variants continue this tradition. While the results can be impressive compared to the state of the art, this is not as enlightening about simulating the human brain as one might expect, since the state of the art more often than not is unclear about what exactly it simulates. More on this in another post.

What about the data? It was a large, random sample of video thumbnails. But the data on YouTube may suffer from selection biases. These are user-uploaded videos that were selected and posted for a reason. On top of that, the thumbnails are also selected to convey a message in a 200 by 200 pixel snapshot. The data is, in a sense, already supervised and biased – just not by the scientists. That is number two. This is not a technical or engineering problem, since the unsupervised-learning claim is about reducing manual work in business processing, which means lower overhead and operational costs. That is good for practical business use, but it may become a point of contention in cognitive development research.

The third caveat is the most serious, the one that resides in the doubts of every computational neuroscientist. Neural networks and all their promise were discovered in the 1950s. By the reckoning of the science fiction books and movies of that time, super-computing androids should be walking and flying around by now. Instead, we have automated phone agents most of us try to bypass, driverless cars that upon reflection are fancy cruise controls, and the Roomba. Why is there such a large gap? There are countless charts showing today’s available computing resources as the equivalent of an insect brain, with future growth matching a rat brain, a cat brain, a monkey brain, a chimpanzee brain, and finally a human brain within decades. This implies that all we need is for the engineers to develop faster computers in accordance with Moore’s Law. But what about King Mu’s artificer, Solomon’s animals, da Vinci’s robots, Vaucanson’s duck, and others? Surely their creators also thought that real, life-like intelligence was just around the corner, if not here already – hundreds and even thousands of years ago.

How much processing power and storage capacity does the human brain really require? How many gigahertz does it run at? How many terabytes does it hold? That gives us the human brain target number. Divide that target by the specs of a modern laptop – say, a quad-core Intel i7 at 3 gigahertz with a 1 terabyte hard drive – and we get our number. Maybe it is 16,000 cores. Maybe it is 16,000,000 cores. It depends on the assumptions behind the human brain target number. And there are a lot of assumptions.
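
Here is that division as a short sketch. Every figure in it is an assumption – published estimates of the brain’s processing rate span many orders of magnitude, which is exactly the point: the answer moves with the assumptions.

    # Dividing an assumed "brain target number" by an assumed core.
    core_ops_per_sec = 3e9                           # one 3 GHz core, one operation per cycle (simplistic)

    for brain_ops_per_sec in (1e14, 1e16, 1e18):     # assumed target numbers, orders of magnitude apart
        cores = brain_ops_per_sec / core_ops_per_sec
        print(f"brain at {brain_ops_per_sec:.0e} ops/s -> ~{cores:,.0f} cores")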

If those assumptions are wrong, then the human brain target number is completely undefined. What if the human brain target number is the wrong question? What if, for all we know about visual processing, the lateral geniculate nucleus, and the inferior temporal cortex, we are looking in the wrong place for understanding how we recognize a cat? What if, for example, a cat is not intrinsically a cat but some externally derived, artificial concept supervised and imprinted on us by our parents, such that we identify that which we feel has value to them? Then the human intelligence behind cat image detection is not measured in gigahertz or terabytes. No amount of gigahertz or terabytes will make a human brain. No amount of cores will make a human brain.

Then our consolation is that we may be able to salvage something of our work and research careers and take our places among the creators of King Mu’s artificer, Solomon’s animals, da Vinci’s robots, and Vaucanson’s duck. But we would have to wait for the next generation to try again for real, lifelike intelligence – intelligence that recognizes a cat the way a human, a chimpanzee, or a cat would, and not the way a Word document or a Google search does.