Recognizing Computational Bias in Image Recognition (2014)

 

If one takes the view that eyeballs operate like video recorders, then the world is truly a buzzing, blooming confusion. A familiar person wears a different outfit from one day to the next, and her posture, lighting, and viewing angle change from one minute to the next. Matching successive video frames of a scene in order to recognize and make sense of this world would be nigh impossible, regardless of the specific matching algorithm, because too many pixels change from one image to the next. Yet people do this effortlessly and seamlessly. Recent research suggests potential mechanisms for how they do it.

 

Recent research and its analysis indicate that the brain may use classic computer science approaches to detect the familiar within the novel (Kriete et al., 2013). This is not surprising. Computer science is, at its heart, the study of artificial intelligence, modeled at least loosely on the outputs and mechanisms of the brain. Finding a computer science concept that could serve as a framework for the brain is therefore recursive: from the brain and back to the brain. The specific concept here is that of indirection and pointers.

 

A computer program is an instruction set, like a DNA pattern or a recipe. For it to do anything useful, it requires resources, just as a recipe needs ingredients or DNA needs proteins. A program's resources are its data. All data in a computer system resides at a physical location, whether in registers, RAM, a hard drive, or removable media. These locations have addresses. Referring to data by its address rather than by its value is indirection, and the stored address is a pointer. While all computer data must occupy a physical location, referencing it explicitly by address provides more flexibility: the data becomes like a database file that can be moved, substituted, or even shared.
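As a minimal sketch of the idea, the Python snippet below models memory as a list whose indices stand in for addresses. The names (`memory`, `pointer_recipe`, `dereference`) are hypothetical and chosen only to echo the recipe analogy; this is an illustration of indirection in general, not the model used by the researchers.

```python
# Toy model of memory: list indices stand in for addresses (an assumption
# made for illustration; real addresses are managed by hardware and the OS).
memory = ["flour", "sugar", "eggs", "butter"]

# The "recipe" stores addresses (pointers), not the ingredients themselves.
pointer_recipe = [0, 2]


def dereference(pointers, mem):
    """Follow each pointer to the value currently stored at that address."""
    return [mem[address] for address in pointers]


before = dereference(pointer_recipe, memory)

# Substitute the data at address 0; the recipe itself is left untouched.
memory[0] = "almond flour"
after = dereference(pointer_recipe, memory)

# A second recipe can share the same data by pointing at the same address.
shared_recipe = [0, 3]
shared = dereference(shared_recipe, memory)

print(before)
print(after)
print(shared)
```

The recipe never changes, yet what it produces does, because it refers to locations rather than to fixed contents. That is the flexibility the paragraph above describes: data can be substituted or shared without rewriting the instructions.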

 

Using this as a framework, a scene frame is not a static, inert image for direct pixel-to-pixel matching but an outline with placeholders ready to be filled, swapped, and chopped. Think of a Mad Libs fill-in-the-blanks sentence construction. The researchers believe this is akin to the brain mechanism, involving connections between the frontal cortex and the basal ganglia, behind the human ability to make sense of novelty.
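The Mad Libs analogy can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the frame text, the role names (`person`, `outfit`, `place`), and the store of familiar items are invented, and the snippet is not a rendering of the researchers' neural model. It only shows a fixed outline whose blanks hold pointers into a separate store.

```python
# Hypothetical store of familiar items, each reachable by a key (the pointer).
familiar = {
    "p1": "Alice",
    "o1": "red coat",
    "o2": "blue scarf",
    "l1": "kitchen",
}

# The scene frame: a fixed outline with named blanks, Mad Libs style.
FRAME = "{person} wearing a {outfit} in the {place}"


def fill(frame, bindings, store):
    """Resolve each role's pointer against the store, then fill the frame."""
    resolved = {role: store[key] for role, key in bindings.items()}
    return frame.format(**resolved)


# Monday's scene binds each role to a pointer, not to the content itself.
monday = {"person": "p1", "outfit": "o1", "place": "l1"}

# Tuesday's scene reuses the same frame and swaps a single pointer.
tuesday = dict(monday, outfit="o2")

monday_scene = fill(FRAME, monday, familiar)
tuesday_scene = fill(FRAME, tuesday, familiar)

print(monday_scene)
print(tuesday_scene)
```

Only one binding differs between the two scenes, so recognizing the same person in a new outfit amounts to noticing that the `person` pointer is unchanged, rather than comparing every pixel.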

 

However, the framework similarity breaks down from there. For one, "the brain has to be trained... to understand sentences while a computer can be programmed to understand sentences immediately." (Sciencedaily.com) For another, going from the brain to computer science to programs and pointers and back to the brain involves so many indirections that the result may be misdirection. This framework may describe how the brain makes sense of novelty, and thus communicate the phenomenon, but another perspective may be needed to provide a causal model. For instance, much work in biology and neuroscience points towards overt and covert eyeball saccades, visual attention, and selective attention filters. While both the computer science pointer and the neuroscience of visual attention imply that the novel scene can be broken up into segments, the pointer perspective implies a top-down, fixed scaffolding to be filled in (i.e., a universal grammar in a Mad Libs-like structure), while the visual saccade implies a bottom-up, highly focused (i.e., filtered), and flexible scene construction. The saccade or attentional focus highlights and anchors to the familiar and then scans the novel periphery; it does not first form the Mad Libs scaffold of the periphery and then substitute in the internal pointers.

 

These may appear to be subtle differences in analogy, but they have profoundly different implications for subsequent approaches. A top-down, scaffold-then-pointer approach implies the need for an internalized sense of order that must first be learned or programmed in. Describing the brain and behavioral mechanism this way can be helpful for getting started and for addressing an audience that speaks computer science, since it is the closest analogous match within that discipline. However, this approach will reach its limits sooner than a description of the phenomenon grounded in neuroscience would.

 

One must take care in how, when, and where one describes the mechanisms of brain and behavior, whether the aim is to understand the knowledge or to exploit it. Picture a plane coming in to land on safe and familiar asphalt surrounded by novel terrain: many approach vectors will place the plane on the ground, but perhaps only a few angles will do so meaningfully, with happy passengers and cargo.