Recognizing Computational Bias in Image Recognition (2014)
If one takes the view that eyeballs operate like video recorders, then the
world is truly a buzzing, blooming confusion. A familiar person has changed
her outfit by the next day, and her posture, lighting, and viewing angle by
the next minute.
Matching successive video frames of a scene to recognize and make
sense of this world would be nigh impossible, regardless of the specific
matching algorithm: too many pixels change from one image to the next.
Yet people do this effortlessly and seamlessly. Recent
research suggests a potential mechanism: the brain may
use classic computer science approaches in detecting the familiar
within the novel (Kriete et al., 2013).
This is not surprising. Computer science is at its
heart the study of artificial intelligence. It is modeled at least loosely
on the outputs and mechanisms of the brain. Finding a computer science
concept that could serve as a brain framework is recursive: from the brain
and back to the brain. The specific concept here is that of indirection and
pointers. A
computer program is an instruction set, like a DNA pattern or a recipe.
For it to do anything useful, it requires resources, just as a recipe
needs ingredients or DNA needs proteins. A program's resources are its
data. All data in a computer system resides at a physical storage location,
whether in registers, RAM, a hard drive, or removable media. These physical
locations have addresses. Referring to data through its address, rather
than handling the data directly, is indirection, and a value that holds
such an address is a pointer. While all computer
data must occupy physical locations, explicitly referencing it in this manner
provides more flexibility. The data becomes like a database file that can
be moved, substituted, or even shared. Using
this as a framework, a scene frame is not a static, inert image for direct
pixel-to-pixel matching but an outline with placeholders ready to be
filled, swapped, and chopped. Think of a Mad Libs fill-in-the-blanks
sentence construction. The researchers believe this is akin to the brain
mechanism of the frontal cortex and basal ganglia connections behind the
human ability to make sense of novelty. However,
the similarity between the frameworks breaks down from there. For one, "the brain
has to be trained... to understand sentences while a computer can be
programmed to understand sentences immediately" (Sciencedaily.com).
For another, going from the brain to computer science to programs
and pointers and back to the brain involves so many indirections that it
may amount to misdirection. This framework may describe how the
brain makes sense of novelty, and thus communicate the phenomenon, but
another perspective may be needed to provide a causal model. For
instance, much biological and neuroscience work points to
overt and covert eye saccades, visual attention, and selective
attention filters. While both the computer science pointer and the
neuroscience of visual attention imply that the novel scene can be broken
up into segments, the pointer perspective implies a
top-down, fixed scaffolding to be filled in (i.e., a universal grammar in a
Mad Libs-like structure), while the visual saccade implies a bottom-up,
highly focused (i.e., filtered), and flexible scene construction. The
saccade or attentional focus highlights and anchors to the familiar, then
scans the novel periphery, rather than first forming a Mad Libs scaffold
of the periphery and then substituting in the internal pointers.
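The pointer-and-placeholder idea discussed above can be made concrete with a small sketch. This is only an illustration in Python, not the model from Kriete et al.; the names `memory`, `pointers`, `template`, and `render` are hypothetical, and dictionary keys stand in for memory addresses. The scaffold stores where each filler lives rather than the filler itself.

```python
# Illustrative sketch only: a Mad Libs-style frame whose blanks are filled
# indirectly. All names here are hypothetical, chosen for this example.

def render(template, pointers, memory):
    """Fill each named blank by following its pointer into memory."""
    values = {role: memory[address] for role, address in pointers.items()}
    return template.format(**values)

# "Memory": contents stored at addresses (integer keys stand in for addresses).
memory = {0: "Alice", 1: "a red coat", 2: "a blue dress"}

# The fixed scaffold, plus one pointer (an address) per blank.
template = "{person} is wearing {outfit}."
pointers = {"person": 0, "outfit": 1}

first = render(template, pointers, memory)

# A new outfit: re-point one blank. Scaffold and stored contents are untouched.
pointers["outfit"] = 2
second = render(template, pointers, memory)

# A new viewing angle: the contents at an address change, the pointer does not.
memory[0] = "Alice in profile"
third = render(template, pointers, memory)

print(first)
print(second)
print(third)
```

Each change touches either one pointer or one stored item, never the scaffold. That separation is the flexibility the pointer analogy is meant to capture, and it is exactly what direct pixel-to-pixel matching lacks.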
These may appear to be subtle differences in analogy, but they have profoundly
different implications for subsequent approaches.
A top-down scaffold-then-pointer approach implies a need
for an internalized sense of order that must first be learned or
programmed in.
Describing the brain and behavioral mechanism in this way can
help in getting started and in addressing a computer science
audience, since it is the closest analogous match within the discipline,
but this approach will reach its limits sooner than a description
grounded in neuroscience would.
One must take care in how, when, and where one describes the mechanisms of
brain and behavior, whether the aim is to understand or to exploit the knowledge.
As with a plane coming in to land on safe and familiar asphalt
surrounded by novel terrain, many approach vectors will place the plane on
the ground, but perhaps only a few angles will do so meaningfully, with
happy passengers and cargo.