How to Design a Simple Neurological Neural Network (2014)

 

Where have all the simple non-linear neural networks gone?

 

In 1987, Dr. Bart Kosko designed and published the Bidirectional Associative Memory (BAM) network. Referenced in Jeff Hawkins' book, On Intelligence, it is one of the most biologically plausible, intricately non-linear, and online-adaptive associative memory models. It can instantly store high-fidelity patterns and associate each with another pattern (e.g. this musical score belongs to Mozart; this musical score belongs to Bach). It can even store the relationships forwards and backwards (e.g. Mozart's music sounds like this musical score). It is highly tolerant of noise and missing data (e.g. this musical fragment sounds like something from Mozart; "Mzrt" could be Mozart, and his music would follow). And best of all, the BAM network can run on a simple PC. From 1987.

 

Yet there are relatively few academic references extending such an interesting and potentially useful model over the past two decades, and precious little uptake in professional use as far as can be determined. For instance, Backpropagation Networks, Adaptive Resonance Theory models, Radial Basis Functions, Support Vector Machines, Hidden Markov Models, and other contemporaries see far more business and professional use and experimentation. The vast majority of these alternative models require far more computing power, yet are limited in their ability to run both forward and backward, require painstaking training regimens, and are sensitive to how and in what order training data are presented. These complex and computationally expensive models are entrenched in many academic and professional circles. But where are the simple ones?

 

To be fair, BAM is a heteroassociative model. Its forte is not pure classification accuracy, as is the case for maximum-margin Support Vector Machines or highly parameterizable Backpropagation models. A BAM technically has no parameters or tweakable internal settings. Its strength and focus lie in linking, or associating, multiple disparate patterns together. It has more in common with Kohonen or Hopfield networks than with the conventional prediction models commonly covered in machine learning or neuroscience modeling texts. The figures below briefly demonstrate the BAM operations.

 

 

For simplicity, we wish to encode a 3x3 = 9-pixel Boolean image of the Chinese number yi (one). It forms a horizontal bar across the lower 3 pixels, like so:

 

 

We can represent these 9 pixels as a single (blue) bipolar vector, where -1 represents an empty pixel and +1 represents a filled pixel. We wish to associate, or translate, this Chinese number yi to a (green) 3-feature vector for the digitized one, [-1, -1, +1].
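As a concrete sketch of this encoding (the article's walkthrough is done in Excel; Python with NumPy is assumed here purely for illustration):

import numpy as np

# Bipolar encoding: -1 = empty pixel, +1 = filled pixel.
# Chinese yi (one): a single horizontal bar across the bottom row,
# flattened row by row into a 9-element vector.
yi = np.array([-1, -1, -1,
               -1, -1, -1,
               +1, +1, +1])

# The (green) 3-feature code for the digitized "1", as given in the text.
one = np.array([-1, -1, +1])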

 

 

Vector-multiply (i.e. take the outer product of) the 9-pixel blue vector and the 3-feature green vector to arrive at a 9x3 teal matrix.
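In the sketch above, this vector multiplication is a single outer product:

# 9x3 association matrix linking the yi image to the "1" code.
M_yi = np.outer(yi, one)   # shape (9, 3)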

 

 

This teal matrix represents the association between the Chinese yi and the digitized 1.

 

We also wish to translate the Chinese number er (two), which appears as two horizontal bars, into a digitized two, as shown below.

 

 

Add the two teal matrices to arrive at the core BAM “brain.” This single 9x3 matrix stores four patterns and two associations.
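Continuing the sketch (the exact pixel layout of er is not recoverable from the missing figure, so two horizontal bars on the top and bottom rows are assumed here):

# Chinese er (two): two horizontal bars (top and bottom rows assumed).
er = np.array([+1, +1, +1,
               -1, -1, -1,
               +1, +1, +1])

# The 3-feature code for the digitized "2", as given in the text.
two = np.array([-1, +1, -1])

# Each association contributes one outer-product matrix;
# the BAM "brain" is simply their sum.
M_er = np.outer(er, two)
brain = M_yi + M_er        # one 9x3 matrix holding both associations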

 

 

 

To extract any of the four patterns, (1) multiply the BAM “brain” matrix by, say, the 3-feature side of an association, (2) sum across the feature columns to arrive at a single 9-pixel vector (steps 1 and 2 together are simply the matrix-vector product), (3) threshold each pixel so that values greater than 0 become filled and values less than or equal to 0 become empty, and (4) arrange the 9 pixels into a 3x3 image. In this case, entering the digitized two [-1, 1, -1] into the BAM “brain” causes it to reveal the Chinese number er.
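A minimal sketch of this recall procedure, continuing the assumed NumPy encoding (the forward direction uses the “brain” as-is; recalling the 3-feature code from a 9-pixel image uses its transpose):

def recall_image(brain, feature_vec):
    """Recall a 3x3 image from a 3-feature code: multiply, sum, threshold, arrange."""
    raw = brain @ feature_vec              # steps (1) and (2): a 9-element vector
    pixels = np.where(raw > 0, +1, -1)     # step (3): > 0 filled, <= 0 empty
    return pixels.reshape(3, 3)            # step (4): arrange as a 3x3 image

def recall_features(brain, image_vec):
    """Recall the 3-feature code from a (possibly noisy or partial) 9-pixel image."""
    raw = brain.T @ image_vec
    return np.where(raw > 0, +1, -1)

# Entering the digitized two should reveal er's pixel pattern.
print(recall_image(brain, two))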

 

 

Some simple experimentation in Excel shows the BAM “brain” to be noise tolerant. Recall also runs in constant time: the cost is fixed by the matrix dimensions, regardless of how many patterns and associations have been stored in it. This is a hugely desirable trait. As a biological model, it is consistent with a 70-year-old not requiring 10x as much time as a 7-year-old to respond to, “What is your name?”
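One quick way to probe this noise tolerance in the sketch above: flip a couple of pixels of er and check whether the 3-feature code still comes back.

noisy_er = er.copy()
noisy_er[0] = -noisy_er[0]                  # flip one pixel
noisy_er[4] = -noisy_er[4]                  # flip another
print(recall_features(brain, noisy_er))     # typically still the code for "two"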

 

A BAM is an associative model. By setting up the association patterns, it can produce predictions like any machine learning or predictive model. In this sense, associative models are supersets of predictive or classification models. So why are they not used or explored as much as their classification cousins?

 

Potential explanations include: (1) BAM models run in constant time, but there is a fuzzy limit on the number of patterns and associations that can be safely stored in a fixed BAM “brain.” This limit is generally the lesser of the two vector sizes, and it can effectively be reached even earlier if the stored patterns are similar enough to produce crosstalk. (2) Beyond this fuzzy limit, the output is unpredictable. The network may produce novel output patterns that were never stored. Set up as a 2-class classifier, a BAM might conceivably produce a third option with no warning.
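A hedged way to see this in the toy sketch: store a third, deliberately er-like pattern (both the pattern and its code below are hypothetical) and check whether recall degrades or produces something novel.

# A hypothetical third pattern differing from er in only one pixel,
# associated with a hypothetical third feature code.
er_like = er.copy()
er_like[0] = -er_like[0]
three_code = np.array([+1, -1, -1])

crowded = brain + np.outer(er_like, three_code)

# Recall may now be corrupted by crosstalk, or yield a never-stored pattern.
print(recall_image(crowded, two))
print(recall_features(crowded, er))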

 

On the other hand, mixing up similar patterns and associations is consistent with real people mixing up similar facts in real life. For example, a person may look at a face and remark that it has “their mother’s eyes” but “their father’s chin.” Going further, this mixing and production of novel patterns might offer avenues for exploring creativity: how does an artist create new forms such as classical symphonies or impressionist paintings? These are not constrained 2-alternative forced-choice classification tasks.

 

In any case, BAM models and their derivatives are intriguing, apparently hidden gems that are well worth further exploration.