How to Recast Statistics for Real World Operations (2015)

Question (excerpted from Thinking, Fast and Slow by Daniel Kahneman): Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

Which of the following scenarios is more likely:

· Linda is a bank teller

· Linda is a feminist bank teller

According to Kahneman, over 85% of the survey takers answering this question decided that Linda is more likely to be a feminist bank teller. This finding holds true even for students and experts in decision sciences, statistics, and mathematics. Logically, this answer violates a basic principle of probability: the pool of bank tellers includes the pool of feminist bank tellers, so the probability that Linda is a feminist bank teller can never exceed the probability that she is a bank teller. It is impossible for Linda to be a feminist bank teller without also being a bank teller. This is presented as evidence that our quick-judging Fast Brain is incorrectly overriding our more analytical Slow Brain, which ought to know better. Note that Daniel Kahneman’s expertise, in neuro-economics, is in highlighting how our human biases can cause us to deviate from logical correctness.
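The containment argument can be checked with a few lines of Python. This is only a minimal sketch: the six-person population below is invented for illustration and is not data from Kahneman's survey.

```python
# Hypothetical population: (name, is_bank_teller, is_feminist).
# The entries are made up; only the set relationship matters.
population = [
    ("p1", True, True),
    ("p2", True, False),
    ("p3", False, True),
    ("p4", False, False),
    ("p5", True, True),
    ("p6", False, True),
]

bank_tellers = {name for name, teller, _ in population if teller}
feminist_bank_tellers = {
    name for name, teller, feminist in population if teller and feminist
}


def probability(subset, total):
    """Fraction of the population that falls in the subset."""
    return len(subset) / total


p_teller = probability(bank_tellers, len(population))
p_feminist_teller = probability(feminist_bank_tellers, len(population))

# Every feminist bank teller is also a bank teller...
assert feminist_bank_tellers <= bank_tellers
# ...so the conjunction can never be more probable than the single event.
assert p_feminist_teller <= p_teller
```

Whatever rows are placed in the population, the two assertions hold, because the second set is built by adding a condition to the first.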

But let us turn the tables and deconstruct exactly what the finding and its analysis really mean, and present some models of human behavior. This is in contrast to identifying the deviation from a particular definition of logic and attributing it to a lazy brain that must be guarded against. By presenting the finding as inherently illogical, the survey and its answer analysis require this concept ontology and mathematical Venn diagram:

Bank Employee is the parent of Bank Teller, which is in turn the parent of Feminist, a leaf node. In other words, Feminist Bank Tellers are a proper subset of Bank Tellers. Logically, this is true. So why the discrepancy? Are 85% of us illogical?

Or is the test, which is intended to explore human bias, actually biased itself? For starters, by placing Bank Tellers as the parent/proper superset of Feminist Bank Tellers, the test design implies that occupation should be a more defining characteristic than personal activities and interests. A side discussion of what it truly means to say, “I am a Bank Teller,” vs. saying, “I work as a Bank Teller,” would be in order. Why is being a Bank Teller more important in the analysis than being a Feminist? Perhaps the following diagram is more appropriate:

Linda is a Feminist first and a Bank Teller second. Her defining characteristic is her Feminist work. Her job as a Bank Teller is something else she does during the day and is secondary to her identity.

Let us twist the representation further. As an exercise, we borrow the physics notion of a state space and represent the Bank Teller/Feminist concept in multiple dimensions, so that what we see depends on our perspective.

The original “True State Feminist/Bank Teller” cylinder is represented from the horizontal as a circle. The same “True State” cylinder is represented from the vertical as a rectangle. Both are correct, yet both are very different. Think of these representations as Venn diagram clusters in 2D state space.
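A minimal sketch of the two views, assuming a straight cylinder sampled at a handful of surface points. The function names, radius, and height are hypothetical choices for illustration; the code labels the views by their relation to the cylinder's axis rather than by any particular orientation on the page.

```python
import math


def cylinder_points(radius=1.0, height=3.0, n_angles=36, n_heights=7):
    """Sample points on the surface of a straight cylinder whose axis is z."""
    points = []
    for i in range(n_angles):
        theta = 2 * math.pi * i / n_angles
        for j in range(n_heights):
            z = height * j / (n_heights - 1)
            points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points


def project_along_axis(points):
    """Drop z: the view looking down the cylinder's axis."""
    return [(x, y) for x, y, _ in points]


def project_across_axis(points):
    """Drop y: the view looking at the cylinder from the side."""
    return [(x, z) for x, _, z in points]


points = cylinder_points()
end_view = project_along_axis(points)
side_view = project_across_axis(points)

# End view: every projected point lies at the same distance from the centre,
# which is the signature of a circle.
radii = {round(math.hypot(x, y), 9) for x, y in end_view}

# Side view: the projected points span a box as wide as the diameter and as
# tall as the cylinder, which is the outline of a rectangle.
xs = [x for x, _ in side_view]
zs = [z for _, z in side_view]
bounding_box = (round(min(xs), 9), round(max(xs), 9), min(zs), max(zs))
```

The same list of three-dimensional points produces both outlines; only the coordinate that gets discarded changes.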

The following presents a further twist on the cylinder “True State” concept.

Now the original “True State” is a deformed, bent cylinder. From the vertical, it is still a rectangle. From the horizontal, it appears as two circles, as if the horizontal plane sliced through the bent cylinder in two places. This is, by the way, closely related in spirit to a Support Vector Machine “Kernel Trick” or a Deep Learning Neural Network “Transform Equation”: the same underlying data looks different depending on the space it is mapped into. Different perspectives get projected onto different planes in different dimensions.

Keeping in mind that the set of Bank Tellers can now possibly be represented as two sets rather than one, the following Venn Diagram becomes possible and illuminating:

Being primarily a Bank Teller may in some cases be different from being primarily a Feminist who happens to do some Bank Telling work. When Feminist as a characteristic is equal to or greater in level than Bank Teller, a Feminist bank teller is no longer the same as a (non-feminist) Bank Teller.

Restating the initial question and answers but color-coded to show the salient natural language triggers and categories illustrates this assertion:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

Which of the following scenarios is more likely:

· Linda is a bank teller

· Linda is a feminist bank teller

The sensory mechanism is that our eyes implement bottom-up selective attention on Linda’s description, triggered by words such as “outspoken,” “very bright,” “concerned,” and “social justice,” to name a few. These words and their combination hint strongly at women’s rights and equality, and very little else. This sends a top-down selective attention match to the women’s rights/feminist primary output category. Linda is predicted to be a feminist first, with other daily work and hobbies as secondary characteristics. Readers would freely describe Linda as an active member of society focused on feminist ideals. A Feminist bank teller produces the best match. The additional words “bank” and “teller” are simply distractors. Whereas word search logic dictates that “Feminist AND Bank AND Teller” is a more restrictive subset of “Bank AND Teller,” a top-down/bottom-up selective attention behavioral prediction dictates that “Feminist <something something>” is not the same as “Bank Teller.”
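The contrast between the two readings can be sketched in Python. Everything here is an assumption made for illustration: the mini-corpus is invented, and `match_score` is a deliberately crude stand-in for cue matching, not the selective attention model itself.

```python
# Hypothetical mini-corpus of profile descriptions (made up for illustration).
profiles = {
    "a": "bank teller who enjoys chess",
    "b": "feminist bank teller active in social justice",
    "c": "feminist organizer and writer",
    "d": "bank teller and feminist volunteer",
}


def boolean_and_search(terms):
    """Word-search logic: ids of profiles that contain every term."""
    return {
        pid
        for pid, text in profiles.items()
        if all(term in text.split() for term in terms)
    }


teller_hits = boolean_and_search(["bank", "teller"])
feminist_teller_hits = boolean_and_search(["feminist", "bank", "teller"])

# Adding a term can only shrink the result set.
assert feminist_teller_hits <= teller_hits

# Assumed associations: the label each cue in Linda's description evokes.
cue_evokes = {
    "outspoken": "feminist",
    "very bright": "feminist",
    "concerned": "feminist",
    "social justice": "feminist",
}


def match_score(answer):
    """Toy cue matching: count the cues whose evoked label appears in the answer."""
    words = set(answer.lower().split())
    return sum(1 for label in cue_evokes.values() if label in words)


score_teller = match_score("Linda is a bank teller")
score_feminist_teller = match_score("Linda is a feminist bank teller")
```

Under the boolean search, the three-term query returns a subset of the two-term query. Under the toy score, the answer that contains “feminist” matches every cue while the other matches none, which mirrors the preference the survey takers showed.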

As discussed in earlier papers, (statistical) natural language processing with top-down and bottom-up selective attention mechanisms is not simply an enhancement to a logical text pattern search. It is a predictive model of human behavior that does not merely identify a “lazy brain” bias to be guarded against, but dynamically models how humans make decisions, regardless of any particular static, fixed perspective on “rightness” and “accuracy” of the day.