The Art and Science of Building an Artificial Intelligence

Take a dip into the oceanic quantity of content on artificial intelligence and you'll soon spot a recurring theme: AI reporting tends to fall into two categories. In one, the robots will kill us. In the other, they'll replace us. In the latter case, the (human) author predicts that machines more intelligent than we are will render us irrelevant, citing their existing ability to crunch more data more quickly, recognize objects in photos, recommend content for us to read, show us things to buy, supposedly pass Turing tests and automate most of the trading that happens on the global stock markets. Some AI, like Narrative Science’s Quill, can even write analysts’ and legal reports for clients like T. Rowe Price. The authors of these sometimes nuance-free, Terminator-toting pieces generally stop short of mentioning when the AI will be writing articles about itself, lest they write themselves out of a job.

However, these legacy approaches to AI aren’t where the breakthroughs are needed or where they’ll happen. Now, then, is the time for us thinkers, hackers, makers, creatives and explorers to invent the brave new world: one in which the definition of intelligence is reimagined for clarity and precision, in which we can bake relatable ethics and empathy into the machines we are empowering, and in which we can conceive an AI that will neither kill us nor destroy our markets. To do so, we don’t have to start with a technical degree. We do, however, have to have imagination, creativity, guts and knowledge of the liberal arts, language, and human behavior.

This will become clear once we’ve made the connection from Plato to Google’s “Father of Deep Learning”.

Leading Thinkers on Thinking

On the surface, AI seems rather far along, but dig just a little deeper and the machines aren’t as clever as we've been giving them credit for. In many cases, elementary school students still beat machines at learning and language understanding. So how do we graduate artificial intelligence to middle school and above? Let’s explore some of the machines’ key limitations and some strategies for turning AI into our next best tools.

No article on deep learning would be complete without mentioning the views of Geoff Hinton, Google’s “Father of Deep Learning” -- but there are Three Wise Men whose words we should be aware of, even before Hinton, on our journey to understand intelligence. 

Over two thousand years ago, the philosopher Plato wrote:

All learning has an emotional base.

In the fifteenth century, Leonardo Da Vinci drew ‘Vitruvian Man’ and wrote:

All our knowledge has its origins in our perceptions. Study the science of art. Study the art of science. Develop your senses - especially learn how to see. Realize that everything connects to everything else.

In the twentieth century, one of the fathers of modern computing, John von Neumann, said:

When we talk mathematics, we may be discussing a secondary language built on the primary language of the nervous system.

This last quote is especially interesting because, more recently, Daniel Kahneman, who won the 2002 Nobel Prize in Economics and published ‘Thinking, Fast and Slow’ in 2011, explained:

System 1 is fast, intuitive and emotional. System 2 is slower, more deliberative, and more logical. It is to do with orderly computations, rules and reasoning.

Although System 2 believes itself to be where the action is, the automatic System 1 is the hero of the book. I describe System 1 as effortlessly originating impressions and feelings that are the main sources of the explicit beliefs and deliberate choices of System 2. The automatic operations of System 1 generate surprisingly complex patterns of ideas, but only slower System 2 can construct thoughts in an orderly series of steps. 

That provides some food for thought about how Plato, Da Vinci, John von Neumann and Daniel Kahneman would define and model human intelligence, then replicate it in machines to reflect, proxy and support our decision-making.

How's Artificial Intelligence Being Worked On?

With investor money pouring into AI, it has become tempting for founders and developers to team up with data scientists, PhDs in AI, and machine vision experts to create one of two types of deep learning system. Deep learning sits at the core of a machine's ability to replicate behaviors we may classify as intelligent:

Understanding Vision

The first of the two primary deep learning research areas is perception. Here, a team scans people’s photos or videos and measures a number of facial features. This is done with point-position measurements of the eyes, eyebrows, nose, mouth, shadow depths and skin tones, followed by the machine building up those feature points on a sequential, layer-by-layer basis. For example, layer one shows a person with eyes and mouth in a neutral position. Layer two shows a person with upturned mouth corners and bigger eyes. Over time, more photos are added to train the machine to learn that some angle of upturn in the mouth, some radius of pupil expansion in the eyes and some wideness in the nostrils map to a person genuinely smiling, fake smiling or grimacing. This is useful to retailers who want to gauge customers' emotions toward products, captured on CCTV or in selfies taken next to products.
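To make the final step concrete, the mapping from measured feature points to an expression label can be sketched as a simple rule-based classifier. This is a minimal sketch, not any vendor's actual pipeline: the feature names and thresholds below are invented for illustration, and a real system would learn these boundaries from many labeled photos rather than hard-code them.

```python
# Hypothetical sketch: mapping crude facial measurements to an expression
# label. Thresholds are invented; a trained model would learn them from data.

def classify_expression(mouth_corner_angle_deg, pupil_radius_mm, nostril_width_mm):
    """Map measured facial features to a coarse expression label."""
    if mouth_corner_angle_deg > 10 and pupil_radius_mm > 3.5:
        return "genuine smile"   # upturned mouth plus widened eyes
    if mouth_corner_angle_deg > 10:
        return "fake smile"      # the mouth smiles but the eyes do not
    if mouth_corner_angle_deg < -5 and nostril_width_mm > 12:
        return "grimace"         # downturned mouth with flared nostrils
    return "neutral"

print(classify_expression(15, 4.0, 10))  # genuine smile
```

The layer-by-layer training described above effectively replaces these hand-written rules with boundaries discovered from thousands of example photos.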

A similar methodology of measuring distances, shapes, shadow depths and angles is used to detect objects like cars, cakes and cats. This type of work falls under building an AI's ability to perceive its environment and usefully collect stimuli, similar to human vision. For a layman's analogy, think of learning to see. Regardless of the clarity of your vision, at first everything is just meaningless visual noise, full of moving colors and shapes. Over time, you notice your mom's face is comprised of the same features as the faces of many others: noses, eyes, lips, and so on. You build a pattern of what food usually looks like, and maybe you even associate green with yucky vegetables for a time. As you spend more time looking at things, your visual comprehension becomes very sophisticated, and you become able to drive safely, perceive object weights, and recognize comfortable places to sit. The ultimate ambition of AI is to meet or exceed this level of comprehension.

Understanding Text

The second main application for deep learning lies in understanding language. At its core, this big data problem uses Natural Language Processing (NLP) or text mining to pull in a lot of historical, unstructured text data and then applies successive classification layers to make sense of associations, thereby building a network of knowledge behind word meanings -- ultimately "learning" from the content. Researchers rely on pioneering work like Google’s Word2Vec or IBM Watson’s NLP to structure and extract the semantic meanings of words, then cluster associated words and sentences into vector spaces.

Lost? Consider this example: the words "Apple, iPhone, cool, Android, apps" may belong to one cluster while another cluster contains "apple, Gala, crunchy, pips, autumn". A sentence such as "The cat sat on the mat" would correlate closely with "On the mat sat the cat", and so the two would literally be placed near each other in the vector space model. This association is decided by an algorithm that considers words, synonyms, antonyms, and their proximity and order relative to one another. Really, it's just like teaching a 3-year-old to read -- except the child is actually a building-sized array of supercomputers.
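The clustering idea can be sketched with toy vectors and cosine similarity, the standard closeness measure in these vector space models. The numbers below are hand-assigned purely for illustration; Word2Vec would learn vectors with hundreds of dimensions from large text corpora rather than have anyone write them down.

```python
import math

# Hand-assigned toy vectors (real embeddings are learned, not assigned).
# The three dimensions loosely stand for "tech-ness", "fruit-ness", "tone".
vectors = {
    "iPhone":  [0.9, 0.1, 0.6],
    "Android": [0.8, 0.0, 0.5],
    "Gala":    [0.0, 0.9, 0.4],
    "crunchy": [0.1, 0.8, 0.5],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words from the same cluster sit closer together in the space.
tech_pair = cosine(vectors["iPhone"], vectors["Android"])
cross_pair = cosine(vectors["iPhone"], vectors["Gala"])
print(tech_pair > cross_pair)  # True: iPhone is nearer Android than Gala
```

The same arithmetic, applied to whole-sentence vectors, is what places "The cat sat on the mat" near "On the mat sat the cat".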

This area of deep learning is of interest to companies that want to understand what customers are writing and saying about them online; the text mining market is worth $2 billion annually whilst the big data market is worth around $125 billion.

Examining the Language Problem

These tools have been my obsession for over two decades, ever since I first learned chess. From the moment I started beating computer chess games, including one architected by Apple's Alan Kay, I dove into learning about Turing's work on the Enigma machine, the mapping strategies of machine learning, and how to actually get involved. I remember 1997 well: it was the year IBM's Deep Blue beat Garry Kasparov in chess. 2011 was at least equally exciting, with IBM Watson's Jeopardy victory hitting the headlines. Then, in 2014, the chatbot Eugene Goostman supposedly passed the Turing Test -- a test, originally proposed by Alan Turing, of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. By then, I knew the processing cogs underneath well.

Of the two applications of deep learning, NLP and text mining are the much harder problems to solve because, unlike our facial features, the features of words don’t fit neatly into points in vector space. At the moment, NLP works as if we were measuring words with rigid rulers, when we really need a tape measure that can follow curves or even scale. This presents us with big opportunities to invent new analytics tools for companies.

The machines could be called intelligent if we classify intelligence as simply some black box of functions (i.e., logic, probability and applied mathematics), and define mathematics as a language that maps perfectly like-for-like to natural language. However, we all know intelligence is about much more than mathematics and that even the word “one” has multiple dynamic meanings which can’t be perfectly mapped. 

Here are some Merriam-Webster examples for distinct uses of “one”:

1. One day at a time...

2. Early one morning...

3. One fine person...

4. Both of one species...

5. I am one with you on this.

6. The one person she wanted to marry.

Now how do you teach a machine to make these distinctions in every single sentence ever written?

When we use the word “one”, or any word, the context, timing, emotional dynamic of the situation, and the subjective and objective biases between the people communicating are all at play simultaneously. This isn't as straightforward to model as we’d assume, because mathematics as a language isn’t great at gauging emotions, whereas natural language handles them effortlessly.
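One way to see the limitation: a classic word-vector lookup assigns a single, fixed vector per word, so every sense of "one" listed above collapses onto the same point in the space. The tiny lookup table below is invented to make the point; a real embedding table would be learned and far larger.

```python
# Illustrative only: a static embedding table (invented numbers) maps each
# word to exactly one vector, no matter the sentence it appears in.
embedding = {
    "one":    [0.2, 0.7],
    "person": [0.5, 0.3],
    "day":    [0.1, 0.9],
}

# "One day at a time"          -> the counting sense of "one"
# "The one person she wanted"  -> the singling-out sense of "one"
vec_counting = embedding["one"]
vec_singling = embedding["one"]
print(vec_counting == vec_singling)  # True: the senses are indistinguishable
```

Context-free lookups like this are exactly why the dictionary's six senses of "one" defeat the model: the representation never changes with the surrounding sentence.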

Moreover, recalling Plato’s wisdom, “All learning has an emotional base,” we should ask if the machines are deep learning when they have no emotional base.

When AI researchers try to convert emotions into something readable by machines, they assign either a 1 or a 0 to the emotion (1 = positive emotion; 0 = negative emotion) or a probability to the emotion (0.0 = unhappy … 0.5 = neither happy nor unhappy … 1.0 = happy).
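Both encodings can be sketched in a few lines. This is a minimal illustration of the idea, not a production sentiment system; the word lists and score table are invented for the example.

```python
# Two common simplifications for making emotion machine-readable.

def binary_sentiment(emotion):
    """Collapse an emotion word to 1 (positive) or 0 (negative)."""
    positive = {"happy", "delighted", "amused"}
    return 1 if emotion in positive else 0

def scalar_sentiment(emotion):
    """Place an emotion on a 0.0-1.0 scale: 0.0 unhappy, 0.5 neutral, 1.0 happy."""
    scale = {"unhappy": 0.0, "neutral": 0.5, "happy": 1.0}
    return scale.get(emotion, 0.5)  # default to neutral when unknown

print(binary_sentiment("happy"), scalar_sentiment("unhappy"))  # 1 0.0
```

Notice how much is lost: grief, irony, nostalgia and ambivalence all get flattened onto a single axis, which is the crux of the argument that follows.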

With that simple example it’s clear that, at the very base of machine intelligence, deep learning and mathematics, there’s a genuine need for us to tap into the liberal arts and linguistics to reimagine the solution. Otherwise, we'll end up going down the same wrong path as many layman AI writers in believing machines without nuance are actually learning more than they are. We think and consider things emotionally but the machines don't and can't do this well yet.

So How Close to Real AI Are We?

Interestingly, Geoff Hinton of Google shared this in October 2014:

If the computers could understand what we’re saying...We need a far more sophisticated language understanding model that understands what the sentence means. And we’re still a very long way from having that.

Meanwhile, Alison Gopnik of UC Berkeley noted:

When we started out (in AI) we thought that things like chess or mathematics or logic, those were going to be the things that were really hard...Not that hard! I mean, we can end up with a machine that can actually do chess as well as a Grandmaster can play chess.

The things that we thought were going to be easy - like understanding language - those things have turned out to be incredibly hard. Those are the great revolutions (understanding language) - not just when we fiddle with what we already know but when we discover something new and completely unexpected.

The quick answer: not as close as we thought we'd be.

That’s why mathematicians, computer and data scientists and other technical folk need domain experts from the liberal arts, languages and behavioral psychology to help us make breakthroughs in machine intelligence. The base of legacy AI isn’t emotional, and mathematics as a language doesn’t and can’t yet deal with or map emotions in the way that art, language and human intelligence do.

So if you’re a thinker, hacker, maker, creative or explorer, a blank canvas awaits you to redraw, remap and redo the whole of machine intelligence and make it deep and able to learn like us. The data analytics part of the market is worth over $125 billion, and the human value part is much, much bigger than that.