How do you teach a car that a snowman won’t walk across the road?

A smiling snowman with a carrot nose, black eyes and smile, wearing a colourful scarf and a snow-covered blue woollen hat.

Sue Cantan/Flickr

Picture yourself driving down a city street. You go around a curve, and suddenly see something in the middle of the road ahead. What should you do?

Of course, the answer depends on what that ‘something’ is. A torn paper bag, a lost shoe, or a tumbleweed? You can drive right over it without a second thought, but you’ll definitely swerve around a pile of broken glass. You’ll probably stop for a dog standing in the road but move straight into a flock of pigeons, knowing that the birds will fly out of the way. You might plough right through a pile of snow, but veer around a carefully constructed snowman. In short, you’ll quickly determine the actions that best fit the situation – what humans call having ‘common sense’.

Human drivers aren’t the only ones who need common sense; its lack in artificial intelligence (AI) systems will likely be the major obstacle to the wide deployment of fully autonomous cars. Even the best of today’s self-driving cars are challenged by the object-in-the-road problem. Perceiving ‘obstacles’ that no human would ever stop for, these vehicles are liable to slam on the brakes unexpectedly, catching other motorists off-guard. Rear-ending by human drivers is the most common accident involving self-driving cars.

The challenges for autonomous vehicles probably won’t be solved by giving cars more training data or explicit rules for what to do in unusual situations. To be trustworthy, these cars need common sense: broad knowledge about the world and an ability to adapt that knowledge in novel circumstances. While today’s AI systems have made impressive strides in domains ranging from image recognition to language processing, their lack of a robust foundation of common sense makes them susceptible to unpredictable and unhumanlike errors.

Common sense is multifaceted, but one essential aspect is the mostly tacit ‘core knowledge’ that humans share – knowledge we are born with or learn by living in the world. That includes vast knowledge about the properties of objects, animals, other people and society in general, and the ability to flexibly apply this knowledge in new situations. You can predict, for example, that while a pile of glass on the road won’t fly away as you approach, a flock of birds likely will. If you see a ball bounce in front of your car, you know that it might be followed by a child or a dog running to retrieve it. From this perspective, the term ‘common sense’ seems to capture exactly what current AI cannot do: use general knowledge about the world to act outside prior training or pre-programmed rules.

Today’s most successful AI systems use deep neural networks. These are algorithms trained to spot patterns, based on statistics gleaned from extensive collections of human-labelled examples. This process is very different from how humans learn. We seem to come into the world equipped with innate knowledge of certain basic concepts that help to bootstrap our way to understanding – including the notions of discrete objects and events, the three-dimensional nature of space, and the very idea of causality itself. Humans also seem to be born with nascent concepts of sociality: babies can recognise simple facial expressions, they have inklings about language and its role in communication, and rudimentary strategies to entice adults into communication. Such knowledge is so elemental and immediate that we aren’t even conscious we have it, or that it forms the basis for all future learning. A big lesson from decades of AI research is how hard it is to teach such concepts to machines.

On top of their innate knowledge, children also exhibit innate drives to actively explore the world, figure out the causes and effects of events, make predictions, and enlist adults to teach them what they want to know. The formation of concepts is tightly linked to children developing motor skills and awareness of their own bodies – for example, it appears that babies start to reason about why other people reach for objects at the same time that they can do such reaching for themselves. Today’s state-of-the-art machine-learning systems start out as blank slates, and function as passive, bodiless learners of statistical patterns. By contrast, common sense in babies grows via innate knowledge combined with learning that’s embodied, social, active and geared towards creating and testing theories of the world.

The history of implanting common sense in AI systems has largely focused on cataloguing human knowledge: manually programming, crowdsourcing, or web-mining commonsense ‘assertions’ or computational representations of stereotyped situations. But all such attempts face a major, possibly fatal obstacle: much (perhaps most) of our core intuitive knowledge is unwritten, unspoken, and not even in our conscious awareness.

The US Defense Advanced Research Projects Agency (DARPA), a major funder of AI research, recently launched a four-year programme on ‘Foundations of Human Common Sense’ that takes a different approach. It challenges researchers to create an AI system that learns from ‘experience’ in order to attain the cognitive abilities of an 18-month-old baby. It might seem strange that matching a baby is considered a grand challenge for AI, but this reflects the gulf between AI’s success in specific, narrow domains and more general, robust intelligence.

Core knowledge in infants develops along a predictable timescale, according to developmental psychologists. For example, around the age of two to five months, babies exhibit knowledge of ‘object permanence’: if an object is blocked by another object, the first object still exists, even though the baby can’t see it. At this time babies also exhibit awareness that when objects collide, they don’t pass through one another, but their motion changes; they also know that ‘agents’ – entities with intentions, such as humans or animals – can change objects’ motion. Between nine and 15 months, infants come to have a basic ‘theory of mind’: they understand what another person can or cannot see and, by 18 months, can recognise when another person displays the need for help.

Since babies under 18 months can’t tell us what they’re thinking, some cognitive milestones have to be inferred indirectly. This usually involves experiments that test ‘violation of expectation’. Here, a baby watches one of two staged scenarios, only one of which conforms to commonsense expectations. The theory is that a baby will look for a longer time at the scenario that violates her expectations, and indeed, babies tested in this way look longer when the scenario does not make sense.

In DARPA’s Foundations of Human Common Sense challenge, each team of researchers is charged with developing a computer program – a simulated ‘commonsense agent’ – that learns from videos or virtual reality. DARPA’s plan is to evaluate these agents by performing experiments similar to those that have been carried out on infants and measuring the agents’ ‘violation of expectation signals’.
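To make the idea of a ‘violation of expectation signal’ concrete, here is a minimal sketch in Python. It is an illustration only, not DARPA’s evaluation method or any team’s agent: the constant-velocity predictor, the one-dimensional trajectories, the function names and the `margin` threshold are all assumptions chosen for simplicity. The ‘agent’ predicts where an object should be next, and its average prediction error stands in for how long a baby might look.

```python
def surprise(trajectory):
    """Mean prediction error of a constant-velocity predictor (hypothetical agent).

    `trajectory` is a list of 1-D object positions, one per video frame.
    """
    errors = []
    for i in range(2, len(trajectory)):
        # Assume the object keeps moving as it did over the last two frames.
        velocity = trajectory[i - 1] - trajectory[i - 2]
        predicted = trajectory[i - 1] + velocity
        errors.append(abs(trajectory[i] - predicted))
    return sum(errors) / len(errors)


def violates_expectation(expected_scene, test_scene, margin=0.5):
    """True if the test scene is markedly more surprising than the expected one.

    `margin` is an arbitrary illustrative threshold, not an empirical value.
    """
    return surprise(test_scene) - surprise(expected_scene) > margin


# A ball rolling steadily, versus one that abruptly jumps ahead mid-scene.
plausible = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
implausible = [0.0, 1.0, 2.0, 3.0, 9.0, 10.0]

print(violates_expectation(plausible, implausible))
```

A real commonsense agent would have to learn its expectations from video rather than have a rule of motion written in by hand, which is exactly the hard part.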

This won’t be the first time that AI systems are evaluated on tests designed to gauge human intelligence. In 2015, one group showed that an AI system could match a four-year-old’s performance on an IQ test, resulting in the BBC reporting that ‘AI had IQ of four-year-old child’. More recently, researchers at Stanford University created a ‘reading’ test that became the basis for the New York Post reporting that ‘AI systems are beating humans in reading comprehension’. These claims are misleading, however. Unlike humans who do well on the same test, each of these AI systems was specifically trained in a narrow domain and didn’t possess any of the general abilities the test was designed to measure. As the computer scientist Ernest Davis at New York University warned: ‘The public can easily jump to the conclusion that, since an AI program can pass a test, it has the intelligence of a human that passes the same test.’

I think it’s possible – even likely – that something similar will happen with DARPA’s initiative. It could produce an AI program that is specifically trained to pass DARPA’s tests for cognitive milestones, yet possesses none of the general intelligence that gives rise to these milestones in humans. I suspect there’s no shortcut to actual common sense, whether one uses an encyclopaedia, training videos or virtual environments. To develop an understanding of the world, an agent needs the right kind of innate knowledge, the right kind of learning architecture, and the opportunity to actively grow up in the world. It should experience not just physical reality, but also all of the social and emotional aspects of human intelligence that can’t really be separated from our ‘cognitive’ capabilities.

While we’ve made remarkable progress, the machine intelligence of our current age remains narrow and unreliable. To create more general and trustworthy AI, we might need to take a radical step backward: to design our machines to learn more like babies, instead of training them specifically for success against particular benchmarks. After all, parents don’t directly train their kids to exhibit ‘violation of expectation’ signals; how infants behave in psychology experiments is simply a side effect of their general intelligence. If we can figure out how to get our machines to learn like children, perhaps after some years of curiosity-driven, physical and social learning, these young ‘commonsense agents’ will finally become teenagers – ones who are sufficiently sensible to be entrusted with the car keys.

Published in association with the Santa Fe Institute, an Aeon Strategic Partner.


31 May 2019
