Some say language evolved by firelight, with our ancestors sharing stories deep into the night. Others suggest it began as baby talk, or as imitations of animal calls, or as gasps of surprise. Charles Darwin proposed that language started with snippets of song; Noam Chomsky thought it was just an accident, the result of a freak genetic mutation.
Proposals about the origins of language abound. And it’s no wonder: language is a marvel, our most distinctive capacity. A few slight movements of tongue, teeth and lips, and I can give you a new idea, whisk you somewhere else or give you goosebumps. Any thought a human can think, it would seem, can be shared on a puff of air. Explaining how this all started has been called the ‘hardest problem in science’ and it’s one that few can resist. Linguists, neuroscientists, philosophers and primatologists – not to mention novelists and historians – have all taken cracks at it.
Over this long and colourful history, one idea has proven particularly resilient: the notion that language began as gesture. What we now do with tongue, teeth and lips, the proposal goes, we originally did with arms, hands and fingers. For hundreds of thousands of years, maybe longer, our prehistoric forebears commanded a gestural ‘protolanguage’. This idea is evident in some of the earliest writings about language evolution, and is now as popular as ever. Yet even as the popularity of the ‘gesture-first’ theory has surged, its major weakness – a flaw some consider fatal – has become all the more glaring.
Early proponents of ‘gesture-first’ ideas appealed to the intuition that bodily communication is primitive. In his Essay on the Origin of Human Knowledge (1746), Étienne Bonnot de Condillac imagined a boy and a girl, alone after a deluge, struggling to invent language anew. He described how the boy, wanting to obtain some out-of-reach object, ‘moved his head, his arms, and all parts of his body’, as if trying to acquire it. And the girl got the message. A scene very much like this can be readily seen today, of course: a baby in a highchair wriggling in the direction of a toy just beyond her grasp. Part of the primitive aura of gesture – for Condillac and other early thinkers – stemmed from the observation that gesture precedes speech in infancy. Before children can talk, they point, nod, wave and beg. Perhaps, goes the logic, the development of language in our species followed this same sequence.
Anthropologists of the 19th century widely championed gesture-first theories, citing other intuitive arguments. Garrick Mallery – who saw gesture as a ‘vestige of the prehistoric epoch’ – noted that it is much easier to create new, interpretable signals with one’s hands than with one’s voice. Imagine ‘troglodyte man’, he wrote in 1882. ‘With the voice he could imitate distinctively but the few sounds of nature, while with gesture he could exhibit actions, motions, positions, forms, dimensions, directions, and distances, with their derivatives and analogues.’ In more modern terms, it is easier to create transparent signals with gesture – signals that have a clear relationship to what they mean. This observation has since been borne out in lab experiments, and it remains one of the most compelling arguments for a gestural protolanguage.
In the 20th century, scholars held on to these intuitive arguments for gestural theories, while also introducing new sources of evidence. One thinker in particular, Gordon Hewes, deserves special credit for this advance. An anthropologist at the University of Colorado, Hewes had an encyclopaedic cast of mind and an unusual zeal for questions about language origins. In 1975, he published an 11,000-item bibliography on the topic. But it was his article ‘Primate communication and the gestural origin of language’ (1973) that would initiate a new era of ‘gesture-first’ theorising.
Chief among the article’s contributions was the idea that we should look closely at the communicative proclivities of primates. In the preceding decades, several attempts had been made to see if apes might be able to learn human language. In one case, a couple adopted a young chimpanzee named Viki and treated her ‘as nearly as possible like a human child’. By the age of three, she had a number of human-like tendencies. She liked building block towers. She loved playing with the phone, holding the receiver up to her ear. But, as the couple noted, ‘she seldom says anything into it’. All told, Viki was reported to speak just three words – ‘papa’, ‘cup’ and, after some physical assistance in forming her mouth correctly, ‘mama’.
Vocal language, it would seem, was out of reach for apes. But gestural language proved to be another matter. Another couple carried out a similar home-rearing experiment with a different chimpanzee, Washoe, but used manual signs borrowed from American Sign Language instead of English words. Washoe’s linguistic capacities blew past Viki’s. She ultimately mastered some 350 signs – nowhere near the level of a human signer, of course, and with none of the grammatical sophistication, but impressive nonetheless. Subsequent studies teaching signs to other apes – including Koko, a gorilla, and Chantek, an orangutan – enjoyed comparable successes.
Hewes also drew on the emerging understanding of primate communication in the wild. Primate vocalisations, he noted, are largely involuntary, not directed to a particular audience, and only ‘meagerly propositional’. Primate gestures, by comparison, seemed richer. Though the data available to Hewes on this point were scant, work in the time since has confirmed his hunch. Chimpanzees use a wider set of gestures than of vocalisations, and seem to use those gestures more intentionally. Bonobos, for instance, will use a ‘come here’ beckoning gesture and watch their audience closely for a reaction. If they don’t get the response they want, they’ll make it again. Chimpanzee vocalisations do appear to be under some voluntary control – they’re not just reflexive emissions – but not to the same extent. The same asymmetry is seen in the flexibility with which these types of signals are used: ape vocalisations are strongly tied to a stereotypical context, but their gestures are less so.
The takeaway from observations of primate communication is not that ape gestures have all the hallmarks of human language – far from it. It’s that, compared with the mouth, the hands seem to be better soil for the seeds of language. Central to contemporary discussions of language evolution is the notion of our last common ancestor with chimpanzees – a now-extinct species that lived perhaps 5 to 10 million years ago. Given what we know about primates today, we can confidently say that this ancestor had gestural and vocal abilities much like those of modern chimps. Which means its hands were more language-ready than its mouth – as Hewes put it in 1973, the manual medium was the ‘line of least biological resistance’.
Hewes continued advocating gesture-first theories into the 1980s, and he occasionally surfaced new sources of evidence. In one paper, he drew attention to a human peculiarity that few had made sense of: our palms and fingernail beds are lighter than the surrounding skin. (This contrast is more obvious in people with darker skin but is also evident when people with lighter skin are deeply tanned.) No other primate seems to show ‘palmar depigmentation’ of this type – a fact that Hewes checked for himself by visiting zoos. He speculated that this uniquely human feature had evolved to increase the visibility of our gestures. He seems to have been imagining a scene around an early hearth, the hands of some prehistoric storyteller flashing in the firelight.
In the decades since Hewes, the popularity of gestural theories has swelled. Leading figures in the cognitive sciences have now published their own variants in prominent venues. What might once have been called a single gesture-first theory is now best considered a family of related proposals. One branch maintains that early gestures consisted primarily of ‘pantomime’ – that is, those transparent gestures that re-enact or depict and thus resemble what they mean. Another branch posits that pointing is the most likely ur-gesture, citing the fact that it is the first to be acquired by children.
Perhaps this rising popularity is not so surprising. When we pull together the best arguments for gesture-first theories and lay them out, the picture is indeed compelling. There’s the fact that gesture emerges prior to speech in a child’s life. There’s the fact that the manual medium seems to be superior at creating transparently meaningful signals. There’s the fact that our last common ancestor with chimpanzees likely had the equipment to eventually create a sophisticated system of manual signals, but not vocal ones. And there is also a key existence proof: contemporary sign languages. With all the power and subtlety of spoken languages, sign systems show us that the voice is not the only viable vehicle for our species’ most distinctive capacity.
Now we come to the problem: today, speech predominates. People gesture, but their gesture is clearly a secondary supplement. People also sign but, outside of deaf communities, they favour speech. So, if language did get its start in the hands, then at some later stage it decamped to the mouth. The vexing question is: why? Already in the 18th century, Condillac appreciated the difficulty of this problem. ‘With the language of action at that stage being so natural, it was a great obstacle to overcome,’ he wrote. ‘How could it be abandoned for another language whose advantages could not yet be foreseen?’ This is now known as the problem of ‘modality transition’. To his credit, Hewes fully acknowledged it, and every gesture-first proponent since has had to address it in some way. Can it possibly be explained?
According to some, the short answer is no. The sign language researcher Karen Emmorey at San Diego State University argues that the very existence of sign systems – which are, again, as fully and complexly linguistic as spoken systems – undercuts the idea of a gestural origin for language. She reasons that if language had first built a home in the hands, it would have had no compelling reason to leave. Thus, it must have been spoken from the start. The Dutch psycholinguist Pim Levelt has reached a similar conclusion, calling gestural theories a ‘persistent misconception’ and ‘wasteful’. For humans to go through a gestural stage, he wrote in 2004, would be like ‘building a car by first building a ship and then removing the sail, putting it on wheels and adding a combustion engine’.
The burden of proof is clear. To remain viable, gesture-first theories need an account of the move from hand to mouth. And this account would need several pieces. I like to imagine modality transition as a kind of epic journey, one that our protagonist – language – might have been on for hundreds of thousands of years. As with any journey, the protagonist would need a reason to go and a means of getting there, a why and a how. Let’s take these two parts in turn.
First, a reason to go. Discussions of why language left the hands often take a scorecard format, a tallying of the putative advantages of speech over gesture. A first supposed advantage, it has often been said, is that speech is abstract. Spoken words for the most part bear an utterly arbitrary relationship to what they mean – the word ‘tree’ doesn’t resemble the physical form of a tree. Visual forms such as gesture, goes the argument, are not arbitrary in this way – and hinder abstract thought as a result. But the argument is specious. Hand movements can be motivated – as we’ve seen, this is one of their virtues. But they can also be as fully arbitrary as spoken words, as seen in the ‘peace’ gesture, for example, and in the countless opaque signs in sign languages.
A second supposed advantage is that speech is better in the dark – that, as Levelt put it, gesture is ‘functionally dead during, on average, 12 hours a day’. This is probably overstated. As Emmorey points out, modern signers get by without much problem, even in dim lighting, and can use tactile forms of signing – that is, signing in contact with another’s skin – in a pinch. Our prehistoric ancestors likely didn’t spend many waking hours in pitch-black. Rather, they would have used fires for warmth, cooking, illumination and protection from predators. And, whether or not there’s anything to Hewes’s palmar depigmentation story, hand movements can certainly be seen by firelight.
Another advantage of speech, it has been argued, is that it frees the hands for other activities. But, here again, this is likely too swift. Signers don’t seem to have much of a problem with this, using one-handed signs when necessary. And though most of what we think of as gesture takes place in the hands, critical visual signals can also be produced with the head and face – pointing, affirming, questioning. The list of putative advantages of speech goes still further. But this kind of ‘rear-view mirror’ reasoning is inherently shaky: we know the outcome and are motivated to explain it. As Tecumseh Fitch – the author of an authoritative survey of language evolution research – has pointed out, we can also readily come up with advantages of gesture. For instance, it is more discreet than speech; it can thus be used during hunting without alerting prey, or around the fire without alerting predators. It’s a more directed signal, in contrast to the broadcast nature of speech. It is better suited to noisy environments. It is more available when eating – in fact, speaking while eating poses a choking hazard. The scorecard begins to look like a wash.
But there is at least one key feature of speech that is harder to dismiss: it takes very little effort. Attempts to measure the caloric expenditures involved in speech report that they are essentially negligible. This is both because the movements involved are so tiny, and because spoken words often hitch a ride on our outgoing breath. (Speaking can be thought of as a way of upcycling an abundant waste-product – air – as it leaves the body.) This is not to say that gesturing or signing is an especially athletic or strenuous endeavour by any means, but it is certainly more expensive than talking – by an order of magnitude at least, Fitch estimates.
Could a small differential in calories really matter? Unlikely as it might seem, it could. Laziness is a remarkably powerful shaper of human behaviour – and of human communicative behaviour in particular. The study of language has identified few laws, but one of them is known as the ‘principle of least effort’. It explains why the words we use the most are also the shortest; it explains why the signs of sign language become abbreviated over time; it explains why we use acronyms, nicknames and contractions. Even animals show this drive toward more efficient communication, and archaic humans would have been no exception.
It’s plausible then that the energetic efficiency of speech could have, over many generations, compelled a transition from gesture to vocalisation. This would supply the why. But what about the how? Via what route could a capacity for meaning-making that developed in one organ be transferred to another, halfway up the body? A possible explanation begins with an anatomical curiosity: in humans, hand and mouth are intimately coupled. There seems to be a hidden yet robust connection between these two body parts – an invisible bridge, if you like – that language could have traversed.
The first evidence for the coupling of hand and mouth begins early in life. In utero, babies often suck their thumbs. Shortly after birth, until around five months old, they exhibit what is called the ‘Babkin reflex’: press on their palms, and they protrude their tongues. Other evidence comes from the brain. It has long been known that neural areas that control mouth movements are oddly very close to those that control hand movements, leading some to propose a shared command circuitry for the two. Recent evidence suggests that these areas are not merely close but, in fact, integrated. Researchers detected a speech signal from neural activity in the ‘hand knob’, a region that – as the name suggests – was thought to be specialised for hand movements.
Beyond babies and brains, the coupling of hand and mouth can also be observed in everyday behaviours. Darwin noted that ‘children learning to write often twist about their tongues as their fingers move, in a ridiculous fashion’. Work since has documented tongue protrusions in a wide range of activities, not just during writing and not just in kids. The US basketball star Michael Jordan was known for sticking out his tongue while dribbling or dunking the ball. Watch someone thread a needle or unlock their smartphone, and you might see the same. These ‘ridiculous’ behaviours are just one type of what are called ‘hand-mouth sympathies’. A more subtle type occurs when people use their hands while simultaneously producing a vocalisation. One-year-olds, for instance, vocalise differently when they grasp a small object versus a large object.
Is this coupling of hand and mouth a recent addition to the roster of human quirks? Probably not. Chimpanzees show hand-mouth sympathies, too, suggesting that this ‘invisible bridge’ was very likely already present in our last common ancestor. Thus we have identified a plausible-enough route and a plausible-enough reason to leave. But this, of course, leaves open the question of what a journey along this bridge might have actually looked like.
Any details remain highly speculative, of course. But most scholars agree that a move from hand to mouth would have been very slow. And many now emphasise that the transition was not from a gesture-only system to a speech-only system. That would make little sense, given what language use actually looks like today: speakers gesture with their hands when they talk, and signers gesture with their mouths when they sign. Rather, the transition we seek to explain was from a system in which gestures served as the primary, foregrounded communicative channel, and vocalisation was a secondary channel – much as we see in modern chimpanzees – to the other way around. We want to explain, in other words, how speech gained the upper hand.
Proposed explanations have been few and, admittedly, fanciful. Writing in Science magazine in 1944, Richard Paget suggested that language began as manual gesture but was able to switch over to speech because the activities of the mouth unconsciously echoed the activities of the hand. He wrote that when ‘the principal actors (the hands) retired from the stage … their understudies – the tongue, lips, and jaw – were already proficient in the pantomimic art’. The idea was that spoken words first emerged as unconscious, audible echoes of manual gestures. This proposal has been largely forgotten and occasionally derided. In 2010, Fitch described it as ‘one of the more precarious edifices in the field’.
But, in the past decade, Paget’s ideas have attracted a second look. Partly this is because the evidence for hand-mouth sympathies continues to mount. And partly it is because sign language researchers have recently described a class of signs that show exactly the kind of process that Paget had in mind. The mouth is highly active during signing, as already noted. Sometimes signers are mouthing a spoken language equivalent of a sign; sometimes they are producing adverbs. But still other times their mouths are simply mirroring aspects of a concurrent manual sign – exhibiting what is now known as ‘echo phonology’. To take one example, the sign for ‘true’ in British Sign Language involves bringing one hand, positioned above, down onto the other; at the same time, the open mouth is closed. Thus, a movement of the hands is echoed in a much smaller movement of the mouth. If you were to add in a concurrent vocalisation, you could get a distinctive hand movement along with a distinctive sound.
It might seem wildly unlikely that language could hop from hand to mouth in this way. Much more work would be needed to flesh out the details of this ‘echoic’ route before we would want to declare it plausible. But this is just a starting point. A goal for the next phase of gestural theories is to develop and refine models of a gradual ‘miniaturisation’ process – of a long slog in which humans, by degrees, might have adopted a more compact form of communicative behaviour. At some point in this slog, the prominence of gesturing and of vocalising would have flipped, with speech becoming the foreground channel, and gesture remaining as a background channel. Leaving us, in other words, right where we are today.
Arriving where we are today is, in fact, a major strength of gestural theories. Ever since Hewes, gesture-first theorists have faced a stark burden of proof, of explaining the why and the how of modality transition. But speech-first theorists have their own burden, albeit rarely acknowledged. Any speech-first theory needs to account, not only for how speech could have evolved, but also for why and how speech came to be universally, ubiquitously and automatically intertwined with gesture. Call it the problem of ‘modality addition’. It might be tempting to chalk up our use of gesture to its communicative power, but researchers have found only modest support for this idea. Quaint as it seems, the 19th-century notion that speakers’ gestures are vestigial, like tailbones or goosebumps, might not be so far off.
None of this is to say that the gesture-first theory has prevailed – far from it. Rather, it is to say, first, that the allegedly ‘fatal flaw’ of gestural theories might prove, in the end, to be merely a flesh wound and, second, that speech-first theories have their own problems. Ultimately, questions about modality are just one layer of the larger puzzle of language origins. Even if we were able to establish some version of a gesture-first proposal as not merely plausible but likely, there would be many more layers to contend with. We would also want to understand how we came by our abilities to read other minds, to sequence and combine ideas, to conceptualise abstractions such as ‘tomorrow’ and ‘truth’. We would need to explain, not merely whether we first conveyed meaning by hand or by mouth, but why we felt an itch to mean anything at all. It is this multilayered complexity that makes the evolution of language the ‘hardest problem in science’ – and also one of the most tantalising.