A mathematical BS detector can boost the wisdom of crowds

A few weeks ago, I thought I’d do a demonstration of the wisdom of crowds. I was at a bat mitzvah reception and, as a game, the hosts asked each table to guess the number of Skittles in a big Tupperware bowl. I got everyone at our table to write down a guess and then averaged the results. Based on what social scientists have been saying, our collective answer should have been spot-on. Each of us had a vague hunch about how to pack small objects into big boxes, subject to much uncertainty. Taken together, though, our scraps of erudition should have accumulated while the individual errors cancelled out. But my experiment was an abject failure. Our estimate was off by a factor of two. Another table won the cool blinking necklace.

Wisdom of crowds is an old concept. It goes back to Ancient Greek and, later, Enlightenment thinkers who argued that democracy is not just a nice idea, but a mathematically proven way to make good decisions. Even a citizenry of knaves collectively outperforms the shrewdest monarch, according to this proposition. What the knaves lack in personal knowledge, they make up for in diversity. In the 1990s, crowd wisdom became a pop-culture obsession, providing a rationale for wikis, crowdsourcing, prediction markets and popularity-based search algorithms.

That endorsement came with a big caveat, however: even proponents admitted that crowds are as apt to be witless as wise. The good democrats of Athens marched into a ruinous war with Sparta. French Revolutionary mobs killed the Enlightenment. In the years leading up to 2008, the herd of Wall Street forgot the most basic principles of risk management. Then there was my little Skittles contest. It was precisely the type of problem that crowds are supposed to do well on: a quiet pooling of diverse and independent assessments, without any group discussion that a single person might dominate. Nevertheless, my crowd failed.

Dražen Prelec, a behavioural economist at the Massachusetts Institute of Technology (MIT), is working on a way to smarten up the hive mind. One reason that crowds mess up, he notes, is the hegemony of common knowledge. Even when people make independent judgments, they might be working off the same information. When you average everyone’s judgments, information that is known to all gets counted repeatedly, once for each person, which gives it more significance than it deserves and drowns out diverse sources of knowledge. In the end, the lowest common denominator dominates. It’s a common scourge in social settings: think of dinner conversations that consist of people repeating to one another the things they all read in The New York Times.

In many scientific disputes, too, the consensus viewpoint rests on a much slenderer base of knowledge than it might appear. For instance, in the 1920s and ’30s, physicists intensely debated how to interpret quantum mechanics, and for decades thereafter textbooks recorded the dispute as a lopsided battle between Albert Einstein, fighting a lonely rearguard action against the new theory, and everyone else. In fact, ‘everyone else’ was recycling the same arguments made by Niels Bohr and Werner Heisenberg, while Einstein was backed up by Erwin Schrödinger. What looked like one versus many was really two on two. Very little fresh knowledge entered the discussion until the 1960s. Even today, Bohr and Heisenberg’s view (the so-called Copenhagen interpretation) is considered the standard one, a privileged status it never deserved.

Prelec started from the premise that some people’s judgments deserve greater weight than others. By no longer averaging everyone’s judgments equally, you can avoid overcounting redundant or otherwise extraneous information. You already do this all the time, whenever you trust opinions that are expressed with confidence and spurn diffident-sounding ones. There’s something to be said for that kind of trust. In psychology experiments, people who are more accurate at a task – say, remembering a list of words – tend to express more confidence. Unfortunately, the converse isn’t true: confident people aren’t necessarily more accurate. As W B Yeats wrote: ‘The best lack all conviction, while the worst are full of passionate intensity.’ Also, people systematically overestimate the value of their knowledge. A rule of thumb is that 100 per cent confidence means you’re right 70 to 85 per cent of the time. What is needed is a better way to measure the value of a person’s knowledge before putting it into the wisdom-of-crowds mix.

The solution, Prelec suggests, is to weight answers not by confidence but by metaknowledge: knowledge about knowledge. Metaknowledge means you are aware of what you know or don’t know, and of where your level of knowledge stands in relation to other people’s. That’s a useful measure of your value to the crowd, because knowledge and metaknowledge usually go together. ‘Expertise implies not only knowledge of a subject matter but knowledge of how knowledge of that subject matter is produced,’ says Aaron Bentley, a graduate student at the City University of New York Graduate Center who studies social cognition.

Whereas you might have no independent way to verify people’s knowledge, you can confirm their metaknowledge. In a forthcoming paper, Prelec, his graduate student John McCoy and the neuroscientist Sebastian Seung of Princeton University spell out the procedure. When you take a survey, ask people for two numbers: their own best guess of the answer (the ‘response’) and also their assessment of how many people they think will agree with them (the ‘prediction’). The response represents their knowledge, the prediction their metaknowledge. After you have collected everyone’s responses, you can compare their metaknowledge predictions to the group’s averaged knowledge. That provides a concrete measure: people who provided the most accurate predictions – who displayed the most self-awareness and most accurate perception of others – are the ones to trust.
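To make that comparison concrete, here is a minimal sketch of how the scoring might work. It assumes a multiple-choice question, treats each prediction as the fraction of the group expected to give the same answer, and measures accuracy by absolute error. The function name, the error measure and the five-person survey are my own illustrative assumptions, not details taken from the forthcoming paper.

```python
from collections import Counter


def prediction_errors(responses, predictions):
    """Return each respondent's absolute error in predicting agreement.

    responses:   one answer per respondent (any hashable value)
    predictions: each respondent's guess at the fraction of the group
                 that will give the same answer as they did (0.0 to 1.0)
    """
    if len(responses) != len(predictions):
        raise ValueError("need one prediction per response")
    total = len(responses)
    counts = Counter(responses)
    errors = []
    for answer, predicted_share in zip(responses, predictions):
        actual_share = counts[answer] / total  # share that really agreed
        errors.append(abs(predicted_share - actual_share))
    return errors


# Hypothetical five-person survey: three say "yes", two say "no".
responses = ["yes", "yes", "yes", "no", "no"]
predictions = [0.9, 0.6, 0.8, 0.4, 0.2]
errors = prediction_errors(responses, predictions)

# Indices of respondents, ordered from most to least accurate prediction.
ranking = sorted(range(len(errors)), key=errors.__getitem__)
```

In this toy survey, the second and fourth respondents predict the split exactly, so they head the ranking; the first, who expected 90 per cent agreement when only 60 per cent agreed, comes last.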

Metaknowledge functions as a powerful bullshit detector. It can separate crowd members who actually know something from those who are guessing wildly or just parroting what everyone else says. ‘The crowd community has been insufficiently ambitious in what it tries to extract from the crowd,’ Prelec says. ‘The crowd is wise, but not in the way the error-correcting intuition assumed. There’s more information there.’ The bullshit detector isn’t perfect, but it’s the best you can do whenever you don’t know the answer yourself and have to rely on other people’s opinion. Which eyewitness do you believe? Which talking head on TV? Which scientist commenting on some controversial topic? If they demonstrate superior metaknowledge, you can take that as a sign of their superior knowledge.

There are three distinct ways that metaknowledge can improve crowd wisdom. First, it provides a powerful consistency check on survey data. Sociologists have long relied on a version of this approach, asking people not just what they know but what they think other people know. In so doing, the researchers can gauge the prevalence of beliefs and activities that people won’t admit, even to themselves. It’s suspicious whenever people say an activity is common but claim they’d never – never! – do it themselves. For instance, if people deny liking Barry Manilow but say that their peers will express great enthusiasm for him, you can conclude that Manilow is more popular than people are letting on. Likewise, you should beware of politicians who rail against rampant corruption; they doth protest too much. Truly innocent people are inclined to think the best of others.

The tactic works because our metaknowledge is skewed. When we are asked to predict other people’s responses, we base our prediction largely on our own response, thereby divulging information that we might hide if we were asked for it directly. People’s tendency to assume that others think as they do is a basic psychological bias known as the false-consensus effect. As a test, this past spring the physicist Anthony Aguirre of the University of California, Santa Cruz, and I did an experiment on the Metaculus prediction market. We asked people how likely they thought it was that Bernie Sanders would become the Democratic presidential nominee; at the same time, we also solicited their prediction of what their fellow respondents would say. When we plotted people’s own responses against their predictions, the points fell roughly along a straight line. Those who put the probability at 10 per cent thought that the crowd would say about 10 per cent; those who said 20 per cent thought the crowd would say about 20 per cent; and so on. The two responses tracked each other, just as you’d expect from the false-consensus effect.

Like many cognitive biases, this one is not necessarily illogical. When evaluating what others are wont to do, it is reasonable to start with ourselves as a reference point, as long as it doesn’t become our only reference point. Regardless of the underlying psychological mechanism, asking people to predict how others will respond can do wonders for improving the accuracy of surveys. Because the false-consensus effect is a failure of our metaknowledge, it makes people with good metaknowledge stand out all the more.

The second way in which metaknowledge can be valuable is not as a lie detector but as truth serum. By probing metaknowledge, a pollster can create a strong incentive for survey respondents to answer questions candidly. In 2012, Prelec, together with the psychologist Leslie John at Harvard University and the economist George Loewenstein at Carnegie Mellon University, applied a metaknowledge-based truth algorithm to a survey of more than 2,000 academic psychologists about research malfeasance. It’s clearly a slippery topic to study: who would ever voluntarily ’fess up to fabricating data? Prelec and his colleagues sought to break down this reluctance. They offered to make a charitable contribution on respondents’ behalf as a reward for completing the survey. They told one-third of the respondents – the control group – that the contribution would be a fixed amount. They told the other two-thirds that the contribution would depend on how truthfully the respondents answered, as judged by answers to the metaknowledge questions.

Prelec and his colleagues couldn’t tell whether any given respondent had told the truth, but they structured the survey so that respondents would collectively maximise the charitable contribution by answering honestly. The effect was dramatic. Nearly three times as many respondents given this incentive – 1.7 per cent versus 0.6 per cent in the control group – admitted to having faked data. That’s just a handful of people, which might make you worry about the statistical significance of the difference, but a similar pattern showed up for several more widespread, if lesser, sins. There was also an evident false-consensus effect at work. Self-confessed fabricators thought 26 per cent of their fellows were equally errant, whereas those who professed innocence put the figure at 9.5 per cent.

Next, Prelec and his team went a step further and sought to get at the true prevalence of data falsification. They had no independent way of knowing that, but could take a guess based on metaknowledge. They asked both subgroups how many guilty people would own up to their misdeeds. Self-proclaimed innocents thought that 4 per cent would. Given their estimate of the prevalence (9.5 per cent), that meant they predicted a total of 0.38 per cent of the survey respondents would admit to fakery. In contrast, those who admitted guilt put the predicted confession rate at 8.9 per cent. Given their estimated prevalence of fakery (26 per cent), that implied an overall admission rate of 2.4 per cent – much closer to the figure of 1.7 per cent reported by the full incentivised group.
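The arithmetic behind that comparison is simple enough to check by hand, or with a few lines of code. The sketch below uses only the rounded percentages quoted above; the function and variable names are mine. With those rounded inputs, the confessors' implied rate comes out at about 2.3 per cent rather than the 2.4 per cent quoted, a small gap that I assume reflects rounding in the reported figures.

```python
def implied_admission_rate(prevalence_pct, confession_pct):
    """Percentage of all respondents expected to admit to fakery, given the
    estimated prevalence and the estimated share of the guilty who confess."""
    return prevalence_pct * confession_pct / 100


# Figures quoted in the text, all in per cent.
innocents = implied_admission_rate(9.5, 4.0)    # about 0.38
confessors = implied_admission_rate(26.0, 8.9)  # about 2.3 from the rounded inputs
observed = 1.7  # admission rate in the full incentivised group

# Distance of each subgroup's implied rate from the observed rate.
gap_innocents = abs(innocents - observed)
gap_confessors = abs(confessors - observed)
```

Either way, the confessors' implied rate lands roughly 0.6 percentage points from the observed 1.7 per cent, while the innocents' lands about 1.3 points away.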

So the devils had better metaknowledge. They were more attuned than the angels to what was going on in their community, and their 1.7 per cent number is probably the more accurate one. When the guilty parties tell us that scientific misconduct is widespread, it behooves us to listen.

That brings us to the third and most impressive application of metaknowledge: it can screen out people who don’t know what they are talking about, leaving those who have real information to contribute. As a simple, controlled test, Prelec and Seung gave a quiz to 51 MIT and 32 Princeton undergraduate students. For each of the 50 US states, the researchers listed the largest city in that state and asked the students whether it was the capital or not. They also asked the students what fraction of their classmates would agree with their choice. The average MIT respondent got 30 states right; the average Princetonian, 31. Since the correct answers are already known, this kind of experiment makes it easy to evaluate how well the metaknowledge worked.

As hypothesised, those students who got the right answer also did better at predicting what the other respondents would say. For instance, 60 per cent of the students thought Chicago was the capital of Illinois, and they thought that 89 per cent would concur. The remainder realised that Chicago is the wrong answer, but recognised that only 30 per cent would know it. The second group’s predicted split of 70-30 was closer to the actual split of 60-40. Evidently, most people who think Chicago is the capital of Illinois had a hard time imagining any other possibility, whereas those who know Springfield is the real capital realised they were a minority – perhaps because they once made the same mistake themselves, perhaps because they are locals who are accustomed to clueless out-of-staters. Either way, their superior metaknowledge proved that they had higher-quality knowledge, too. ‘They know more facts, and know that some others are missing these facts,’ Prelec said.

The technique also works when the majority is right. Consider South Carolina. Is Columbia its capital? Some 64 per cent of respondents said yes, and that subgroup predicted that 64 per cent would go along. The remainder said no, and they predicted that 36 per cent would agree with them. This time, both groups did equally well at predicting the vote split, so there was no reason to doubt majority opinion.
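The Illinois and South Carolina cases can be put side by side in a short sketch. This is a deliberately simplified illustration of the idea, not the researchers' published algorithm: it compares how far off each camp's predicted agreement was and overrules the majority only when the minority predicted the split clearly better. The function name and the `tolerance` margin for calling a tie are my assumptions.

```python
def adjudicate(majority_share, majority_predicted, minority_predicted,
               tolerance=0.02):
    """Decide which side of a yes/no question to trust.

    majority_share:     fraction of respondents in the majority
    majority_predicted: the majority's guess at the fraction agreeing with it
    minority_predicted: the minority's guess at the fraction agreeing with it
    tolerance:          assumed margin within which the errors count as tied
    """
    minority_share = 1.0 - majority_share
    majority_error = abs(majority_predicted - majority_share)
    minority_error = abs(minority_predicted - minority_share)
    # Overrule the majority only if the minority predicted clearly better.
    if minority_error + tolerance < majority_error:
        return "minority"
    return "majority"


# Illinois: 60% said Chicago and expected 89% agreement;
# the other 40% expected only 30% to agree with them.
illinois = adjudicate(0.60, 0.89, 0.30)

# South Carolina: 64% said Columbia and expected 64% agreement;
# the other 36% expected 36% to agree with them.
south_carolina = adjudicate(0.64, 0.64, 0.36)
```

With the figures from the quiz, the Illinois minority wins the argument and the South Carolina majority keeps it.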

When you average the quiz answers, discarding the respondents with low-quality metaknowledge, the MIT group got 41 states right, and the Princeton group 44 – a considerable improvement over the unweighted responses. The technique wasn’t perfect: it switched several right answers to wrong ones, but those reversals were rarer than the corrections. The metaknowledge adjustment also sifted experts from mere contrarians. Those who tended to vote no by default were more accurate overall, because only 17 state capitals are the largest cities, but their poorer metaknowledge betrayed them as reflexive naysayers rather than geography buffs.

Interestingly, the metaknowledge correction influenced the outcome only when the majority had less than about 70 per cent of the vote. A large majority was always right. This is a useful lesson for real-world disputes. A strong consensus is the closest proxy to truth that we have. The lone wolf who knows better than the misguided masses is much rarer than Hollywood movies would lead you to believe.

The usefulness of metaknowledge for improving the wisdom of crowds inspired the psychologist Jack Soll at Duke University in North Carolina and his graduate student Asa Palley to come up with a technique that is similar to Prelec’s but easier to apply. It differs by asking people for their prediction of the group’s average response as opposed to asking them to predict the percentage of people who agree with them, thereby streamlining the process, especially when answers are not multiple-choice but lie on a continuum.

To give a somewhat artificial example, suppose you ask your friends how much the US economy will grow this year. Everybody in the group reads The New York Times, which has been reporting a value of 4 per cent. Those who read only the Times will generally predict that everyone else will adopt this guess, too. But half of the group also reads The Economist and its grimmer forecast of zero growth. That subgroup splits the difference between its two sources of information, and estimates growth of 2 per cent. But, aware that fewer people read The Economist, the more-informed set predicts that the group as a whole will guess 3 per cent.

On average, then, the crowd as a whole estimates 3 per cent growth and predicts it will guess 3.5 per cent. The discrepancy between these two values tells you that the group is suffering from a shared-information bias that pulls up the growth estimate by at least half a percentage point. You can correct for it by ‘pivoting’ to the other side of the response and guessing a growth rate of 2.5 per cent, a number that more plausibly reflects the total input of information.
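The pivot in this example amounts to a couple of averages and a subtraction. The sketch below reproduces only the arithmetic of the Times-Economist illustration, with a hypothetical four-person group split evenly between the two kinds of reader; the function name is mine, and the published technique may differ in its details.

```python
from statistics import mean


def pivot_estimate(estimates, predictions):
    """Shift the crowd's mean estimate away from its mean prediction
    by the size of the gap between the two."""
    mean_estimate = mean(estimates)
    mean_prediction = mean(predictions)
    return mean_estimate - (mean_prediction - mean_estimate)


# Two Times-only readers, then two who also read The Economist.
estimates = [4.0, 4.0, 2.0, 2.0]    # each person's own growth forecast (%)
predictions = [4.0, 4.0, 3.0, 3.0]  # each person's guess at the group average (%)

corrected = pivot_estimate(estimates, predictions)  # 2.5
```

The mean estimate is 3 per cent and the mean prediction 3.5, so the pivot moves half a point in the opposite direction, to 2.5.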

Palley and Soll’s technique helps to make sense of the results from the Sanders experiment that Aguirre and I did. We found that respondents systematically overestimated the group’s average assessment of Sanders’s chances of getting the nomination. As in the above Times-Economist example, there was a mismatch between what the group believed to be true (on average) and what the group believed the rest of the people would say, a discrepancy that suggested a shared-information bias. Evidently, most people were basing their answers on widely disseminated information, whereas a diligent minority drew on more diverse sources that made them more dubious about Sanders’s prospects. We observed a similar pattern in a second poll we did on Brexit.

Encouraged by these results, Prelec, Soll and the other researchers hope to incorporate metaknowledge into political and economic forecasts of all sorts, drawing on crowd wisdom that in the past was often too biased to be meaningful. To take advantage of these principles yourself, you don’t have to do a formal survey; just pay attention to the metaknowledge exhibited by people around you. Experts are more likely to recognise that other people will disagree with them, and they should be able to represent other points of view even if they don’t agree. Novices betray themselves by being unable to fathom any position other than their own. Likewise, you can monitor your own metaknowledge by noting when you find that more people hold a certain belief than you expected. ‘If you find yourself surprised by how many people disagree with you, relative to your committed prior predictions of disagreement, that’s an indication that you are a novice in the domain,’ Prelec says. That doesn’t automatically make you wrong, but does suggest you should take another look at your beliefs.

Using metaknowledge as a bullshit detector also offers guidance on controversial issues such as climate change. In my experience, people who doubt climate change and its human causes tend to be extremely sure of themselves (even though one of their critiques is that the climate system is too complex to model accurately). Mainstream climate scientists, on the other hand, acknowledge that they might be wrong by placing error bars around their best guesses. For example, in a survey in the mid-1990s, 16 climate scientists offered estimates of the temperature rise for a given level of carbon dioxide. Fifteen of them gave fairly wide error bars that reflected the uncertainties of their science. One researcher stood apart, giving an estimate with hardly any error bar at all, indicating near-absolute certainty. That person was later identified as one of the most outspoken climate skeptics.

If you happen to be a climate skeptic yourself, you were probably irritated by the last paragraph. Now stop and ask yourself, what makes you so sure about your point of view? If you do think the climate is warming as a result of human activity, don’t get smug, either. You, too, should pause and reflect on the source of your certainty – are you really any more knowledgeable about the science than a skeptic is? Put another way, what is the state of your metaknowledge? We’d all be better off if the phrase ‘I’m confident that’ were banned from public discourse and we listened more to those who openly recognised the limits of their own knowledge.

Good metaknowledge is precious. It requires not only that you know a subject but also that you know yourself. And self-knowledge is the most difficult knowledge of all.

6 July 2016
