From Le Petit Journal, 18 February 1912. Photo by Getty


The trolley problem problem

Are thought experiments experiments at all? Or something else? And do they help us think clearly about ethics or not?

James Wilson

James Wilson is professor of philosophy at University College London. He is director of the MA in philosophy, politics and economics of health, co-director of the UCL Health Humanities Centre and co-director of the MA in health humanities. He lives in London.

3,400 words

Edited by Nigel Warburton

Much recent work in analytic philosophy pins its hopes on learning from imaginary cases. Starting from seminal contributions by philosophers such as Robert Nozick and Derek Parfit, this work champions the use of thought experiments – short hypothetical scenarios designed to probe or persuade on a point of ethical principle. Such scenarios are nearly always presented context-free, and are often wildly different from the everyday contexts in which ethical sensibilities are formed and exercised. Most famous (or infamous) among these are ‘trolley problems’ – thought experiments about the permissibility of causing the death of a smaller number of people to save a larger number from a runaway trolley (or train). But there are thousands more, with some papers containing as many as 10 separate cases.

While thought experiments are as old as philosophy itself, the weight placed on them in recent philosophy is distinctive. Even when scenarios are highly unrealistic, judgments about them are thought to have wide-ranging implications for what should be done in the real world. The assumption is that, if you can show that a point of ethical principle holds in one artfully designed case, however bizarre, then this tells us something significant. Many non-philosophers baulk at this suggestion. Consider ‘The Violinist’, a much-discussed case from Judith Jarvis Thomson’s 1971 defence of abortion:

You wake up in the morning and find yourself back-to-back in bed with an unconscious violinist. A famous unconscious violinist. He has been found to have a fatal kidney ailment, and the Society of Music Lovers has canvassed all the available medical records and found that you alone have the right blood type to help. They have therefore kidnapped you, and last night the violinist’s circulatory system was plugged into yours, so that your kidneys can be used to extract poisons from his blood as well as your own. The director of the hospital now tells you: ‘Look, we’re sorry the Society of Music Lovers did this to you – we would never have permitted it if we had known. But still, they did it, and the violinist now is plugged into you. To unplug you would be to kill him. But never mind, it’s only for nine months. By then he will have recovered from his ailment, and can safely be unplugged from you.’

Readers are supposed to judge that the violinist, despite having as much right to life as anyone else, doesn’t thereby have the right to use the body and organs of someone who hasn’t consented to this – even if this is the only way for him to remain alive. This is supposed to imply that, even if it is admitted that the foetus has a right to life, it doesn’t yet follow that it has a right to the means to survive where that involves the use of an unconsenting other’s body.

From the perspective of philosophers, the point here is clear, even if Thomson’s conclusion is controversial. In the few instances I tried to use this thought experiment in teaching ethics to clinicians, they mostly found it a bad and confusing example. Their problem is that they know too much. For them, the example is physiologically and institutionally implausible, and problematically vague in relevant details of what happened and how. (Why does the Society of Music Lovers have access to confidential medical records? Is the operation supposed to have taken place in hospital, or do they have their own private operating facility?) Moreover, clinicians find this thought experiment bizarre in its complete lack of attention to other plausible real-world alternatives, such as dialysis or transplant. As a result, excellent clinicians might fail to even see the analogy with pregnancy, let alone find it helpful in their ethical reasoning about abortion.

Faced with people who don’t ‘get’ a thought experiment, the temptation for philosophers is to say that these people aren’t sufficiently good at isolating what is ethically relevant. Obviously, such a response risks being self-serving, and tends to gloss over an important question: how should we determine what are the ethically relevant features of a situation? Why, for example, should a philosopher sitting in an armchair be in a better position to determine the ethically relevant features of ‘The Violinist’ than someone who’s worked with thousands of patients?

Although philosophers don’t often talk about this, it would appear that they assume that the interpretation of thought experiments should be subject to a convention of authoritative authorial ethical framing. In other words, the experiments are about what the author intends them to be and nothing else, much like Lewis Carroll’s Humpty Dumpty, who used words to mean whatever he wanted them to mean. To further spell out the implied convention, the author of the thought experiment has, by definition, specified all the ethically relevant elements of the case.

Thought-experiment designers often attempt to finesse the problem through an omniscient authorial voice that, at a glance, takes in and relates events in their essentials. The voice is able to say clearly and concisely what each of the thought experiment’s actors is able to do, their psychological states and intentions. The authorial voice will often stipulate that choices must be made from a short predefined menu, with no ability to alter the terms of the problem. For example, the reader might be presented with only two choices, as in the classic trolley problem: pull a lever, or don’t pull it.

All this makes reasoning about thought experiments strikingly unlike good ethical reasoning about real-life cases. In real life, the skill and creativity in ethical thinking about complex cases are in finding the right way of framing the problem. Imaginative ethical thinkers look beyond the small menu of obvious options to uncover novel approaches that better allow competing values to be reconciled. The more contextual knowledge and experience a thinker has, the more they have to draw on in coming to a wise decision.

Ethical thought experiments work best when those who read them are willing to go along with the arbitrary stipulations of the author. The greater one’s contextual expertise, the more likely one is to suffer the problem of ‘too much knowledge’ when faced with thought experiments stipulating facts and circumstances that make little sense given one’s domain-specific experience. So, while philosophers tend to assume that they make ethical choices clearer and more rigorous by moving them on to abstract and context-free territory, such gains are likely to be experienced as losses in clarity by those with relevant situational expertise.

It’s easy for such differences of perspective to turn into standoffs. Impasse looms where each side employs different standards of good reasoning, and criticises the other for failing to meet standards it wasn’t trying to meet. To make progress, it pays to understand why those with whom you disagree think that their views are cogent. What would the world need to be like for thought experiments to be a good way of making progress in ethics? I’ll canvass two suggestions: first that the thought experiment is a kind of scientific experiment, and second that it is an appeal to imagination. As we will see, on either reading, thought experiments are highly fallible, and we should be circumspect about taking them to provide insights into real-world ethical problems.

Some philosophers think that ethical thought experiments either are, or have a strong affinity with, scientific experiments. On such a view, thought experiments, like other experiments, when well-designed can allow knowledge to be built via rigorous and unbiased testing of hypotheses. Just as in the randomised controlled trials in which new pharmaceuticals are tested, the circumstances and the types of control in thought experiments could be such as to make the situation very unlike everyday situations, but that is a virtue rather than a vice, insofar as it allows ethical hypotheses to be tested cleanly and rigorously.

If thought experiments are – literally – experiments, this helps to explain how they might provide insights into the way the world is. But it would also mean that thought experiments would inherit the two methodological challenges that attend to experiments more generally, known as internal and external validity. Internal validity relates to the extent to which an experiment succeeds in providing an unbiased test of the variable or hypothesis in question. External validity relates to the extent to which the results in the controlled environment translate to other contexts, and in particular to our own. External validity is a major challenge, as the very features that make an environment controlled and suitable to obtain internal validity often make it problematically different from the uncontrolled environments in which interventions need to be applied.

There are significant challenges with both the internal and the external validity of thought experiments. It is useful to compare the kind of care with which medical researchers or psychologists design experiments – including validation of questionnaires, double-blinding of trials, placebo control, power calculations to determine the cohort size required and so on – with the typically rather more casual approach taken by philosophers. Until recently, there has been little systematic attempt within normative ethics to test variations of different phrasing of thought experiments, or to think about framing effects, or sample sizes; or the extent to which the results from the thought experiment are supposed to be universal or could be affected by variables such as gender, class or culture. A central ambiguity has been whether the implied readers of ethical thought experiments should be just anyone, or other philosophers; and, as a corollary, whether judgments elicited are supposed to be expert judgments, or the judgments of ordinary human beings. As the vast majority of ethical thought experiments in fact remain confined to academic journals, and are tested only informally on other philosophers, de facto they are tested only on those with expertise in the construction of ethical theories, rather than more generally representative samples or those with expertise in the contexts that the thought experiments purport to describe.

The problems of external validity are even greater. The crucial question is: even assuming that a thought experiment has internal validity, what follows from the validity of judgments in the world of the thought experiment for other cases? If you agree that it would be permissible to pull the lever in the original trolley problem, causing five people to be saved and one to die, there are a variety of inferences that could follow. At the most confined, we could take it that the result has implications only for cases involving runaway trains with particular switching arrangements. At the other end of the spectrum, we could take the result to have far-reaching implications about the permissibility of causing harm to some in the course of preventing harm to greater numbers of others. Judges within the common law tradition face a structurally similar question when making a judgment. They need to supply reasoning to support their decision, parts of which can be filleted out as the ratio decidendi (reason for the decision) by future judges. The ratio gives the judge’s best approximation to the breadth of the precedent the case sets.

The broader the precedents that thought experiments can set, the more powerful they will be for ethical thinking. In turn, the breadth of the precedents that a thought experiment sets depends on the degree to which the controls in place in the thought experiment, which allow the particular hypothesis to be tested cleanly, imply or are compatible with the wider cogency of the resultant ethical principle. This is not straightforward, and is itself a frequent topic for contestation.

Some philosophers think that well-controlled thought experiments allow wide-ranging implications to be drawn. In 1975, the philosopher James Rachels constructed a pair of parallel cases involving a relative intending to kill his young cousin to gain an inheritance, in order to show that there is no intrinsic difference between killing and letting die.

In Rachels’s first case, Smith kills his cousin by drowning him in the bath, and makes it look like an accident. In the second, Jones intends to drown his cousin and make it look like an accident; he sneaks into the bathroom to do precisely this, but by coincidence the boy slips, hits his head, falls face-down into the water and drowns while Jones looks on and does nothing to save him. Rachels argues that killing the cousin and letting him die are morally equivalent; thus, if in these two otherwise identical cases there is no ethical difference between killing and letting die, then there is no intrinsic difference between the two. This is supposed to carry over to the world of real ethical choices and those that potentially influence policy. But does it?

It is now widely argued that such inferences – from a simplified thought experiment to a real-life situation – are unsafe. Context will sometimes or often make a difference, and there is no algorithmic way of working out what this difference will be in advance. It’s not hard, for example, to think of a precisely matched pair of cases where killing and letting die are not morally equivalent. Had the context been one in which a hitman was preparing to take a hidden shot at a target, and the target then died of a sudden cardiac arrest as the hitman remained out of sight, it’s far from clear that killing and letting die would be equally bad.

The deeper question about external validity is whether thought experiments give insights into a single fixed picture that can gradually be reconstructed, or whether even well-designed thought experiments inform something more fragmentary, changeable and plural. Societies differ greatly in features such as wealth, inequality, population size, ethnic, linguistic and religious diversity, technological advancement, economic structure, ease of communication and travel, and the ability to collect taxes and maintain order without violence. Moreover, societies are continually shifting in terms of these structural variables, and sometimes rapidly, for example through processes of industrialisation or transition away from communism. The COVID-19 outbreak has vividly displayed the ways in which social norms and structures are more malleable than we assume.

It is implausible to think that the actual optimal policy prescriptions would be the same, regardless of the societal context. It is less clear whether, despite this multidimensional variety, it is better to hold fast to the conviction that there are global and unchanging ethical principles to be discovered, or if it would be better to start from the assumption that ethical principles arise from attempts to solve problems in living together, and should be assumed to be at least somewhat local and changeable as these conditions change. One reason to doubt that correct ethical principles are unchanging is that many seemingly vital ethical questions are decidedly recent, and would have been barely intelligible to those living 100 years ago – questions such as individual responsibility to prevent climate change, gender self-identification, the nature of authenticity under surveillance capitalism, and the governance of AI-based automated decision-making.

Many philosophers nonetheless wish to say that the correct ethical principles are unchanging. However, even if this were true, I suspect the principles wouldn’t be specific enough to provide useful advice, and the real work of ethical thinking would be in interpreting or specifying these principles. Compare a case where you go to someone for advice, and it transpires that you got exactly the same advice as everyone else, regardless of the specifics of your position.

An alternative view of thought experiments would downplay their relationship to scientific experiments, and acknowledge that they are, as Daniel Dennett put it, ‘intuition pumps’: tools for persuasion via imaginative consideration of possibilities. Thinking of thought experiments as persuasive fictions wouldn’t obviate the problem of external validity, but might allow us to reframe it.

Aristotle provides one way of thinking through how fiction can provide ethical insights, arguing that tragic drama is more ‘philosophical and more serious than history’, as it speaks of universals, while history speaks only of particulars. History will tell us what actually happened, but this is often unsatisfying and random. Lives as we live them, and events as they unfold, often don’t make sense – but it is precisely this kind of sense-making and feeling of necessity that makes stories resonate universally; and this comes from rational construction. Dramatists and novelists tend to condense and leave out elements that are irrelevant to the kind of stories they want to tell. As the author Iris Murdoch argued in 1970, when fiction works well:

We are presented with a truthful image of the human condition in a form which can be steadily contemplated; and indeed this is the only context in which many of us are capable of contemplating it at all.

The idea that fictions can provide ethical insights seems correct; but it doesn’t follow that they do so reliably or in a way that allows ethical insights to be easily transported from one context to another. One important question is what the relationship is between a well-told story and one that is true, or ethically insightful. The screenwriter William Goldman in Adventures in the Screen Trade (1983) discusses how one might approach writing a movie in which the main character had to get in the same room as the most famous woman in the world. Probably you’d write it as a classic heist film, with the first half devoted to the mastermind devising the plan and assembling the team – no doubt involving a confidence trickster, an electronics expert to defeat security systems and a getaway driver. The second half would see the plan unfold and things go wrong, and then any necessary adjustments.

Goldman then compares this notion with how in fact Michael Fagan entered the Queen’s bedroom in 1982. The man hopped over the palace railings and, via a series of accidents and attendants failing to notice alarms, walked through the royal stamp collection, shinned up a drainpipe, and took off his sandals and socks to climb through an open window. Once inside the palace, Fagan wandered around unchallenged for 15 minutes in bare feet, before finding himself in the Queen’s bedroom. To this day, it’s unclear why he wanted to do this. As Goldman put it: ‘true as it may be, if you handed it in as a screenplay, you would find yourself thrown out without ceremony as a very uninventive writer of fantasy’.

Whether in police work, emergency medicine or war, how things are presented in fiction is often simplified and distorted, to a point where it might be just too annoying to watch if the drama focuses on your area of expertise. For example, resuscitation (CPR) is much more likely to succeed in TV dramas than in real life. As the public health scholar Jaclyn Portanova and her colleagues found in 2015, nearly 70 per cent of CPR attempts in TV dramas succeeded, with 50 per cent of patients surviving to be discharged. In reality, the rate of successful discharge after CPR in US hospitals is 25 per cent. So using fiction as a means for ethical reflection – whether in thought experiments or in novels – will tend to raise the same questions of experience, abstraction and ‘too much knowledge’ that we considered earlier in discussing Thomson’s violinist.

In some ways, this criticism is as old as philosophical reflection on art. In his Republic, Plato complained that poets knew nothing about the things they wrote about, whether war or shoemaking, but presented images that others equally ignorant would find convincing. The criticism could apply not just to TV dramas, but to thought experiments too.

Overall, ethical thought experiments are, at best, fallible ways of constructing simplified models that map rather imperfectly onto the world as we experience it, and can distort as much as they illuminate. So should we give up on them as sources of ethical insight?

Responsible thinking requires calibrating our levels of credence to the reliability of our intellectual tools. Clearly, ethical thought experiments are not particularly reliable tools. But that’s not to say that we have other, more reliable tools. Pre-theoretical ethical ‘common sense’ is subject to distortions brought by prejudice, power and many other factors, and the reason why we turn to philosophical ethics in the first place is that it’s unclear how to resolve competing ethical duties that arise at a pretheoretical level. Ethical thinking is hard, and even our best tools for doing it are not very good. Humility should be the watchword.
