Unknown Soviet film courtesy


Strange continuity

Throughout evolutionary history, we never saw anything like a montage. So why do we hardly notice the cuts in movies?

by Jeffrey M Zacks

Suppose you were sitting at home, relaxing on a sofa with your dog, when suddenly your visual image of the dog gave way to that of a steaming bowl of noodles. You might find that odd, no? Now suppose that not just the dog changed, but the sofa too. Suppose everything in your visual field changed instantaneously in front of your eyes.

Imagine further that you were in a crowd and exactly the same thing was happening to everyone around you, at exactly the same time. Wouldn’t that be disturbing? Kafkaesque? In 1895 in Paris, exactly this started happening – first to a few dozen people, then to hundreds and then thousands. Like many fin-de-siècle trends, it jumped quickly from Europe to the United States. By 1903, it was happening to millions of people all over the world. What was going on? An epidemic of an obscure neurological disorder? Poisoning? Witchcraft?

Not quite, though it was definitely something unnatural. Movies are, for the most part, made up of short runs of continuous action, called shots, spliced together with cuts. With a cut, a filmmaker can instantaneously replace most of what is available in your visual field with completely different stuff. This is something that never happened in the 3.5 billion years or so that it took our visual systems to develop. You might think, then, that cutting might cause something of a disturbance when it first appeared. And yet nothing in contemporary reports suggests that it did.

Articles from the time describe the vivid impressions of motion and depth that film produced – you might have heard the story about viewers sitting down to watch the Lumière brothers’ The Arrival of a Train at La Ciotat Station (1895) and running terrified from the theatre. (Incidentally, that story is probably apocryphal, according to a 2004 report by Martin Loiperdinger of Trier University, translated by Bernd Elzer.) Other avant-garde aesthetic techniques of the time excited a furious response: think of the riot in 1913 at the premiere of Igor Stravinsky’s Rite of Spring, or – closer to the phenomenon we’re interested in – the challenge that stream-of-consciousness fiction is still felt to pose to readers.

Yet the first cinemagoers seem to have taken little note of cuts. Something that, on the face of it, ought to seem discontinuous with ordinary experience in the most literal sense possible slipped into the popular imagination quite seamlessly. How could that be?

In the earliest films, cuts were made by stopping the camera, setting up a new scene, and then rolling again. Filmmakers quickly realised that they could select shots and edit them together as they chose by physically cutting the film and splicing it with tape. They used this technique to produce fantastical special effects, to construct stories with scene changes that could not fit into a single location, and to exploit changes in camera angle such as the close-up and the point-of-view shot.

Cuts quickly became ubiquitous. In fact, the pace at which they were thrown at viewers increased dramatically. The psychologists James Cutting and Ayse Candan at Cornell University have analysed the trends, which are fascinating. The earliest edited films tended to have a mixture of longer and shorter shots, with a mean length of around 10 seconds. By 1927, that had been cut by more than a half. When sound was introduced, shots became much longer again: mean shot-lengths jumped up to about 16 seconds and then immediately started drifting back down again. Today there is a wide variety of editing styles within popular filmmaking, but overall mean shot-lengths are now even shorter than at the end of the silent era, and in action films it is not unusual to have long sequences in which there is a cut every second or so.

Cutting and Candan suggest that shots are getting shorter by a process akin to biological evolution: filmmakers vary in the shot lengths that they prefer, which leads to a distribution of styles that include shots of various lengths. Ticket-buyers act as a selection mechanism: if they tend to reward movies with a shorter distribution of shots, then movies with shorter shots will tend to be emulated. (It’s worth comparing the development of film editing with the development of perspective techniques in painting. Film editing seems to have evolved much more quickly. My hunch is that this happened because the invention of movies coincided with mass duplication and dissemination. With huge audiences, selection pressures can operate much more quickly.)

Clearly, viewers were quickly able to come to terms with film editing. But maybe the history and the contemporary reports missed the action because it happened too fast. Perhaps people were wigged out on the very first viewing but quickly learned how to ‘read’ film editing. What if we could go back in time and collect the reactions of naïve viewers on their very first experience with film editing?

It turns out that we can, sort of. There are a decent number of people on the planet who still don’t have TVs, and the psychologists Sermin Ildirar at Birkbeck, University of London, and Stephan Schwan at the Knowledge Media Research Centre in Tübingen, Germany, have capitalised on their existence to ask how first-time viewers experience cuts. The researchers lugged a laptop up a mountain to remote villages in Turkey with no TVs to test people’s responses to film viewing. First, Schwan and Ildirar painstakingly made short films that depicted local actors in everyday situations, so that the content of the movies wouldn’t be confusing or distracting. Then they presented these movies to naïve viewers and asked them what they saw.

No heads exploded. There was no evidence that the viewers found cuts in the films to be shocking or incomprehensible. For some kinds of sequences, they did have a harder time understanding how the different shots were intended to fit together. For example, parallel action, in which two simultaneous action streams are shown by cutting back and forth between them, gave them trouble. But cuts themselves were fine.

What is going on here? Consider that our visual systems evolved over hundreds of millions of years, while film editing has been around only for a little more than 100 years. Despite this, new audiences appear to be able to assimilate splices on more or less the first try. I think the explanation is that, although we don’t think of our visual experience as being chopped up like a Paul Greengrass fight sequence, actually it is.

Simply put, visual perception is much jerkier than we realise. First, we blink. Blinks happen every couple of seconds, and when they do we are blind for a couple of tenths of a second. Second, we move our eyes. Want to have a little fun? Take a close-up selfie video of your eyeball while you watch a minute’s worth of a movie on your computer or TV. You’ll see your eyeball jerking around two or three times every second. It turns out that most of the eye movements we make are these jerky, ballistic movements called saccades. They take a little less than a tenth of a second and, while the eye is moving, the information that it is sending to your brain is pretty much garbage. Your brain has a nifty control mechanism that turns down the gain during these saccades so that you ignore the bad information. Between blinks and saccades, we are functionally blind about a third of our waking life.

Worse yet, even when your eyes are open, they are recording a lot less of the world than you realise. The reason your eyes make saccades a couple of times a second is that you have high-resolution sensors only in the very middle of your visual field, in a region called the fovea. If you hold out your two thumbs together at arm’s length and look at them, the width of your two thumbs just about covers up the region represented by your fovea. As you stare at your thumbs, they should appear sharp and detailed. Now, if you try to concentrate on an object out in the periphery, you will realise that your image of that object is pretty fuzzy.

So, the signal that our brains are getting about the visual world is not like a smooth camera-pan around the environment. It’s more like a jittery music video: a sequence of brief shots of little patches of the world, stitched together. We feel like we have a detailed, continuous permanent representation of the visual details of our world, but what our visual system really delivers is a sequence of patchy pictures. Our brains do a lot of work to fill in the gaps, which can produce some pretty striking – and entertaining – errors of perception and memory.

One is change blindness: the failure to notice when something in a scene changes from one view to the next. In Hollywood, script editors work furiously to keep these sorts of inconsistencies out of movies, but they are by no means perfect; a feature film can contain hundreds of glitches, as attested by websites that catalogue them. But what happens when you leave them in on purpose? In 1997, the cognitive psychologists Daniel Levin and Daniel Simons, then graduate students at Cornell, constructed a short film of a lunch conversation, in which at each cut all sorts of objects were altered: plates changed colour, a scarf came and went, foods and drinks were substituted. Viewers were oblivious.

The fact that these errors are so hard to detect gives us a hint about how the brain handles cuts: perhaps the reason they are processed so easily by first-time viewers is that the brain is always knitting together successive views of a scene to construct a coherent representation of the world. The technical term for that representation is an event model, and a good event model captures the information about the scene that is important for guiding your behaviour and making predictions about what might happen next.

Take this famous scene from The Wizard of Oz (1939). When we see Dorothy arrive in Oz, we probably need a representation of the house, the witch’s legs sticking out, and Dorothy agog. Do we need to track whether those are flowers or lollipops in the background? Perhaps not. Do we need to accurately record how many Munchkins appear? Not likely. Our models are optimised to represent the information that is important for our comprehension of the activity. If the current shot has stuff that is inconsistent with what was in the last shot, we tend to go with what we currently see.

That makes good evolutionary sense, doesn’t it? If your memory conflicts with what is in front of your eyeballs, the chances are it is your memory that is at fault. So, most of the time your brain is stitching together a succession of views into a coherent event model, and it can handle cuts the same way it handles disruptions such as blinks and saccades in the real world.

There is, however, one situation in which stitching a new view in with the previous one is a bad idea: when the new view represents a transition from one event to another. Suppose you are sitting in a café with a friend. As the two of you chat, you need to integrate successive views of where they are and what expressions they are making, along with successive chunks of speech. But then you pay and leave. At this point, what you really need to do is to let the old event model go and form a new one that tracks the people, cars and bikes out on the street. This is perhaps even truer when watching a movie, because the film could cut from a café conversation to a completely different part of the world at a different time. So at these significant event boundaries, your visual system should do less bridging.

A few years ago, this line of thinking led my colleague Joseph Magliano, a psychologist at Northern Illinois University, and me to make a hypothesis about how the brain processes film editing. We bet that parts of the visual system would function as ‘neural spackle’, smoothing over the cracks between successive shots at cuts. These areas should be especially active at cuts in movies because they would be responding to the new view by madly integrating information into the current event model. But suppose that the brain also has a control process that shuts down this integration when a new event begins? In movies, these are basically boundaries between scenes. If that’s right, then cuts that correspond to scene breaks should not show increases in these areas.

Where might such areas be located? When visual information first enters your brain via the optic nerve, it lands in a fold in the back of your brain called the primary visual cortex. This brain region generates a map of your visual world that is relatively high-resolution but not very sophisticated – it is the closest thing you have to a picture in your head. From there, the information feeds forward to regions that form successively more abstract and specialised representations. Some areas are specialised for colour information, others for shape, others still for motion. At the highest levels is a set of areas that specialises in identifying objects and other things, and another set of areas that specialises in telling your muscles how to move to interact with the things you are seeing – how to grasp objects, and walk so as to avoid obstacles. Our hunch was that the areas involved in integrating information across cuts would be located in the middle of this pathway – not the very earliest representations, but not the ones that had already parsed the world into objects and actions.

To test this idea, we looked at data from an experiment we had conducted in my laboratory at Washington University using functional MRI. This kind of MRI allows us to measure changes in brain activity on the spatial scale of a few millimetres and on the temporal scale of a few seconds. In this experiment, viewers lay in the MRI scanner and watched The Red Balloon (1956) by Albert Lamorisse. We divided up all the cuts in the movie (there were 211) into those that corresponded to major scene breaks and those that did not. We looked at brain activity around the time of each cut and compared it to brain activity from the periods in between the cuts.

If we just looked at which areas responded most vigorously to cuts, the big winner was primary visual cortex. This makes sense; at each cut, the picture on the screen – and thus the picture in your head – is changing dramatically. However, when we focused on areas that were active at cuts except those that came at scene changes, we found support for our hypothesis: there was a set of intermediate visual areas that was highly active for cuts within a scene, but less active when the cut initiated a new scene. Our conclusion was that these areas are involved in bridging the visual discontinuities at cuts – and probably just as important for bridging the discontinuities we experience when we blink or move our eyes.

So now I think we have a story about why our heads don’t explode when we watch movies. It’s not that we have learned how to deal with cuts. It’s certainly not that our brains have evolved biologically to deal with film – the timescale is way too short. Instead, film cuts work because they exploit the ways in which our visual systems evolved to work in the real world.

Which is not to say that filmmaking techniques can’t mess up our heads. From time to time in the years since the invention of film editing, filmmakers have gone out of their way to provoke viewers by disrupting the relationships that are present in natural vision. One example is extreme flicker. In the film Man with a Movie Camera (1929), the director Dziga Vertov and his assistant editor Yelizaveta Svilova employed bursts of single-frame shots that are assaultive in part because of their rapid alternation of bright and dark, like a strobe light. This degree of flicker is very unusual in nature and our brains are not well equipped for it – extreme flicker can occasionally induce seizures.

Another example is extreme camera motion. Outside the theatre, there is a tight relationship between motion signals from vision and from the accelerometers in your inner ear. When you turn your head, everything in your visual field moves and you feel a corresponding acceleration in your body. A camera that twirls a lot – think Gravity (2013) – or shakes – think The Blair Witch Project (1999) or Cloverfield (2008) – breaks that relationship, and can induce nausea.

These examples reinforce the story about why most film editing feels natural and unobtrusive. When the editing supports the way our visual systems evolved to function, it tends to recede into the background of our ongoing visual functioning. This suggests that filmmakers must be pretty good intuitive perceptual psychologists. I think that’s true. When a good commercial film manages to make its cuts invisible, it distils 100 years of lore into its editing practice. And the techniques work, don’t they? We rarely notice transitions except when the director is doing something on purpose to mess with us. Which means that filmmakers’ intuitions are a gold mine for people like me who want to know how perception and memory work.