Winnie the Pooh by Banksy, displayed at Bonhams’ inaugural US auction of urban art on 24 October 2012 in Los Angeles, California. Photo by Tibrina Hobson/WireImage/Getty


Calculating art

Artistic success takes a mysterious mix of talent, luck and timing. But could algorithms now predict and produce the hits?

by Hannah Fry

Justin was in a reflective mood. On 4 February 2018, in the living room of his home in Memphis, Tennessee, he sat watching the Super Bowl, eating M&Ms. Earlier that week, he’d celebrated his 37th birthday, and now – as had become an annual tradition – he was brooding over what his life had become.

He knew he should be grateful, really. He had a perfectly comfortable life. A stable nine-to-five office job, a roof over his head and a family who loved him. But he’d always wanted something more. Growing up, he’d always believed he was destined for fame and fortune.

So how had he ended up being so… normal? ‘It was that boyband,’ he thought to himself. The one he’d joined at 14. ‘If we’d been a hit, everything would have been different.’ But, for whatever reason, the band was a flop. Success had never quite happened for poor old Justin Timberlake.

Despondent, he opened another beer and imagined what might have been. On the screen, the Super Bowl commercials came to an end. Music started up for the big half-time show. And in a parallel universe – virtually identical to this one in all but one detail – another 37-year-old Justin Timberlake from Memphis took the stage.

Why is the real Justin Timberlake so successful? And why did the other Justin Timberlake fail? Some people might argue that pop-star Justin’s success is deserved: his natural talent, his good looks, his dancing abilities and the artistic merit of his music made fame inevitable. But others might think that the stars are just the ones who got lucky.

There’s no way to know without building a series of identical parallel worlds, releasing Timberlake into each, and watching all the incarnations evolve. This was the idea behind an experiment conducted by the sociologists Matthew Salganik, Peter Dodds and Duncan Watts in 2006 that created a series of digital worlds. The researchers built their own online music player, like a very crude version of Spotify, and filtered off visitors into a series of eight parallel musical websites, each identically seeded with the same 48 songs by undiscovered artists.

In what became known as the Music Lab, a total of 14,341 music fans were invited to log on to the player, listen to clips of each track, rate the songs, and download the music they liked best. Just as on the real Spotify, visitors could see at a glance what music other people in their ‘world’ were listening to. Alongside the artist name and song title, participants saw a running total of how many times the track had already been downloaded within their world. All the counters started off at zero, and over time, as the numbers changed, the most popular songs in each of the eight parallel charts gradually became clear.

Meanwhile, to get some natural measure of the true popularity of the records, the team also built a control world, where visitors’ choices couldn’t be influenced by others. There, the songs would appear in a random order on the page – either in a grid or in a list – but the download statistics were shielded from view.

The results were intriguing. All the worlds agreed that some songs were clear duds. Other songs were stand-out winners: they ended up being popular in every world, even the one where visitors couldn’t see the number of downloads. But in between sure-fire hits and absolute bombs, the artists could experience pretty much any level of success.

Take 52metro, a punk band from Milwaukee, whose song Lockdown was wildly popular in one world, where it finished up at the very top of the chart, and yet completely bombed in another world, ranking 40th out of 48 tracks. Exactly the same song, up against exactly the same list of other songs; it was just that, in this particular world, 52metro never caught on. Success, sometimes, was a matter of luck. Although the path to the top wasn’t set in stone, the researchers found that visitors were much more likely to download tracks they knew were liked by others. If a middling song got to the top of the charts early on by chance, its popularity could snowball. More downloads led to more downloads. Perceived popularity became real popularity, so that eventual success was just randomness magnified over time.
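
To see how that kind of snowballing can arise, here is a minimal sketch in Python. It is not the Music Lab's own software or analysis: the function names, the 2,000 visitors per world and the rule that each earlier download adds a fixed bonus to a song's chance of being picked are all assumptions made for illustration. Every song is given identical appeal, so any gap between worlds comes from chance amplified by the visible counters.

```python
import random


def simulate_world(n_songs=48, n_visitors=2000, social_weight=1.0, seed=None):
    """Simulate one hypothetical 'world' in which each visitor downloads one song.

    Every song has the same baseline appeal (1.0), so differences in the final
    counts come only from chance plus social influence. This is an assumed toy
    model, not the model used by the researchers.
    """
    rng = random.Random(seed)
    downloads = [0] * n_songs
    for _ in range(n_visitors):
        # Choice weight = baseline appeal + a bonus for every earlier download.
        weights = [1.0 + social_weight * d for d in downloads]
        chosen = rng.choices(range(n_songs), weights=weights, k=1)[0]
        downloads[chosen] += 1
    return downloads


def rank_of(song, downloads):
    """Return the 1-based chart position of `song` (1 = most downloaded)."""
    order = sorted(range(len(downloads)), key=lambda i: -downloads[i])
    return order.index(song) + 1


if __name__ == "__main__":
    # Eight parallel worlds, identical apart from the random seed.
    for world in range(8):
        counts = simulate_world(seed=world)
        print(f"world {world}: song 0 finished at position {rank_of(0, counts)}")
```

Setting social_weight to 0 gives something closer to the control world, where the counters cannot influence anyone.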

There was a reason for these results. It’s a phenomenon known to psychologists as social proof. Whenever we haven’t got enough information to make decisions for ourselves, we have a habit of copying the behaviour of those around us. The more platforms we use to see what’s popular – bestseller lists, Amazon rankings, Rotten Tomatoes scores, Spotify charts – the bigger the impact that social proof will have. The effect is amplified further when there are millions of options being hurled at us, plus marketing, celebrity, media hype and critical acclaim all demanding our attention.

All this means that sometimes terrible music can make it to the top. That’s not just me being cynical. It seems that the music industry itself is fully aware of this fact. During the 1990s, supposedly the result of a wager to see who could get the worst song possible into the charts, an English girl group called Vanilla emerged. Thanks to a few magazine features and an appearance on the BBC TV show Top of the Pops, the song managed to get to number 14 in the charts.

Vanilla’s success was short-lived. By their second single, their popularity was already waning. They never released a third, suggesting that social proof isn’t the only factor at play – as indeed a follow-up experiment from the Music Lab team showed. The set-up to their second study was largely the same as the first. But this time, to test how far the perception of popularity became a self-fulfilling prophecy, the researchers added a twist. Once the charts had had the chance to stabilise in each world, they paused the experiment and flipped the billboard upside down. New visitors to the music player saw the chart-topper listed at the bottom, while the flops at the bottom took on the appearance of the crème de la crème at the top.

Almost immediately, the total number of downloads by visitors dropped. Once the songs at the top weren’t appealing, people lost interest in the music on the website overall. Meanwhile, the good tracks languishing at the bottom did worse than when they were at the top, but still better than those that had previously been at the end of the list. If the scientists had let the experiment run on long enough, the very best songs would have recovered their popularity. Conclusion: both luck and quality play a role.

Modern algorithms also offer us insight into the popularity of films. Studies have been especially trenchant because you can measure all sorts of factors through information collected by websites such as the Internet Movie Database (IMDb) or Rotten Tomatoes. Take the study conducted by the data scientist Sameet Sreenivasan in 2013, based on keywords and descriptors in IMDb. On their own, the keywords showed that our interest in certain plot elements comes in bursts; think of Second World War films, or movies that tackle the subject of abortion. There’ll be a spate of releases on a similar topic in quick succession, and then a lull for a while. When considered together, the tags allowed Sreenivasan to come up with a score for the novelty of each film at the time of its release – a number between zero and one.

As it turns out, we have a complicated relationship with novelty. On average, the higher the novelty score a film had, the better it did at the box office. But only up to a point. Push past that novelty threshold, and there’s a precipice; the revenue earned by a film fell off a cliff. Sreenivasan’s study showed what social scientists had long suspected: we’re put off by the banal, but also hate the radically unfamiliar.

The novelty score might be a useful way to help studios avoid backing absolute stinkers, but it’s not much help if you want to know the fate of an individual film. For that, the work of a European team of researchers conducted in 2006 might be more useful. The authors trained a neural network to classify movies into one of nine categories, ranging from total flop to box-office smash hit. The neural network outperformed statistical techniques tried before, but still managed to classify the performance of a movie correctly only 36.9 per cent of the time, on average. It was a little better in the top category – those earning more than $200 million – correctly identifying those real blockbusters 47.3 per cent of the time. But investors beware. Around 10 per cent of the films picked out by the algorithm as destined to be hits went on to earn less than $20 million – which by Hollywood’s standards is a pitiful amount.

In short, the evidence points in a single direction: until you have data on the early audience reaction, popularity is largely unpredictable.

Predicting popularity is tricky, but what about judging the aesthetic value of art? Some philosophers – such as Gottfried Leibniz – argue that if there are objects that we can all agree on as beautiful, say Michelangelo’s David or Mozart’s Lacrimosa, then there should be some definable, measurable, essence of beauty that makes one piece of art objectively better than another.

Philosophers such as David Hume, meanwhile, argue that beauty is in the eye of the beholder. Consider the work of Andy Warhol, which offers a powerful aesthetic experience to some, while others find it artistically indistinguishable from a tin of soup.

Others still, Immanuel Kant among them, have said the truth is something in between – that our judgments of beauty are not wholly subjective, nor can they be entirely objective. They are sensory, emotional and intellectual all at once – and, crucially, can change over time depending on the state of mind of the observer.

There is certainly some evidence to support this idea. Fans of the English artist Banksy might remember how in 2013 he set up a stall in Central Park in New York, anonymously selling original black-and-white spray-painted canvases for $60 each. The stall was tucked away in a row of others selling the usual touristy stuff, so the price tag must have seemed expensive to those passing by. It was several hours before someone decided to buy one. In total, the day’s takings were $420.14. But a year later, in an auction house in London, another buyer would deem the aesthetic value of the very same artwork great enough to spend £68,500 (at the time, around $115,000) on a single canvas.

In another, similar instance, The Washington Post in 2007 asked the internationally renowned violinist Joshua Bell to add an extra concert to his schedule of sold-out symphony halls. Armed with his $3.5 million Stradivarius violin, Bell pitched up at the top of an escalator in a metro station in Washington, DC during morning rush hour, put a hat on the ground to collect donations, and performed for 43 minutes. The result? Seven people stopped to listen for a while. More than 1,000 walked straight past. By the end of his performance, Bell had collected $32.17 in his hat. The point is this: even if there are some objective criteria that make one artwork better than another, as long as context plays a role in our aesthetic appreciation of art, it’s not possible to create a tangible measure for aesthetic quality that works for all places in all times.

But an algorithm needs something to go on. So, once you take away popularity and inherent quality, you’re left with the only thing that can be quantified: a metric for similarity to whatever has gone before. When it comes to building a recommendation engine, Netflix and Spotify have the algorithm down. What do users listen to, what do they watch, what do they return to time and time again? At no point is Spotify or Netflix trying to deliver the perfect song or film. They have little interest in perfection. Spotify doesn’t promise to hunt out the one band on Earth that is destined to align wholly and flawlessly with your taste and mood. The recommendation algorithms merely offer you songs and films that are good enough to insure you against disappointment.
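
A similarity metric of this kind can be sketched in a few lines of Python. This is an illustration only, not how Spotify or Netflix actually work: the listeners, track names and play counts are invented, and cosine similarity is just one common choice of metric that I have assumed here. The sketch finds the listener whose habits most resemble yours and suggests what they have played that you have not.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two play-count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical play counts: one number per track, in the order given by TRACKS.
TRACKS = ["track_a", "track_b", "track_c", "track_d", "track_e"]
PLAYS = {
    "you": [12, 0, 7, 0, 0],
    "ana": [10, 1, 9, 4, 0],
    "bjorn": [0, 8, 0, 0, 6],
}


def recommend(user, plays, tracks):
    """Suggest tracks the most similar listener has played and `user` has not."""
    others = [name for name in plays if name != user]
    nearest = max(others, key=lambda name: cosine_similarity(plays[user], plays[name]))
    return [
        track
        for track, mine, theirs in zip(tracks, plays[user], plays[nearest])
        if mine == 0 and theirs > 0
    ]


print(recommend("you", PLAYS, TRACKS))  # prints ['track_b', 'track_d']
```

Nothing in the sketch knows whether the suggested tracks are any good. It only knows that someone with similar habits played them.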

Similarity works perfectly well for recommendation engines. But when you ask algorithms to create art without a pure measure for quality, that’s where things start to get interesting. Can an algorithm be creative if its only sense of art is what happened in the past?

In October 1997, an audience arrived at the University of Oregon to hear the pianist Winifred Kerner play three short pieces. One was a lesser-known keyboard composition by the master of the Baroque, Johann Sebastian Bach. A second was composed in the style of Bach by Steve Larson, a professor of music at the university. And a third was composed by an algorithm, deliberately designed to imitate the style of Bach. After hearing the three performances, the audience was asked to guess which was which. To Larson’s dismay, the majority voted that his was the piece that had been composed by the computer. And to collective gasps of delighted horror, the audience learned that the music they’d voted as genuine Bach was nothing more than the work of a machine.

David Cope, the man who created the remarkable algorithm behind the computer composition, had seen an audience duped before. ‘I [first] played what I called the “game” with individuals,’ he told me. ‘And when they got it wrong they got angry … Because creativity is considered a human endeavour.’

This had certainly been the opinion of Douglas Hofstadter, the cognitive scientist and author who had organised the concert in the first place. Years earlier, in his Pulitzer prizewinning book Gödel, Escher, Bach (1979), Hofstadter had taken a firm stance on the matter: ‘Music is a language of emotions, and until programs have emotions as complex as ours, there is no way a program will write anything beautiful.’

But after hearing the output of Cope’s algorithm – the so-called ‘experiments in musical intelligence’ (EMI) – Hofstadter conceded that perhaps things weren’t quite so straightforward: ‘I find myself baffled and troubled by EMI,’ he confessed in the days following the University of Oregon experiment. ‘The only comfort I could take at this point comes from realising that EMI doesn’t generate style on its own. It depends on mimicking prior composers. But that is still not all that much comfort. To my absolute devastation [perhaps] music is much less than I ever thought it was.’

So which is it? Is aesthetic excellence the sole preserve of human endeavour? Or can an algorithm create art? And if an audience couldn’t distinguish EMI from music by a great master, had this machine demonstrated the capacity for true creativity?

Let’s try to tackle those questions in turn, starting with the last one. To form an educated opinion, it’s worth pausing briefly to understand how the algorithm works, something Cope was generous enough to explain to me.

The first step in building the algorithm was to translate Bach’s music into something that can be understood by a machine: ‘You have to place into a database five representations of a single note: the on time, the duration, pitch, loudness and instrument.’ For each note in Bach’s back catalogue, Cope had to painstakingly enter these five numbers into a computer by hand. There were 371 Bach chorales alone, many harmonies, tens of thousands of notes, five numbers per note. It required a monumental effort from Cope: ‘For months, all I was doing every day was typing in numbers. But I’m a person who is nothing but obsessive.’
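
As a rough picture of what one of those database entries might look like, here is a minimal Python sketch. The five fields come from Cope's description, but everything else is my assumption: the class name, the units (beats), the use of MIDI-style numbers for pitch and loudness, and the three example notes, which are invented rather than taken from Bach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """One note as five numbers. Field encodings are assumed, not Cope's."""

    on_time: float   # when the note starts, in beats from the start of the piece
    duration: float  # how long it sounds, in beats
    pitch: int       # MIDI note number, where 60 is middle C
    loudness: int    # MIDI-style velocity from 0 to 127
    instrument: int  # an arbitrary instrument or channel id

    def as_row(self):
        """Return the five numbers as a tuple, ready to store as a database row."""
        return (self.on_time, self.duration, self.pitch, self.loudness, self.instrument)


# Three invented notes, purely to show the shape of the data.
opening = [
    Note(0.0, 1.0, 60, 80, 1),
    Note(1.0, 1.0, 62, 80, 1),
    Note(2.0, 2.0, 64, 90, 1),
]
rows = [note.as_row() for note in opening]
print(rows[2])  # prints (2.0, 2.0, 64, 90, 1)
```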

From there, Cope’s analysis took each beat in Bach’s music and examined what happened next. For every note that is played in a Bach chorale, Cope made a record of the next note. He stored everything together in a kind of dictionary – a bank in which the algorithm could look up a single chord and find an exhaustive list of all the different places Bach’s quill had sent the music next.

In that sense, EMI has some similarities to the predictive text algorithms you’ll find on your smartphone. Based on the sentences you’ve written in the past, the phone keeps a dictionary of the words you’re likely to want to type next, and brings them up as suggestions while you’re writing.

The final step was to let the machine loose. Cope would seed the system with an initial chord and instruct the algorithm to look it up in the dictionary to decide what to play next by selecting the new chord at random from the list. Then the algorithm repeats the process – looking up each subsequent chord in the dictionary to choose the next notes to play. The result was an entirely original composition that sounds just like Bach himself.
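
The look-up-and-choose loop described above can be sketched as a first-order Markov chain. To be clear, this is a toy and not Cope's code: the function names are mine, the 'corpus' is three invented chord sequences rather than Bach chorales, and chords are reduced to simple text labels.

```python
import random
from collections import defaultdict


def build_dictionary(pieces):
    """Map each chord to the list of chords that followed it anywhere in the corpus."""
    following = defaultdict(list)
    for piece in pieces:
        for current, nxt in zip(piece, piece[1:]):
            following[current].append(nxt)
    return dict(following)


def compose(dictionary, seed_chord, length, rng=None):
    """Start from seed_chord and repeatedly pick a random recorded successor."""
    if rng is None:
        rng = random.Random()
    piece = [seed_chord]
    while len(piece) < length:
        options = dictionary.get(piece[-1])
        if not options:  # dead end: this chord was never followed by anything
            break
        piece.append(rng.choice(options))
    return piece


# An invented toy corpus standing in for the real back catalogue.
corpus = [
    ["C", "F", "G", "C"],
    ["C", "Am", "F", "G", "C"],
    ["Am", "Dm", "G", "C"],
]
dictionary = build_dictionary(corpus)
new_piece = compose(dictionary, "C", 8, random.Random(0))
print(new_piece)
```

Every step in the output is a transition that already exists somewhere in the corpus. A chord that was followed by the same successor several times appears several times in its list, so common moves are chosen more often.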

Or maybe it is Bach himself. That’s Cope’s view, anyway: ‘Bach created all of the chords. It’s like taking Parmesan cheese and putting it through the grater, and then trying to put it back together again. It would still turn out to be Parmesan cheese.’

Regardless of who deserves the ultimate credit, there’s one thing that is in no doubt. However beautiful EMI might sound, it is based on a pure recombination of existing work. It’s mimicking the patterns found in Bach’s music, rather than actually composing any music itself.

More recently, other algorithms have been created that make aesthetically pleasing music that is a step on from pure recombination. One particularly successful approach has been genetic algorithms – another type of machine learning that tries to exploit the way natural selection works. After all, if peacocks are anything to go by, evolution knows a thing or two about creating beauty.

The idea is simple. Within these algorithms, notes are treated like the DNA of music. It all starts with an initial population of ‘songs’ – each a random jumble of notes stitched together. Over many generations, the algorithm breeds from the songs, finding and rewarding ‘beautiful’ features within the music to breed ‘better’ and ‘better’ compositions as time goes on. I say ‘beautiful’ and ‘better’, but – of course – as we already know, there’s no way to decide what either of those words means definitively. The algorithm can create poems and paintings as well as music, but – still – all it has to go on is a measure of similarity to whatever has gone before.
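
Here is a minimal sketch of that breeding loop in Python, under some loud assumptions. The 'beauty' score is simply how many notes match an invented reference melody, which is the similarity-to-the-past measure in its crudest form. The population size, mutation rate and function names are also mine, and real music-generating systems will differ.

```python
import random

# An invented reference melody as MIDI pitch numbers; it stands in for 'whatever has gone before'.
REFERENCE = [60, 62, 64, 65, 67, 65, 64, 62]
PITCHES = range(55, 80)  # the pitches a random 'song' may use


def fitness(song):
    """Score a song by how many positions match the reference melody."""
    return sum(1 for a, b in zip(song, REFERENCE) if a == b)


def breed(mum, dad, rng, mutation_rate=0.05):
    """Splice two parents at a random cut point, then occasionally mutate a note."""
    cut = rng.randrange(1, len(mum))
    child = mum[:cut] + dad[cut:]
    return [rng.choice(PITCHES) if rng.random() < mutation_rate else p for p in child]


def evolve(generations=200, population_size=60, seed=0):
    """Return the fittest song after repeatedly breeding from the top half."""
    rng = random.Random(seed)
    population = [[rng.choice(PITCHES) for _ in REFERENCE] for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # the fittest half survives
        children = [
            breed(rng.choice(parents), rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the fittest half always survives into the next generation, the best score can never go down. Whether it climbs towards anything you would call beautiful depends entirely on what the fitness function rewards.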

And sometimes that’s all you need. If you’re looking for a background track for your website or your YouTube video that sounds generically like a folk song, you don’t care that it’s similar to all the best folk songs of the past. Really, you just want something that avoids copyright infringement without the hassle of having to compose it yourself. And if that’s what you’re after, there are a number of companies that can help. British startups Jukedeck and AI Music are already offering this kind of service, using algorithms that are capable of creating music. Some of that music will be useful. Some of it will be (sort of) original. Some of it will be beautiful, even. The algorithms are undoubtedly great imitators, just not very good innovators.

That’s not to do these algorithms a disservice. Most human-made music isn’t particularly innovative either. If you ask Armand Leroi, the evolutionary biologist who studied the cultural evolution of pop music, we’re a bit too misty-eyed about the inventive capacities of humans. Even the stand-out successes in the charts, he says, could be generated by a machine. Here’s his take on Pharrell Williams’s song Happy, for example (something tells me he’s not a fan):

‘Happy, happy, happy, I’m so happy.’ I mean, really! It’s got about, like, five words in the lyrics. It’s about as robotic a song as you could possibly get, which panders to just the most base uplifting human desire for uplifting summer happy music. The most moronic and reductive song possible. And if that’s the level – well, it’s not too hard.

Leroi doesn’t think much of the lyrical prowess of Adele either: ‘If you were to analyse any of the songs, you would find no sentiment in there that couldn’t be created by a sad-song generator.’

You might not agree (I’m not sure that I do), but there is certainly an argument that much of human creativity – like the products of the ‘composing’ algorithms – is just a novel combination of pre-existing ideas.

I’m a mathematician. I can trade in facts about false positives and absolute truths about accuracy and statistics with complete confidence. But in the artistic sphere, I’d prefer to defer to Leo Tolstoy. Like him, I think that true art is about human connection; about communicating emotion. As he put it: ‘Art is not a handicraft, it is the transmission of feeling the artist has experienced.’ If you agree with Tolstoy’s argument, then there’s a reason why machines can’t produce true art. A reason expressed beautifully by Hofstadter in Gödel, Escher, Bach, years before he encountered EMI:

A ‘program’ which could produce music … would have to wander around the world on its own, fighting its way through the maze of life and feeling every moment of it. It would have to understand the joy and loneliness of a chilly night wind, the longing for a cherished hand, the inaccessibility of a distant town, the heartbreak and regeneration after a human death. It would have to have known resignation and world-weariness, grief and despair, determination and victory, piety and awe. In it would have had to commingle such opposites as hope and fear, anguish and jubilation, serenity and suspense. Part and parcel of it would have to be a sense of grace, humour, rhythm, a sense of the unexpected – and of course an exquisite awareness of the magic of fresh creation. Therein, and therein only, lie the sources of meaning in music.

I might well be wrong here. Perhaps if algorithmic art takes on the appearance of being a genuine human creation – as EMI did – we’ll still value it, and bring our own meaning to it. After all, the long history of manufactured pop music seems to hint that humans can form an emotional reaction to something that has no more than the semblance of an authentic connection. And perhaps once these algorithmic artworks become more commonplace and we become aware that the art didn’t come from a human, we won’t be bothered by the one-way connection. After all, people form emotional relationships with objects that don’t love them back – like treasured childhood teddy bears or pet spiders.

But for me, true art can’t be created by accident. There are boundaries to the reach of algorithms. Limits to what can be quantified. Among all of the staggeringly impressive, mindboggling things that data and statistics can tell me, how it feels to be human isn’t one of them.

Excerpted from Hello World: Being Human in the Age of Algorithms by Hannah Fry. © 2018 by Hannah Fry Limited. Used with permission of the publisher, W W Norton & Company, Inc. All rights reserved.