Guilin, China. 1979. Photo by Hiroji Kubota/Magnum


Counting China

By rejecting sampling in favour of exhaustive enumeration, communist China’s dream of total information became a nightmare

by Arunabh Ghosh

Sometime in the fall of 1955, a Chinese statistical worker by the name of Feng Jixi penned what might well be the most romantic sentence ever written about statistical work. ‘Every time I complete a statistical table,’ Feng wrote:

my happiness is like that of a peasant on his field catching sight of a golden ear of wheat, my excitement like that of a steelworker observing molten steel emerging from a Martin furnace, [and] my elation like that of an artist completing a beautiful painting.

Feng’s clever juxtaposition allowed statistics, that most staid of subjects, to intrude into a much broader and more affective consciousness, one populated by more readily discernible achievements in industry, agriculture and the arts.

This intrusion does not sit easily. After all, most of us don’t care too much for statistics. We might celebrate Olympic medal tallies or share consternation about a decline in GDP, but our engagement remains superficial. It’s only in moments of crisis that we begin to pay attention. Our current obsession with all kinds of data related to COVID-19 is a case in point. But even in such moments, we focus largely on the numbers themselves, wondering about their reliability, politicising them, arguing about their possible manipulation, or making comparisons within and across societies. Implicit in these actions is the assumption that there exists a neutral, untainted truth that these numbers can accurately and unequivocally capture. This is, of course, patently false. Statistics are neutral only if we accept that how we come to know something has no bearing on what we know (and, of course, vice versa).

The 1950s were witness to arguably the most vigorous disagreement over this question of what and how. As the world emerged from the devastation of the Second World War and entered a period of decolonisation and imperial collapse, countries both old and new reposed great faith in the authority of quantitative methods and statistics. Collecting and analysing data using advanced statistical methods came to define modern governance. This shared faith, however, didn’t always translate into shared methods. Refracted through an increasingly thick Cold War lens, the universal desire for ever-increasing quantitative control splintered, taking forms that not only varied significantly but were seen as each other’s correctives.

In October 1949, Mao Zedong and the Communists declared victory over Chiang Kai-shek and his Nationalist government, putting an end to nearly four decades of chaos punctuated by rebellion, warlord-rule, a Japanese invasion and, finally, a bloody civil war. Buoyed by their victory and confident of the transcendence of Marxism-Leninism, the Communists set about transforming China. Few, if any, were ever more optimistic. But the great promises of improvement also helped rationalise repressive and violent measures, leaving behind scars both physical and psychological. Powerful and progressive reforms, such as the redistribution of land to peasants and the enactment of a new, egalitarian marriage law, coexisted with the need to discipline and subdue different sections of society, from bureaucrats to merchants to intellectuals. The final campaign of the decade brought the story full circle. A terrible famine devastated the same peasants that had so spectacularly been empowered 10 years earlier. Across this vivid canvas, the story of statistics, subject to benign neglect at the best of times, ought not to occupy pride of place. And yet, it lies at the heart of China’s socialist experiment in the 1950s.

Ever since their miraculous escape from Nationalist armies in 1934 (mythologised in Party lore as the Long March), the Communists had sought to distinguish themselves as a party and a movement with a difference. From their new base in the dusty hills of north-central China, they began experiments in communist governance, balancing practical and ideological objectives. Recruitment of a peasant army, land reform and cadre training were accompanied by the creation of a distinct Marxism-Leninism-inspired theoretical apparatus. After 1949, as they took control of the mainland, they were in a position to act on their longtime claims to difference. Few domains demonstrated as clean a break from the past as did statistics. In a speech in 1951, Li Fuchun, one of a handful of technocratically minded leaders, summarily dismissed the utility of Nationalist-era statistics, branding them an Anglo-American bourgeois conceit, unsuitable for ‘managing and supervising the country’. New China needed a new kind of statistics, he declared.

A new kind of statistics China did indeed fashion. And it offered emphatic answers to the interwoven and essential questions of what and how. Rejecting the Nationalist (and globally dominant) understanding of statistics as a universal science, Chinese statisticians joined their Soviet counterparts in redefining statistics as a social science. Its true object was the social world. Eschewed outright were the physical and natural worlds. These latter areas became the subject of mathematical statistics, whose correct place was in departments of mathematics, physics, engineering and so on. Symptomatic of these divides is the career of Xu Baolu, one of China’s foremost probabilists. A professor in the mathematics department of Peking University during the 1950s, Xu had no known interaction with the State Statistics Bureau or with the newly established department of statistics at the nearby People’s University. With their sights set and rightful purpose claimed, Chinese statisticians proceeded to interpret Marxism’s explicit teleology as grounds to reject the existence of chance and probability in the social world. In their eyes, there was nothing uncertain about mankind’s march towards socialism and, eventually, communism. What role, then, could probability or randomness play in the study of social affairs?

The implications for statistical methods were profound. In rejecting probability, and the larger area of mathematical statistics within which it belonged, China’s statisticians discarded a large array of techniques, none more critical than the era’s newest and most exciting fact-generating technology – large-scale random sampling. Instead, they decided that the only correct way to ascertain social facts was to count them exhaustively. Only in this way could extensive, complete and objective knowledge be generated. Out of this understanding emerged a strict hierarchy of methods. At the top was complete enumeration, realised through a vast system of comprehensive and periodic reports covering all sectors of the economy. Next came one-time censuses, which were used to collect data on an ad-hoc, as-needed basis. Finally, only in those circumstances when an exhaustive count wasn’t possible, did Chinese statisticians also use non-randomised (ethnographic) sample surveys.

The most recognisable example of an exhaustive enumeration is the population census. A crucial tool to help society understand itself, it also serves as a basis for policymaking. The currently underway 2020 United States census is portrayed as a ‘once-in-a-decade chance to shape the future of your family and community’. In the US, the census has been a decennial ritual since 1790; in the United Kingdom, since 1801; and in India, since 1871. China had to wait until 1953 for its first complete population census. When results were formally declared in November of the following year, China’s population stood at 583 million. But only a handful of fields – such as name, age, sex and nationality – had been enumerated and the absence of disaggregated data, which the State Statistics Bureau withheld, rendered its value for research or policymaking negligible.

Instead, the question of whether China had too many people dominated conversation. In September 1949, Mao had declared: ‘We have very favourable conditions: a population of 475 million people.’ By the middle of the decade, his pro-natalist views had softened and he began to speak of birth control and birth planning. In 1957, he enigmatically proclaimed: ‘the fact that China has a large population is good and bad; China’s advantage is that people are many, [its] disadvantage also is that people are many.’ These vacillations rendered the question of population a fraught matter. Just how fraught became clear that same year, when the Party mounted a concerted campaign of vilification against the president of Peking University, Ma Yinchu. By October 1959, academic colleagues and Party functionaries had authored 200 articles castigating Ma for wrongly predicting a Malthusian crisis and advocating birth control measures. Dismissed from the university presidency in 1960, Ma would have to wait until 1979 for a formal apology from the Party.

The census was reduced to a symbolic appendage, invoked to conjure up China’s enormity but of little use otherwise. Instead, China’s statisticians and planners deemed a different set of numbers more important. These numbers, the poet and author Ba Jin gushed:

gather the sentiments of 600 million people, they also embody their common aspirations and are their signpost [pointing to the future]. With them, correctly, step by step, we shall arrive on the road to [building a] socialist society. They are like a bright lamp, illuminating the hearts of 600 million.

The numbers about which Ba waxed poetic were those related to planning and economic management.

In standard Chinese chronologies, 1949-52 is described as the period of economic recovery, and 1953-57 are the years of the first five-year plan. The plan itself consisted of a plethora of constituent parts, broken into annual, semi-annual and quarterly timeframes, which planners applied to the entire socioeconomic landscape. In its idealised form, the state would collect data across all spheres of society. The headquarters of the State Statistics Bureau, established in Beijing in 1952, was divided into 13 branches, dealing with industry, agriculture, capital construction, trade, the distribution of supplies, transportation, labour wages, culture, education and publication. Each of these branches received data that had made their way up progressively from villages to counties to provincial bureaus. In Beijing, the data were further collated and compiled, and then sent to the State Planning Commission, which used them to draw up plans. These plans were then sent back down the same route to provincial, municipal and county planning offices, and through them, to the corresponding statistics offices. At its mid-decade peak, this system of statistical data collection employed 200,000 full-time cadres spread across 2,200 counties and 750,000 villages.

The supremacy of planning was accompanied by a particular emphasis on material production and a corresponding neglect of service-based activities, such as administration, retail and accounting. It also generated a hierarchical relationship between the two most important sectors of the economy. Although the Communist Revolution was won by peasant armies, economic policy as a whole placed primary emphasis on the rapid growth of heavy industries. Agriculture, which constituted nearly 50 per cent of the economy in 1949, was relegated to the background, its primary task to generate surplus for investment in heavy industries.

The scale of the statistical operation, the privileging of material production, and within that the emphasis on heavy industry and relative neglect of agriculture, all contributed to ensure that the dream of total information, so alluring as an ideal, was a nightmare in practice. Every level of the statistical system contributed to the overproduction of data. In a system that valued the production of material goods above all else, the only way a white-collar service such as statistics could draw attention to itself was by claiming, as Feng did, that statistical tables were a material contribution to the economy, just like wheat and steel. With the production of tables so incentivised, the entire system responded with gusto to produce them. Soon, there were so many reports circulating that it was impossible to keep track of them. Internal memoranda bemoaned the chaos, but it was a pithy four-character phrase that truly captured the exasperation. Translated, it reads: ‘Useless at the moment of creation!’

Compounding the problem of overproduction was the issue of data irreconcilability. All too often provincial bureaus or headquarters in Beijing encountered disparate numbers for a given product. The State Statistics Bureau issued catalogues of industrial products in the hopes of achieving some standardisation, but to little avail. Units of measurement presented another source of confusion, with regional differences in weights, units of volume, and groupings (think dozens versus tens) making aggregate estimations frequently incommensurable. Overproduction and irreconcilability of statistical data, in turn, fuelled chronic delays.

Nowhere were these problems as pronounced as in the agricultural sector. Already neglected under the prevailing industry-centric orientation, the scale of the agricultural sector and the large variation in terrain, crops and seasons compounded the tendencies to overreport, delay and generate incommensurable data. The waves of collectivisation that began in 1952 and forced villagers into larger and larger collectives also exacerbated the problem of reliable and standardised data.

As the 1950s wore on, two other trends also shaped statistical work. The first was the increasingly complex nature of the economy. The state set up new factories and industries, nationalised older private ones, and continued to reorganise agriculture. A second factor was a growing recognition among some of the statistical and planning leadership that the State Statistics Bureau ought to provide not just data but conduct analyses as well.

Out of these circumstances emerged data’s own uncertainty principle: accuracy and timeliness were in conflict; prioritising precision in one typically meant compromising precision in the other. If numbers weren’t provided at the right time, decision-making suffered. But what good would those decisions be if the numbers themselves were of poor quality? The paradox was debilitating. In September 1957, the head of the State Statistics Bureau Xue Muqiao broke the deadlock by declaring that:

In order for the leading authorities to understand the situation, research questions, and decide on policies, they frequently need reference data on a timelier basis. Such data need not possess a high degree of accuracy or be comprehensive, but it must be supplied in a timely fashion.

Xue’s decision released the cat of estimation among the pigeons of accuracy. Higher levels of the statistical system set ever stricter deadlines and the lower levels responded with more and more estimated numbers. As these numbers travelled up the chain, from county to province to Beijing, they were combined with other estimates, leaving provincial and, eventually, national data with ever larger margins of error.

By 1957, Chinese statisticians were well aware that they had built a system that generated copious quantities of facts but left them poorly informed. For instance, although the State Statistics Bureau had detailed granular data for grain and cotton yields, once aggregated to provincial and national levels, these data were often found to be so inconsistent across years that statisticians and planners were unable to assess plan completion or the efficacy of specific policy measures. Desperate for change, Xue and one of his deputies, Wang Sihua, began to consider options that had until then been anathema. In one of the earliest instances of scientific exchange among countries of the Global South, they reached out to statisticians at the Indian Statistical Institute in Calcutta. The director of the institute, a physicist by the name of P C Mahalanobis, had pioneered the use of large-scale random sampling across India. As the Indian experience made clear, random sampling offered a cheaper, more accurate and faster way to collect grassroots data. In the summer of 1957, Mahalanobis spent three weeks in Beijing. Chinese statisticians, including Wang, visited Calcutta in 1957 and 1958. Back in Beijing, Xue and Wang led efforts to prepare the grounds for wider adoption of the large-scale random sampling method, hoping especially to employ it in the agricultural sector.

The question of what would have happened had China adopted large-scale random sampling now occupies the murky realm of counterfactual history. In 1958, Mao and the Party leadership initiated a new campaign, declaring that China would surpass Britain’s industrial production in 15 years. Known as the Great Leap Forward, it entailed a fundamental reorganisation of labour and production technologies. In the countryside, villages and cooperatives went through a final round of collectivisation, creating massive communes with tens of thousands of people. These people worked on nationalised farms, at backyard blast furnaces and on myriad infrastructural projects such as dams and canals. Their urban counterparts experienced similar communal and labour-intensive practices. In the world of data collection, the Great Leap Forward marked a turn away from exhaustive enumeration and the adoption, instead, of decentralised and ethnographic methods. A tract from 1927 on rural investigation, authored by Mao, became the new methodological model for data collection. True knowledge could be gained only by a detailed, in-person investigation, not through vast exhaustive surveys nor through randomised sampling. The shift left the statistical apparatus with no reliable means to check its own data. Most tellingly, it contributed to the state’s reduced capacity to ascertain accurately the devastating famine that overtook the countryside starting in 1959. Estimates vary, but most scholars agree that at least 30 million people, and possibly many more, lost their lives by the time the Great Leap Forward ended in 1962.

It would take more than a decade, until after the death of Mao’s designated successor Lin Biao in 1971, to restore statistical work in China. And another decade still before the post-1978 policies of reform and opening up created the grounds for a thorough reappraisal of statistics and for the slow reintegration of probabilistic methods.

From the vantage point of today, the travails of China’s statisticians during the 1950s might appear quaint, their obsession with definitional issues and their rejection of probabilistic methods an artifact of a more ideologically driven time. That would be a mistake. The concerns that drove them are with us today, as alive and as urgent as they were 70 years ago. At their heart is a set of basic and timeless questions: what do we need to know and how should we know it? Their answers gave them confidence to value exhaustive enumeration above all else. This confidence has echoes in today’s Big Data revolution, which similarly insists that the more information we quantify, the better shall our knowledge be, and the more appropriate our solutions.

There are other lessons. Today, as in the 1950s, randomised sampling and in-depth case studies remain valuable but are increasingly neglected. Instead of ignoring them, we need to recognise that each method – the randomised, the ethnographic and the exhaustive – offers unique insights. And although none is a panacea, together they constitute a far more supple toolkit, expanding both what we can know and how we can know.

The COVID-19 pandemic has forced the world to confront the uncertainty principle afresh. Much as mid-20th-century Chinese statisticians discovered, the timely delivery of data and the guarantee of their accuracy sit in some tension with one another. To achieve precision in both remains as great a challenge today. The choices that Chinese officials made then weren’t always easy or self-evident ones; neither are those that are being made today. And they affect two other values that we ought to cherish: transparency and commensurability. Timeliness and accuracy are of little use if we don’t make the data freely available and if we don’t use commonly agreed standards. A lack of both transparency and commensurability hobbled statistical work in 1950s China. They remain as intractable today, generating confusion that, in its most vicious form, can sow deep distrust between researchers, institutions and communities.

As we continue to confront biased and manipulated data in our daily lives, the example of 1950s China reminds us of the importance of separating outcomes that can be traced to first principles (‘statistics is a social science’) from those that are a result of post-hoc manipulation (‘this estimate is too low, let’s report a higher one’). In a world increasingly divided by narrow nationalist visions, recognising that all data are biased, but that not all biases are the same, might well be a matter of life and death.