Photo by Darren Hauck/Getty


Science is broken

Perverse incentives and the misuse of quantitative metrics have undermined the integrity of scientific research

by Siddhartha Roy & Marc A Edwards

The rise of the 20th-century research university in the United States stands as one of the great achievements of human civilisation – it helped to establish science as a public good, and advanced the human condition through training, discovery and innovation. But if the practice of science should ever undermine the trust and symbiotic relationship with society that allowed both to flourish, our ability to solve critical problems facing humankind, and civilisation itself, will be at risk. We recently explored how increasingly perverse incentives and the academic business model might be adversely affecting scientific practices and, by extension, whether a loss of support for science in some segments of society might be more attributable to what science is doing to itself than to what others are doing to science.

We argue that over the past half-century, the incentives and reward structure of science have changed, creating hypercompetition among academic researchers. Part-time and adjunct faculty now make up 76 per cent of the academic labour force, allowing universities to operate more like businesses and making tenure-track positions much rarer and more desirable. Increased reliance on emerging quantitative performance metrics that value numbers of papers, citations and research dollars raised has decreased the emphasis on socially relevant outcomes and quality. There is also concern that these pressures could encourage unethical conduct by scientists, and by the next generation of STEM scholars who persist in this hypercompetitive environment. We believe that reform is needed to bring balance back to the academy and to the social contract between science and society, to ensure the future role of science as a public good.

The pursuit of tenure traditionally influences almost all decisions, priorities and activities of young faculty at research universities. Recent changes in academia, however, including increased emphasis on quantitative performance metrics, harsh competition for static or reduced federal funding, and implementation of private business models at public and private universities are producing undesirable outcomes and unintended consequences (see Table 1 below).

Quantitative metrics are increasingly dominating decision-making in faculty hiring, promotion and tenure, awards and funding, and creating an intense focus on publication count, citations, combined citation-publication counts (h-index being the most popular), journal impact factors, total research dollars and total patents. All these measures are subject to manipulation as per Goodhart’s law, which states: When a measure becomes a target, it ceases to be a good measure. The quantitative metrics can therefore be misleading and ultimately counterproductive to assessing scientific research.
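Because the h-index comes up repeatedly in this argument, here is a minimal Python sketch of how it is conventionally computed: the largest h such that h of a researcher's papers each have at least h citations. The function name and the sample citation counts are hypothetical, chosen only for illustration.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    # Rank papers from most to least cited; the paper at rank r
    # counts towards the h-index only if it has at least r citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

The measure depends on nothing but citation counts, which is what makes it so easy to turn into a target.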

Table 1: Modified and with quotes from the blog Embedded in Academia by John Regehr, professor of computer science at the University of Utah; used with permission.

The increased reliance on quantitative metrics might create inequities and outcomes worse than those of the systems they replaced. Specifically, if rewards are disproportionately given to individuals manipulating the metrics, the well-known problems of the old subjective paradigms (eg, old-boys’ networks) will appear simple and solvable by comparison. Most scientists think that the damage owing to metrics is already apparent. In fact, 71 per cent of researchers believe that it is possible to ‘game’ or ‘cheat’ their way into better evaluations at their institutions.

This manipulation of the evaluative metrics has been documented. Recent exposés have revealed schemes by journals to manipulate impact factors, use of p-hacking by researchers to mine for statistically significant and publishable results, rigging of the peer-review process itself and over-citation practices. The computer scientist Cyril Labbé at the Joseph Fourier University in Grenoble even created Ike Antkare, a fictional character, who, by virtue of publishing 102 computer-generated fake papers, achieved a stellar h-index of 94 on Google Scholar, surpassing that of Albert Einstein. Blogs describing how to inflate your h-index without committing outright fraud are, in fact, just a Google search away.
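To see why a batch of computer-generated papers can move the h-index so far, consider an idealised case in which every paper in a set cites every other paper in the set. This is a hypothetical simplification, not a reconstruction of how Labbé's fake papers were actually linked, and it is not meant to reproduce the reported figure of 94. The Python sketch below restates the h-index definition so that it stands alone, and counts only citations from within the set.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    # Citations are non-increasing along `ranked` while rank increases,
    # so the ranks that qualify always form a prefix and can simply be counted.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def clique_citations(n_papers: int) -> list[int]:
    """Hypothetical: citation counts when each of n papers cites all n - 1 others."""
    return [n_papers - 1] * n_papers


for n in (10, 50, 102):
    print(n, h_index(clique_citations(n)))  # prints 10 9, then 50 49, then 102 101
```

In this toy setting, h grows almost one-for-one with the number of mutually citing papers, even though nobody outside the set has cited any of them.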

Since the Second World War, scientific output as measured by cited work has doubled every nine years. How much of the growth in this knowledge industry is, in essence, illusory and a natural consequence of Goodhart’s law? It is a real question.
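A nine-year doubling time corresponds to a compound growth rate of 2^(1/9) - 1, or roughly 8 per cent a year. The short Python sketch below makes that arithmetic explicit; the 70-year span (roughly 1945 to 2015) is an assumption chosen for illustration, not a figure from the text.

```python
DOUBLING_TIME_YEARS = 9  # doubling time for cited scientific output, as quoted in the text


def annual_growth_rate(doubling_time: float) -> float:
    """Compound annual growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_time) - 1


def growth_factor(years: float, doubling_time: float) -> float:
    """Multiple by which output grows over `years` at that doubling time."""
    return 2 ** (years / doubling_time)


print(f"{annual_growth_rate(DOUBLING_TIME_YEARS):.1%}")  # 8.0%
print(f"{growth_factor(70, DOUBLING_TIME_YEARS):.0f}x")  # 219x (assumed 70-year span)
```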

Consider the roles of quality and quantity in maximising true scientific progress. If a process is overcommitted to quality over quantity, accepted practices might require triple- or quadruple-blinded studies, mandatory replication of results by independent parties, and peer review of all data and statistics before publication. Such a system would produce very few results due to over-caution, and would waste scarce research funding. At the other extreme, an overemphasis on quantity would produce numerous low-quality papers with lax experimental design, little or no replication, scant quality control and substandard peer review (see Figure 1 below). As measured by the quantitative metrics, apparent scientific progress would explode, but too many results would be erroneous, and consumers of research would be left unsure of what was valid and what was not. Such a system merely creates an illusion of scientific progress. Obviously, a balance between quantity and quality is desirable.
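The trade-off can be made concrete with a deliberately crude toy model. Suppose, purely hypothetically, that a fixed research budget yields fewer papers as the care invested in each one (q, from 0 to 1) rises, and that the share of papers whose findings hold up rises with q. Apparent progress is then the raw paper count, while true progress is the number of papers that are actually valid. The functional forms and the budget below are assumptions for illustration, not estimates from data.

```python
BUDGET = 100.0  # hypothetical research budget, arbitrary units


def papers(q: float) -> float:
    """Hypothetical: papers produced fall linearly as care per paper (q) rises."""
    return BUDGET * (1.0 - q)


def valid_share(q: float) -> float:
    """Hypothetical: the fraction of papers whose findings hold up equals q."""
    return q


def true_progress(q: float) -> float:
    """Number of papers that are both produced and valid."""
    return papers(q) * valid_share(q)


grid = [i / 100 for i in range(101)]  # q = 0.00, 0.01, ..., 1.00
best_for_metrics = max(grid, key=papers)         # what a paper-count metric rewards
best_for_science = max(grid, key=true_progress)  # what actually advances knowledge
print(best_for_metrics, best_for_science)  # 0.0 0.5
```

In this toy, optimising for paper count drives q to zero, where no paper is valid, while true progress peaks at an intermediate level of care, in line with the argument that a balance is desirable.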

It is hypothetically possible that in an environment without quantitative metrics and with fewer perverse incentives emphasising quantity over quality, practices of scholarly evaluation (enforced by peer review) would evolve towards a near-optimal level of productivity. But we suspect that the existing perverse-incentive environment is pushing researchers to overemphasise quantity in order to compete, leaving true scientific productivity at less than optimal levels. If the hypercompetitive environment also increased the likelihood and frequency of unethical behaviour, the entire scientific enterprise would eventually be cast into doubt. While there is virtually no research exploring the precise impact of perverse incentives on scientific productivity, most in the academic world would acknowledge a shift towards quantity in research.

Figure 1: Quantity versus quality, vis-à-vis true scientific progress

Favouring output over outcomes, or quantity over quality, can also create a ‘perversion of natural selection’. Such a system is more likely to weed out ethical and altruistic researchers, while selecting for those who respond better to perverse incentives. The average scholar can be pressured to engage in unethical practices in order to have or maintain a career. Then, as per Mark Granovetter’s ‘Threshold Models of Collective Behaviour’ (1978), unethical actions become ‘embedded in the structures and processes’ of a professional culture. At this point, the conditioning to ‘view corruption as permissible’ or even necessary is very strong. Compelling anecdotal testimony is emerging in which accomplished and public-minded professors write about why they are leaving a career they once loved. The Chronicle of Higher Education has even coined a name for this genre: Quit Lit. In Quit Lit, even senior researchers provide perfectly rational explanations for leaving their privileged and prized positions rather than compromise their principles in a hypercompetitive, perverse-incentive environment. One is left to wonder whether minority students and women, acting just as rationally, opt out of the system at disproportionately high rates compared with the groups who tend to persist.

In brief, although quantitative metrics provide a superficially attractive approach to evaluating research productivity in comparison with subjective measures, once they become targets they cease to be useful and can even be counterproductive. Continued overemphasis of quantitative metrics might compel all but the most ethical scientists to produce more work of lower quality and to ‘cut corners’ whenever possible, decreasing true productivity and selecting for scientists who persist and thrive in a perverse-incentive environment. It is hypothetically possible that the realities of modern academia affect the persistence of women and minorities at all phases of the academic pipeline.

Many scientific societies, research institutions, academic journals and individuals have advanced arguments aimed at correcting some of the excesses of quantitative metrics. Some have signed the San Francisco Declaration on Research Assessment (DORA). DORA recognises the need for improving ‘ways in which output of scientific research are evaluated’, and calls for challenging research-assessment practices, especially the currently operative ‘journal impact factor’ parameters. As of 1 August 2017, 871 organisations and 12,788 individuals have signed DORA, including the American Society for Cell Biology, the American Association for the Advancement of Science, the Howard Hughes Medical Institute, and the Proceedings of the National Academy of Sciences. The publishers of Nature, Science and other journals have called for downplaying the impact-factor metric. The American Society for Microbiology recently took a principled stand and eliminated impact-factor information from all its journals ‘to avoid contributing further to the inappropriate focus on journal [impact factors]’. The aim is to slow the ‘avalanche’ of unreliable performance metrics dominating research assessment. Like others, we are not advocating the abandonment of metrics, but rather a reduction in their importance in decision-making by institutions and funding agencies, at least until we have objective measures that better represent the true value of scientific research.

In the hypercompetitive funding environment of modern science, the federal government has been the one enabling, indispensable resource. It has been paramount in financing research and development (R&D), creating new knowledge, and fulfilling public missions including national security, agriculture, infrastructure and environmental health. Since the Second World War, the federal government has borne a large share of the cost of high-risk, long-term scientific research. Such research carries uncertain prospects or sometimes lacks obvious short-term societal impacts, and follows an agenda that is often set by scientists and funding agencies. This foundation of federal funding has created a research and knowledge ecosystem supplemented by universities and industries. Together, they have made historic contributions to the collective progress of humanity.

For at least the past decade, however, US federal spending on R&D has been in decline. The country’s ‘research intensity’ (the federal R&D budget as a share of its gross domestic product) fell from about 2 per cent in the 1960s to 0.78 per cent in 2014. Meanwhile, China is projected to outspend the US on R&D by 2020.

US colleges and universities have also historically served to shape the next generation of researchers, who will in turn provide education and knowledge to the public. But as universities morph into ‘profit centres’ focused on generating new products and patents, they are de-emphasising science as a public good.

Competition among researchers for funding has never been more intense: we have entered the worst funding environment in half a century. Between 1997 and 2014, the funding rate for US National Institutes of Health (NIH) grants fell from 30.5 per cent to 18 per cent. US National Science Foundation (NSF) funding rates have remained stagnant at 23-25 per cent over the past decade. We can be thankful for small favours: these funding rates are still well above 6 per cent, the approximate breakeven point at which the net cost of proposal-writing equals the net value obtained from a grant by the grant-winner. Nonetheless, the grant environment is hypercompetitive, susceptible to reviewer biases, skewed towards funding agencies’ research agendas, and strongly dependent on prior success as measured by quantitative metrics. Even before the financial crisis struck, the Nobel laureate Roger Kornberg remarked: ‘If the work you propose to do isn’t virtually certain of success, then it won’t be funded.’ These broad changes take valuable time and resources away from scientific discovery and translation, compelling researchers to spend inordinate amounts of time chasing grants and filling out ever-increasing paperwork for grant compliance.
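The breakeven idea can be stated as a simple expected-value calculation. If each proposal costs C to write and an award is worth V to its winner, then at funding rate p each proposal returns p × V in expectation against a cost of C, so the two balance at p = C/V. The Python sketch below assumes, hypothetically, a cost-to-value ratio of 6:100 so that it matches the roughly 6 per cent breakeven quoted above; the underlying cost and value estimates are not given in this essay.

```python
def breakeven_rate(cost_per_proposal: float, value_per_award: float) -> float:
    """Funding rate at which expected award value equals proposal-writing cost."""
    return cost_per_proposal / value_per_award


def net_value_per_proposal(rate: float, cost_per_proposal: float, value_per_award: float) -> float:
    """Expected value of one proposal minus the cost of writing it."""
    return rate * value_per_award - cost_per_proposal


COST, VALUE = 6.0, 100.0  # hypothetical units, chosen so that C / V = 0.06

print(breakeven_rate(COST, VALUE))
for rate in (0.305, 0.18, 0.06):  # NIH 1997, NIH 2014, approximate breakeven
    print(rate, round(net_value_per_proposal(rate, COST, VALUE), 2))
```

On these assumptions, a proposal still has positive expected value at the 2014 NIH rate of 18 per cent, but the margin is less than half what it was at 30.5 per cent.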

The steady growth of perverse incentives, and their instrumental role in faculty research, hiring and promotion practices, amounts to a systemic dysfunction endangering scientific integrity. There is growing evidence that today’s research publications too frequently suffer from lack of replicability, rely on biased data-sets, apply weak or substandard statistical methods, fail to guard against researcher biases, and overhype their findings. In other words, they reflect an overemphasis on quantity over quality. It is therefore not surprising that scrutiny has revealed a troubling level of unethical activity, outright faking of peer review, and retractions. The Economist recently highlighted the prevalence of shoddy and non-reproducible modern scientific research and its high financial cost to society, strongly suggesting that modern science is untrustworthy and in need of reform. Given the high cost of exposing, disclosing or acknowledging scientific misconduct, we can be fairly certain that there is much more than has been revealed. Warnings of systemic problems go back to at least 1991, when the NSF director Walter E Massey noted that the size, complexity and increasingly interdisciplinary nature of research in the face of growing competition were making science and engineering ‘more vulnerable to falsehoods’.

The NSF defines research misconduct as intentional ‘fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results’. Among research-misconduct cases investigated by the US Department of Health and Human Services (which includes the NIH) and the NSF, 20-33 per cent end in a finding of misconduct. US institutions incur annual costs of $110 million for all such research-misconduct investigations. From 1992 to 2012, 291 scientific papers published under NIH grants were retracted due to misconduct, accounting for $58 million in direct funding from the agency. Obviously, the true incidence, including undetected misconduct, is greater: some multiple of the cases judged as such each year.
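For scale, the NIH figures above work out to roughly $200,000 of direct agency funding per retracted paper. The division is shown below in Python; it uses only the two numbers quoted in the text.

```python
RETRACTED_PAPERS = 291           # NIH-funded papers retracted for misconduct, 1992-2012
DIRECT_FUNDING_USD = 58_000_000  # direct NIH funding attributed to those papers

cost_per_retraction = DIRECT_FUNDING_USD / RETRACTED_PAPERS
print(f"${cost_per_retraction:,.0f} per retracted paper")  # $199,313 per retracted paper
```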

The true incidence is difficult to estimate. A comprehensive meta-analysis of research-misconduct surveys during 1987-2008 indicated that one in 50 scientists admitted to committing misconduct (fabrication, falsification, and/or modifying data) at least once, and 14 per cent of scientists knew of colleagues who had done so. Most likely, given the sensitivity of the questions asked and the low response rates, these numbers are an underestimate of the true incidence. Since 1975, in life science and biomedical research, the percentage of scientific articles retracted has increased tenfold; 67 per cent of the retractions were due to misconduct. Hypotheses for the increase include the ‘lure of the luxury journal’, ‘pathological publishing’, insufficient misconduct policies, academic culture, career stage, and perverse incentives. From climate science to galvanic corrosion, we have seen research published that denigrates the scientific ethos, and undermines the credibility of the scientific community and everyone in it.

The principle of self-government in academia is strong, and this is a distinguishing feature of the modern research university. Science is expected to be self-policing and self-correcting. We have come to believe, however, that incentives throughout the system induce all stakeholders to ‘pretend misconduct does not happen’. It is remarkable that science never developed a clear system for reporting and investigating allegations of research misconduct. Individuals who wish to allege misconduct have no easy, evident path for doing so, and risk suffering severe negative professional repercussions. In relation to what is considered fair in reporting research, grant-writing practices and promoting research ideas, scholars operate, to a great extent, on an unenforceable and unwritten honour system. Today, there are compelling reasons to doubt that science as a whole is self-correcting. We are not the first to recognise this problem. Scientists have proposed open data, open access, post-publication peer review, meta-studies and efforts to reproduce landmark studies as practices to help compensate for the high error rates in modern science. Beneficial as these corrective measures might be, perverse incentives on individuals and institutions remain the root problem.

There are exceptional cases in which individuals have provided a reality check on overhyped research press releases, especially in areas deemed potentially transformative (for example, Jonathan Eisen’s real-time commentary on some of the mania surrounding the ‘microbiome’). Generally, however, the limitations of hot research sectors are downplayed or ignored. Because every modern scientific mania creates a quantitative-metric windfall for participants, and because few consequences come to those responsible when a science bubble bursts, the only effective check on pathological science and the misallocation of resources is the unwritten honour system.

Misconduct is not limited to academic researchers. Perverse incentives and hypercompetition also bear on federal agencies, giving rise to a new phenomenon of institutional scientific research misconduct. The US Centers for Disease Control and Prevention (CDC), for example, produced an erroneous report on the drinking-water crisis in Washington, DC, claiming that the extremely high levels of lead in the water did not cause an elevation in local children’s blood lead levels. After the CDC refused to correct or defend its research, Congressional investigators had to intervene, and found the report to be ‘scientifically indefensible’. A few months after being chastised in Congress, the same branch of the CDC wrote what a Reuters investigation called yet another ‘flawed’ report, on lead contamination of soil, drinking water and air in East Chicago in Indiana, that left vulnerable children and minorities in harm’s way for at least five years longer than was necessary.

The US Environmental Protection Agency (EPA) also published scientific reports by consultants, based on non-existent data, in industry journals. More recently, the EPA silenced its own whistleblowers during the water crisis in the city of Flint in Michigan. As agencies increasingly compete with each other for reduced discretionary funding and to maintain existing cash flows (for example, the CDC’s desire to focus more on lead paint, as opposed to lead in water), they seem to be more inclined to publish ‘good news’ instead of science. In an era of declining discretionary funding, federal agencies have financial conflicts of interest and fears of survival, similar to those in private industry. Because of the common misconception that federal funding agencies are free of such conflicts, the dangers of institutional research misconduct might rival or even outweigh those of industry-sponsored research: there is no system of checks and balances, and consumers of such work might be overly trusting.

If we don’t reform the academic scientific-research enterprise, we risk significant disrepute to and public distrust of science. The modern academic research enterprise, which The Economist has derided as a ‘Ponzi scheme’, operates on a system of perverse incentives that would have been almost inconceivable to researchers 50 years ago. We believe that this system presents a real threat to the future of science. If immediate action is not taken, we risk creating a corrupt professional culture akin to that revealed in professional cycling (ie, 20 out of 21 Tour de France podium finishers during 1999-2005 were conclusively tied to doping), where an uncontrolled perverse-incentive system created an environment in which athletes felt that they had to cheat to compete. While pro-cycling suffered severe disrepute due to prolific doping scandals instigated by a burning desire to win at any cost, the stakes in science are much higher. The loss of altruistic actors and trust in science would bring even greater harm to the public and the planet.

In recent years, academia has witnessed unqualified success in acknowledging numerous important issues, including those of demographic diversity, work-life balance, funding, better teaching, public outreach, and engagement – attempts are being made to address many of these problems.

We should all aspire to leave the field in a better state than it was in when we entered it. The very important matters of state and federal funding lie beyond our direct control. However, when it comes to the health, integrity and public perception of science and its value, we are the key actors. We can openly acknowledge and address problems with perverse incentives and hypercompetition that are distorting science and imperilling scientific research as a public good. Some relatively simple steps are possible. First, we can arrive at a better understanding of the problem by systematically mining the experiences and perceptions of academics in STEM fields, via a comprehensive survey of high-achieving graduate students and researchers.

Second, the NSF should commission a panel of economists and social scientists with expertise in perverse incentives to collect and review input from all levels of academia, including retired National Academy members and distinguished STEM scholars. With a long-term view to fostering science as a public good, the panel could also develop a list of ‘best practices’ to guide evaluation of candidates for hiring and promotion.

Third, we can no longer afford to pretend that the problem of research misconduct does not exist. At both the undergraduate and graduate levels, science and engineering students should receive realistic instruction on these subjects, so that they are prepared to act when, not if, they encounter it. The curriculum should include review of real-world pressures, incentives and stresses that can increase the likelihood of research misconduct.

Fourth, universities can take measures immediately to protect the integrity of scientific research, and announce steps to reduce perverse incentives and uphold research-misconduct policies that discourage unethical behaviour. Finally, and perhaps most simply, in addition to teaching technical skills, PhD programmes should acknowledge the present reality of perverse incentives, while also fostering character development, respect for science as a public good, and an appreciation of the critical role of quality science in the future of humankind.

This article is an abridged version of the journal paper ‘Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition’, published in Environmental Engineering Science, and was written to reach a wider audience. Original paper © Marc A Edwards and Siddhartha Roy, 2016.

7 November 2017