‘Seeing is believing’, as the saying goes, and so empirical data are the lifeblood of science and the scientific method. Science progresses by winnowing out those hypotheses or theories that fail under increasingly large amounts of data; favoured hypotheses stand up to the largest, most diverse sets of data and are said to have ‘high explanatory power’. Imagine, then, what would happen if each time a new study was done, the researchers had to rely on different, small datasets created using different data-collection protocols. How surprising would it be if the results of each study favoured different hypotheses? How difficult would it be for the scientific community to judge which hypotheses had the greater explanatory power?
That scenario may sound absurd, yet it describes the actual situation facing researchers trying to understand our species’ evolutionary history through the anatomical study of ancient fossils (not to mention many other scientific fields). Often, the most interesting remains are physically accessible to only a small number of scientists. Everyone else is then forced to rely on published descriptions or secondhand impressions. Yet many recent finds – such as Homo floresiensis (the ‘Hobbit people’), Ardipithecus ramidus (‘Ardi’, a predecessor and possible ancestor of our genus, Homo), and most recently Homo naledi (which displays an intriguing mix of ancestral and derived traits) – raise huge questions about human evolution that are impossible to answer properly without broad access to the raw data: the fossils themselves.
A more rigorous and desirable approach to evaluating hypotheses in anthropology is to compile large datasets, which allows study of whole populations. Nonetheless, because the currency of a study’s scientific power has been the number of observations, scientists have tended to publish as few of the raw values as possible in order to avoid helping a peer-competitor build a dataset and get the scoop on interesting results. The culture of ‘data holding’ has promoted frustrating inefficiencies. Many specimens in anthropology collections are pockmarked where dozens of researchers have set their callipers to take the same measurements again and again. Another consequence of a lack of access to raw datasets is that it is exceedingly difficult to determine whether apparent differences in anatomical patterns among living species and fossils from one study to the next are due to observer error, measurement protocols, the use of different samples, or genuine evolutionary features.
The technology to scan and store 3D renderings of bones and other specimens allows a fundamentally new approach. The digital format makes it possible to distribute raw data about specimen anatomy around the world in an instant, with the potential to throw the datasets wide open, allowing direct assessments of repeatability and compilation into ‘super-datasets’ from which everyone works. Such a systemic change could transform the entire study of human origins and evolutionary biology writ large.
To aid that transformation, my colleagues and I at Duke University have created a website called MorphoSource that archives 3D data from microCT scanners, surface scanners, photogrammetry and more. Researchers can then access those files from anywhere they have an internet connection. It is, in essence, a global repository of virtual fossils and comparative anatomical data (including scans of bones and even soft tissue when available). We designed MorphoSource to make museums’ and individuals’ collections easily discoverable and searchable, and added tracking tools to record when and how items from these collections end up in peer-reviewed papers.
The intellectual rewards of such open sharing for the advancement of science are obvious, but if we want researchers and museums to open their collections for digital distribution, there need to be direct incentives for those holding the collections as well. Funding agencies are most concerned with whether the data collection they support contributes to peer-reviewed scientific advances. To that end, we are focusing on ways to allocate academic credit to researchers for the impact of shared data. Likewise, museums can improve their demonstrated value if they open up their collection data – especially if they allow digital access. The potential impact is especially great for remote museums with small but important collections, which could suddenly be accessible to the whole academic world.
Less than two years old, MorphoSource is already changing the style of anthropology and comparative evolutionary biology research. Recently, the University of the Witwatersrand in South Africa partnered with MorphoSource to distribute 3D renderings of its unique collection of hominin fossils (where ‘hominin’ refers collectively to modern and extinct human beings, plus their direct ancestors). The primary release of 86 specimens came in September 2015, and included remains of the new species Homo naledi. Since then the project has received more than 44,000 views and 7,800 downloads by other researchers, educators, and even private individuals. In four months, these fossils have been observed more frequently than all other hominin fossils combined since the first catalogued discoveries in the 1850s. The scientific community has often had difficulty persuading the public to accept that evidence for evolution really exists behind the closed doors of museum archives. Now those doors are potentially opening to everyone.
By its mere existence, MorphoSource creates a pressure for greater data-sharing. Previously, there was no system that could accommodate large numbers of detailed digital datasets on museum specimens. Now that MorphoSource is live, editors and grant reviewers are beginning to forcefully suggest, and sometimes require, that researchers deposit their raw datasets there for access. In the case of Homo naledi, some of the strongest critics of the original publication (and its open-access framework!) have accessed the remains through MorphoSource. Disputes can now more easily be framed in terms of data, not impressions and interpretations.
Many institutions and individuals intuitively sense that open access is too soft for their hard-won data, or they may have contracted with stakeholders that take this view. MorphoSource therefore allows researchers and institutions merely to ‘advertise’ the existence of datasets, which then require an active request for access. Restrictions that seemed necessary often prove not to be, however. For example, Harvard’s Museum of Comparative Zoology routinely (and responsibly) prohibits researchers from giving third parties access to data acquired using the MCZ collection. When we recently published microCT scans of 431 primate specimens in the Nature journal Scientific Data, we were able to engage the museum in a dialogue that resulted in the relaxation of restrictions in the case of this dataset to allow open access and a less restrictive copyright licence. Since publication, the specimens have been viewed 40,000 times and downloaded 2,500 times.
Despite these successes, I still have some concerns about our initiative. One key issue is the archival longevity of 3D digital data: bones don’t go away, but digital files can become unreadable or lost. Hardware engineers need to develop media that can store data in recoverable ways for decades or centuries, even without constant power. Software engineers must create systems that monitor data-format usage and automatically update data to guard against corruption and obsolescence.
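One small piece of that safeguarding is already routine in digital archives: recording a checksum for every file when it is deposited, then recomputing it later to detect silent corruption or loss. The Python sketch below illustrates the idea only. It is not a description of how MorphoSource works, and the function names (sha256_of, build_manifest, find_damaged) and the example file name are hypothetical, chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match."""
    damaged = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged


if __name__ == "__main__":
    # Hypothetical archive holding one stand-in "scan" file.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp)
        (archive / "specimen_scan.ply").write_bytes(b"stand-in mesh data")
        manifest = build_manifest(archive)       # recorded at deposit time
        print(find_damaged(archive, manifest))   # checked again later
```

A check like this can only report that a file has changed or vanished. Repairing it still depends on independent copies, and keeping formats readable over decades remains the harder, unsolved part of the problem.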
An important goal of MorphoSource is to help comparative morphology become a data-driven science accessible to all biologists, regardless of the stage of their career or area of expertise. This can happen by creating virtual museums: the research community will contribute most of the data, and university libraries will provide the digital infrastructure. Such virtual museums will not only tear down limits on access, they will also enable novel research approaches. Digital collections could support automated, standardised methods for analysing bones – methods that my lab and others are beginning to pioneer. Such a change could reduce individual subjectivity, and would make physical anthropology more welcoming to thinkers from other fields.
No topic is of more universal interest than the origin of our own species. It seems only fitting that the museum collections documenting the continuity of anatomical form, uniting humans with the diversity of life, should finally be thrown wide open.