Linking Data and Publications – a Crisis in Science?

Through the Research Data Alliance, the ARDC is helping to expose the links between journal papers and datasets to enable validation of research findings and support reproducibility.

Through the Research Data Alliance, the Australian Research Data Commons (ARDC) is working to change the culture around how data is referenced in publications internationally and in Australia. 

Exposing the links between journal papers and datasets is supporting Australian researchers with citation of their research outcomes, enabling validation of research findings and supporting reproducibility.

The link between data and publications lies at the heart of the integrity of the scientific process. It tethers the research conclusions to the evidence that supports them. And both data and publications are the giant shoulders upon which future generations of scholars will stand to gain new scientific insights.

Science is inherently universal, and research is therefore naturally international, so scientists want to know about links between data and publications from all their colleagues worldwide.

Yet, in what some call a “replication crisis”, it is surprisingly difficult to find out what data supports a particular publication and which publications reference a particular dataset.

The size of the scholarly enterprise is one of the confounding elements, with tens of millions of researchers publishing hundreds of millions of publications underpinned by countless datapoints.

‘Cottage-industry’ solutions to this problem involve individual research groups, publishers or data centres each trawling the web to try to discover links.

Scholix – a socio-technical solution to a global challenge

Scholix was created in recognition that a global ‘industrial’ approach was needed to get a complete picture of data–literature links.

Scholix, an initiative of the Research Data Alliance (RDA) and the World Data System (WDS), is a technology solution designed by publishers, data centres and researchers, which leverages existing global systems for tracking references between publications.

Linking journal publications to their underlying datasets enables research validation and reproducibility.

Scholix enables scholarly publishers to expose links between the research articles they publish and the underlying datasets held in data repositories anywhere in the world. Someone who reads a journal article can follow the link to the underlying dataset.

Not only does the exposure of these links facilitate data discovery, it also enables validation of the research findings and supports reproducibility.

Because Scholix links are bi-directional, you can also search for publications linked to a dataset. Funders and managers of data repositories, for example, can easily discover the journal articles that draw on data held in their repository.

Scholix also supports links between datasets.
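To make the idea of bi-directional links concrete, here is a minimal sketch in Python. It is not the Scholix schema or any hub's actual interface: the `Link` fields, the `LinkIndex` class, the relationship names and the DOIs are all hypothetical, chosen only to illustrate how a single link record can be looked up from either end and how dataset-to-dataset links fit the same model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One data-literature link (simplified, hypothetical field names)."""
    source_id: str      # identifier of the object asserting the link, e.g. an article DOI
    source_type: str    # "publication" or "dataset"
    target_id: str      # identifier of the linked object
    target_type: str    # "publication" or "dataset"
    relationship: str   # illustrative label, e.g. "references"
    provider: str       # hub or publisher that reported the link


class LinkIndex:
    """In-memory index that answers lookups from either end of a link."""

    def __init__(self):
        self._by_id = defaultdict(list)

    def add(self, link):
        # Register the link under both endpoints so it can be found from either side.
        self._by_id[link.source_id].append(link)
        if link.target_id != link.source_id:
            self._by_id[link.target_id].append(link)

    def linked_to(self, identifier, object_type=None):
        """Return identifiers at the other end of every link touching `identifier`."""
        results = []
        for link in self._by_id.get(identifier, []):
            if link.source_id == identifier:
                other_id, other_type = link.target_id, link.target_type
            else:
                other_id, other_type = link.source_id, link.source_type
            if object_type is None or other_type == object_type:
                results.append(other_id)
        return results


# Hypothetical example records (made-up DOIs and provider name).
index = LinkIndex()
index.add(Link("10.1234/article-1", "publication", "10.1234/dataset-a", "dataset", "references", "example-hub"))
index.add(Link("10.1234/article-2", "publication", "10.1234/dataset-a", "dataset", "references", "example-hub"))
index.add(Link("10.1234/dataset-b", "dataset", "10.1234/dataset-a", "dataset", "is_derived_from", "example-hub"))

print(index.linked_to("10.1234/article-1", "dataset"))      # reader follows article -> dataset
print(index.linked_to("10.1234/dataset-a", "publication"))  # repository manager finds citing articles
print(index.linked_to("10.1234/dataset-a", "dataset"))      # dataset-to-dataset link
```

Storing each link once and indexing it under both identifiers is what lets a reader go from article to dataset and a repository manager go from dataset to articles without keeping two separate records.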

Crossref, DataCite and OpenAIRE, sustainable global service providers that collect data–literature links from large communities, have embedded the workflows, common information model and exchange protocols of Scholix into their products, and now offer public queries with a Scholix-compliant interface.

Critically, publishers and research peak bodies such as the International Association of Scientific, Technical and Medical Publishers (STM), the Institute of Electrical and Electronics Engineers (IEEE) and the American Geophysical Union are championing a culture change in the way data is systematically referenced in publications.

The ARDC provides Digital Object Identifier (DOI) services as a member of the global DataCite initiative for 68 Australian universities and institutions. A DOI is used to identify research data and provide a persistent link to its location on the internet, which facilitates citation, attribution and discovery of the data. The ARDC has minted almost 500,000 DOIs for Australian research since 2011.
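As a small illustration of how a DOI acts as a persistent link, the sketch below turns a DOI string into a URL on the public doi.org resolver, which redirects to wherever the object currently lives. The helper name `doi_to_url` and the example DOI are hypothetical, and the validation is deliberately loose; this is not an ARDC or DataCite tool.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL from a DOI given as a bare string, 'doi:...' or a doi.org URL."""
    doi = doi.strip()
    # Strip common prefixes so all accepted forms reduce to the bare DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check only: DOIs start with "10." and contain a prefix/suffix slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping the slash.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, used for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the citation carries the identifier rather than a web address, the link in a paper keeps working even if the repository later moves the dataset.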

RDA adds value for Australian researchers via the network effect

The fact that Scholix was brought about by an international team of volunteers is remarkable.

The RDA/WDS working groups spent years establishing the Scholix interoperability framework and gaining social and technical consensus among infrastructure providers, data repository managers, and publishers.

Dr Adrian Burton, Director of Data and Services at the ARDC, co-chaired the working group. He credits RDA—a network of over 12,000 researchers, scientists and data science professionals—for enabling the international collaboration that is required for such a global initiative.

“An increasing number of Australian researchers publish overseas, so it is important to get publishers and peak bodies aligned in terms of policy, culture and technology.

“RDA provided a trusted forum to bring that partnership together. But that’s the thing about RDA: everyone involved brings their own resources to the table to create a network effect.

“RDA allows anyone to make linkages across the world. In this case, our participation in RDA has enabled an Australian perspective to be included in this important global initiative.”

Harmonising publishers’ data policies

The success of any universal system linking data and publications hinges on the commitment of publishers to adopt it. With publishers having different data policies, a one-size-fits-all framework takes time to develop.

“Some journal publishers want the data for peer review, some don’t”, says Adrian. “Some require quality standards. There are different data availability or data citation requirements for different journals, and that’s ok—they have different needs.”

An RDA interest group on data policy standardisation, co-chaired by the ARDC’s Natasha Simons, has been working with publishers and funders to harmonise data-sharing policies, complementing the technical work by the Scholix initiative to enable data–literature linking.

After extensive consultation, in 2020 the group published a research data policy framework for all journals and publishers, and its adoption by journals continues to rise.

“The framework is an effort to standardise journal data policies, making it easier for researchers to understand and comply with these policies and, in the long run, it will result in an increase in data availability,” says Natasha. “It is hugely encouraging to see it being adopted by journal editors around the world.”

What’s next for Scholix?

The working group participants are maintaining the momentum of the broader initiative after the formal working group stage—firming up the framework, encouraging more hubs to adopt it, and bringing it to the attention of contributors and consumers of data–literature links.

The vision is to have a system that supports notifications of links, a dashboard, and the ability to show researchers and research institutions the impact of a dataset.

It’s not just technologically ambitious. “There are ethical reasons too: datasets should be cited. So there is also the need for cultural change”, says Adrian.

While the science reproducibility crisis has many drivers, “the systematic open linking of literature with data is a big step”, he adds. “RDA-enabled initiatives such as Scholix and the standardisation of data policies make doing the right thing easy.”

Learn more about ARDC’s persistent identifier services, and our involvement in the Research Data Alliance.

The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.


Mary O'Callaghan

Reviewed by

Natasha Simons (ARDC), Adrian Burton (ARDC), Jo Savill (ARDC), Adelle Coote (ARDC), Stefanie Kethers (ARDC/RDA)
