At the start of a data-driven research project, researchers source, search for and select datasets relevant to the research topic. Thanks to the open data movement, more data is freely available today through data repositories than ever before, and more data is now findable, accessible, interoperable and reusable (FAIR). However, while data might be open, it still needs to be found among the myriad repositories, research papers and search engines.
A recent report from the Australian Research Data Commons (ARDC) investigates how researchers approach data discovery, and provides recommendations for data collectors and data custodians, as well as for those who operate data discovery systems.
Dr Adrian Burton, Director of Data, Services and Policy at the ARDC, said data discovery is vital to accelerating research and innovation in Australia.
“In this information age, the innovative use of data in research is vital for new social, economic, and environmental benefits. As a result, there are significant national and international efforts to generate, collect, and make data available for new uses. But if researchers can’t find the data they need for research, those benefits are never realised.
“The ARDC has conducted this study to help data service providers better connect with their users – to understand what’s going on when researchers look for data. This understanding will help the ARDC and our partners improve data discovery services to meet users’ needs,” said Dr Burton.
Understanding how researchers discover data helps:
- The research data community: to coordinate activities with the research community to maximise the value of data assets
- Data repository operators: to identify how to improve their data discovery service
- Data curators: to enrich metadata by identifying data attributes that are important for data discovery, access and reuse.
Uncovering the Ingredients for Successful Data Discovery
To reveal how researchers discover data, interviews were conducted with the support of three ARDC partners that run their own data discovery services: Terrestrial Ecosystem Research Infrastructure (TERN), Australian Data Archive (ADA) and CSIRO. The three partners also supported the project by reaching out to and recruiting researchers engaged through their services to participate in the study.
Dr Mingfang Wu, Senior Research Data Specialist at the ARDC, is a co-author of the report.
“Our findings suggest that both technical and social aspects of the research process are significant in enabling effective data discovery.
“The ways researchers discover data are currently influenced by their institutional and academic community networks, such as the data repositories they know, their social networks and the consortia of collaborative research projects they belong to.
“However, researchers face challenges in discovering data from data repositories – in particular when the required data for a research project have to be discovered and synthesised from multiple data repositories.”
The report’s authors highlight three observations:
Social networks, as well as literature surveys, are important tools for researchers to learn where to source data. This finding means:
- Data communities need to actively engage with research communities, especially to promote data to cross-disciplinary research.
- Training workshops that benchmark datasets, and their access and use methods, would be useful for reaching out to researchers and those who want to pursue interdisciplinary and collaborative research projects.
Researchers very often consult data managers and collectors to make sense of data; conversely, researchers’ feedback on data reuse helps to improve data quality.
This finding means:
- Data repositories need to re-imagine their roles within the changing scholarly communication system by integrating or collaborating with their stakeholders.
There is much room for data discovery services to improve:
- Data discovery across data repositories and disciplines needs to be joined up to avoid cherry-picking and duplicates.
- Indexes, search interfaces and filters need to be enhanced so that, for example, researchers can match the concepts and data variables that a discipline is likely to use.
- Clear information about data quality, licence, data format, version and provenance needs to be provided.
The ARDC will build on the findings of this report by further analysing the rich interview data.
The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.