A massive number of open datasets are now available on the internet. The World Bank, the World Health Organization, research institutions and national governments around the globe are but a few of those contributing to the deluge of datasets now available online. These datasets are critical for researchers who are trying to address the big challenges facing our society.
To help people discover datasets, thousands of online data repositories have been created. These provide access to millions of datasets from governments, research institutions, scientific publishers as well as data brokers.
However, just because data is out there doesn’t mean it’s easy to find.
Recent research published in the Journal of Documentation sought to understand the behaviour of those searching for datasets to improve data discovery services.
The authors analysed a year of search log data from the ARDC’s Research Data Australia, a portal for finding research data and associated projects, researchers and data services from over 100 Australian research organisations, government agencies and cultural institutions.
Using a machine learning algorithm, the authors identified six distinct user profiles, each with different search behaviours: Expert Research, Expert Search, Expert Explore, Novice Research, Novice Search and Novice Explore.
Paper co-author Prof Jenny X. Zhang, from the School of Computing Technologies at RMIT University, remarked: “Information seeking behaviour has long been researched by both academia and industry. Many information discovery systems (such as web search engines) have benefited from this research. We have started to research and understand user data seeking behaviour as data from search logs, surveys and interviews become available. The availability of the ARDC search log data has enabled high quality research output from Romina Sharifpour’s master’s (minor) thesis.”
“Our findings about user data search behaviour can help design better data search systems that can tailor search results for the diverse information needs of different user groups. These findings also contribute to the knowledge base of information science.”
Dr Mingfang Wu, Senior Research Data Specialist at the ARDC, is also a co-author of the research paper.
“These findings will help us improve search results we present in Research Data Australia to make research datasets easier to find. They also help the hundreds of similar data discovery portals around the world who don’t have the resources to fully understand how they can improve their search results.”
Dr Wu said the ARDC is not tackling the data discovery challenge alone.
“Data discovery is not only being researched by academic researchers, but also by data practitioners from many groups around the world. The Data Discovery Paradigms Interest Group of the Research Data Alliance is one of those groups. It brings together people from various professional backgrounds to attack the issue of data discovery from different perspectives, such as resource organisation, data search system development, and cognitive studies of data search behaviour. The Interest Group produces actionable guidelines for making data more discoverable, and ultimately assists in the reuse of data.
“Data discovery is the realisation of the enormous effort in making data open and FAIR by the ARDC and our international collaborators in the Research Data Alliance.”
Dr Wu is also analysing survey and interview data in complementary work (forthcoming), which will help the ARDC continuously improve data discovery and support leading-edge research.
To engage in discussion about data discovery, join the Research Data Alliance Data Discovery Paradigms Interest Group, or if you have specific questions, please contact Dr Mingfang Wu via our contact page.
The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.