Collections as Data in Australia

Australian participants at an international summit on Collections as Data share their reflections on computational accessibility of collections held in GLAM institutions, and invite your contributions.
Collections as Data Summit participants, at Internet Archive Canada in Vancouver on 26 April 2023.

14 September 2023 update – The Vancouver Statement on Collections as Data is now published.

31 July 2023 – In early 2023, 4 Australians joined 60 GLAM (galleries, libraries, archives, museums) practitioners and disciplinary scholars from 18 countries for the summit Collections as Data: State of the Field and Future Directions, held in Vancouver at the Internet Archive Canada. The summit was part of the Mellon Foundation-supported Collections as Data: Part to Whole, an effort focused on supporting responsible computational use of GLAM collections.

While a recap of the summit is available, we want to share the perspectives of the Australian participants. One of the broader goals of the ARDC’s Humanities, Arts and Social Sciences and Indigenous Research Data Commons (HASS and Indigenous RDC) is to enable and support computational HASS and Indigenous research. Computational research in these domains relies on the data held within collections being digitally and ethically accessible in line with the FAIR and CARE principles. While the GLAM sector already contributes huge data collections about our society and culture, many collections remain inaccessible for research and the community. There is enormous potential for more data to be unlocked, yet the task can seem an insurmountable challenge.

In Australia, the State Library of Queensland is a leader in viewing collections as data, as demonstrated in its digital strategy: “We will see collections as data – and collaborate closely with those who are using data, machine learning and artificial intelligence technologies to discover more about Queensland’s past, present and future to tell compelling stories. We will also expand and enrich the data we hold, ensuring that we provide datasets that are relevant to our users.” This commitment was further reflected in the Library’s 2020 event, Making meaning.

At the summit, the participants reviewed the Santa Barbara Statement principles for Collections as Data and made many comments and suggestions to improve them. The summit has since released the draft Vancouver Statement on Collections as Data, which is open for comment until 11 August 2023 (extended from 4 August).

We invited the 4 participants from Australia to share a bit about themselves and their key takeaways from the summit. Read below to hear their thought-provoking reflections and contribute your comments to the Vancouver Statement.

The 4 participants from Australia were:

Robert McLellan is a proud Gooreng Gooreng descendant of the Wide Bay region, QLD, a community researcher, and an experienced Director and governance and engagement practitioner. He is the Program Manager for the Language Data Commons of Australia (LDaCA) at the University of Queensland, as well as an Industry Fellow within the Faculty of Humanities and Social Science. A strong advocate for truth telling, for Aboriginal people’s rights, justice and economic advancement, and for ensuring that First Nations voices are authentically valued and embraced across all levels of society, Robert is passionate about revitalising Indigenous languages and building culturally inclusive, honourable and cohesive communities.

Alexis Tindall is the Manager, Digital Stewardship, at the University of Adelaide, where she oversees the Library’s research data management support and digital preservation initiatives. She applies experience gained from supporting digital humanities, arts and social sciences research in roles at the ARDC and eResearch South Australia (eRSA). Prior to working in research support, she managed digital collections and digitisation projects in the museum sector. She is keenly interested in the computational accessibility of digitised and born-digital GLAM collections for all kinds of applications, including supporting research.

Margaret Warren is the Director, Digital Delivery at the State Library of Queensland. Her team’s work focuses on the continuous improvement of the online discovery experience for all State Library clients and staff. They coordinate the management and integration of library software applications, and are responsible for exploring, researching and developing new ways of making the Library’s collections discoverable online, and for encouraging people to engage with collections across multiple platforms, including open data and collections as data initiatives, the library website, blogs and other online presences. She is also keenly interested in copyright and the application of Creative Commons in libraries.

Duncan Loxton is an archivist interested in creating avenues of meaningful access to research data and true collaboration in fieldwork. Duncan works as a Data Curator at the University of Technology Sydney Library and the Aboriginal and Torres Strait Islander Data Archive (ATSIDA) where he strives to support the individual and collective rights of researchers, community groups and institutions to control the circumstances in which their knowledge is shared and applied.

What were your insights from the Collections as Data summit?

Robert: In the presence of a globally diverse cohort of delegates, I am enthralled to reflect upon the experiences of our respective communities and the many similarities we share. Historically, early explorers, linguists, anthropologists and other researchers developed a reputation for collecting material and data in ways that are widely considered unethical today. Among Indigenous communities, there is growing mistrust in the ability of large institutions (both Indigenous and non-Indigenous) to handle data in a conscionable and culturally responsible manner. Indigenous research communities across the globe face these challenges regularly. Therefore, we must aim to disrupt existing collection methods that do not support the security and longevity of Indigenous datasets, and move to:

  1. reframe institutional relationships with communities, noting principle 4, which supports collaborative and participatory engagement processes
  2. reform practices so that communities’ voices are included and First Nations perspectives are embedded within the data lifecycle and the decision-making processes pertaining to specific collections.

Ever more prominent is the need to embrace diverse understandings, insights and experiences in this area, and to build upon the capabilities of a community of practice, which is what the draft Vancouver Statement on Collections as Data seeks to do.

I was overwhelmed by hearing many, many experiences from around the globe in the space that we’re working in. A big theme of the summit was machine accessibility and what that means for collections. 

I left feeling reassured that our work is on the right track. This is certainly the case with the HASS and Indigenous RDC, which is tackling many of the challenges raised at the summit.

Alexis: There was an extraordinary amount of expertise at the summit from a range of institutions, yet it became obvious that we are all facing the same challenges. My key takeaway was the reminder that making collections computationally accessible is a task of many parts. There’s not one big goal or digital corpus that we’re aiming for. Participants from libraries, archives and biological collections all have different perspectives on that challenge. It was a good reminder that while the scale of the task to digitise collections can seem overwhelming, it can be tackled piece by piece, and we’ll never have all collections accessible at all times.

Margaret: I was inspired and awed by the expertise and knowledge of the attendees at the summit, and reflected that nuanced and collaborative approaches to this work are occurring worldwide. It is not an effort that will work with a ‘one system for all’ approach. While this is liberating and exciting, it could also present challenges when communicating the value proposition of this work to those who are not already involved and invested in it. Libraries have always seen collections as ‘data’ from an analogue perspective, with individual researchers working hard using multiple methods of inquiry to bring new knowledge and insights out of our collections. Using computational methods for some or major parts of that work requires different skill sets, and also presents opportunities and challenges at scale.

Duncan: We all work in different realities, with different resourcing and staffing across varied collections, but while we might not share the same opportunities, we still share a lot in common! One of my takeaways is that we all share frustrations with the time it takes to organise and clean up data, so we should give the task the space it needs when planning our projects and be careful to keep our ambitions in check. If the job becomes insurmountable, it needn’t pull you up short: it’s also possible to share data as a work in progress, with a note of its unfinished status.

What are the opportunities for Australia in considering collections as data?

Robert: It can be difficult to motivate and resource the work of making collections computationally accessible. Yet those with an aggregating role, like the Language Data Commons of Australia (LDaCA) and the HASS and Indigenous RDC, can recommend collections as data frameworks that create interoperability. This allows individual projects to be leveraged to create something bigger.

The scope of the LDaCA project relies heavily upon the willingness of partner and related-party institutions to support a cultural change in the context of the rights of First Nations peoples, their sovereignty, their access to data, and their decision-making authority over their data. It is important that we build upon the conversations initiated by ‘Collections as Data’ to foster a genuine appreciation for better management processes, and to ensure that our local communities are able to access the data they require.

Alexis: Availability and use of a well-supported framework can leverage the dispersed effort of digitising legacy collections and making digital collections available, in a way that increases their impact and potential. Interest in applying digital research approaches to Australian collections continues to grow, and discoverable, useful collections can open new and exciting avenues of research. This doesn’t only mean turning collections into big data; it can also mean improved discoverability of small, specific or niche collections for the right audiences.

Margaret: There are many opportunities for organisations and researchers to incorporate computational methods into their research. Collecting institutions, such as the State, Territory and National Libraries, need to consider carefully how we make our collections available to facilitate computational research methods in ways that don’t require labour-intensive clean-up and restructuring before any work can happen. Incremental changes, such as considering collections as data requirements before digitisation and accelerating access to open APIs, are improvements I can see being made immediately.

Duncan: The work of collections as data doesn’t necessarily need to be on show and visible to everyone. We can turn the collections as data imperative inwards to support offline community engagement and improve behind-the-scenes service delivery, particularly when access to these collections needs to be carefully controlled or is otherwise circumscribed.

What are your next steps in progressing towards computationally accessible collections?

Alexis: In my role supporting research data management at the University of Adelaide, I continue to encourage tools, workflows and decision-making that enable access to and use of data in line with the FAIR and CARE principles. As well as making these data available now for review, reproducibility and re-use, applying these approaches at the time of creation and publication will aid the longevity and sustainability of those data. We have so many examples of historic collections enabling new discoveries decades or even centuries later; we need to work now to ensure that our born-digital collections and research outputs are similarly preserved for our future users.

I’m also excited by the potential new approaches that will be enabled by the data discoverability and integrated tools of the HASS and Indigenous RDC, and I’ll champion it for researchers of all disciplines at the University of Adelaide and in my broader networks. 

Margaret: State Library of Queensland is planning another Collections as Data symposium in March 2024, where we will lean into the opportunities and challenges of this work with colleagues from multiple sectors. We will profile Australian and international projects, and ask ourselves the hard questions about our next steps to realise the potential of collections as data. I’m also excited about working with teams at State Library of Queensland on digitising with a collections as data intention. Finally, the Digital Collections Catalyst for 2024 is open for applications, to support innovative and creative uses of our digital collections and collections data.

Duncan: I’m looking forward to working with our Aboriginal and Torres Strait Islander researchers, organisations and communities to better meet their needs in providing computational access to data in ATSIDA’s custody. The future looks ever brighter with the contemporary work of the ARDC HASS and Indigenous RDC Program in building a social architecture of Indigenous Data Sovereignty in Australia.

Whitepaper Closed for Feedback

The Collections as Data whitepaper was available for feedback until 11 August 2023. We encouraged the Australian GLAM community to share their feedback on the Vancouver Statement on Collections as Data, and many did so.

While the ARDC has no formal role in the Collections as Data project, we would like to learn more about the Australian GLAM community’s view on this approach, given its strong links with the HASS and Indigenous RDC. We also invite you to send your comments and feedback to us via email.

Read the Vancouver Statement on Collections as Data.

Stay up to date with the HASS and Indigenous Research Data Commons – register your interest.

The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.