About the Event
The ARDC Community Data Lab (CDL) provides tools, datasets, analysis environments and collaboration options for humanities, arts, social sciences, and Indigenous research in Australia. As CDL enters its second phase, it will focus on co-designing 5 new capabilities to support research and research translation.
The ARDC invites you to a workshop to co-design one of these capabilities: Tools and workflows for correction, annotation and markup of digitised documents and collections.
The goal of the co-design workshop is to evaluate this co-investment opportunity and to help develop a solution that supports new and expanded research.
Value of Capability to HASS and Indigenous Research
Collections and GLAM (galleries, libraries, archives, and museums) organisations often have important information stored on documents attached to collection items, like labels and supporting paper or cards. While it’s easy to create digital images of these documents, accurately extracting the text and linking it to the right fields is challenging, especially with handwritten labels. Many valuable texts are also locked away in historical PDF documents with poor or no optical character recognition (OCR). GLAM institutions provide access to books, newspapers, magazines and more in PDF format, but searching and answering research questions across large collections of these documents is difficult due to variations in layout, content, and OCR quality.
Addressing these challenges can significantly improve the efficiency of digitising data associated with Australian collections. When this data is made more accessible and easier to find, the value of the collections is greatly enhanced, allowing them to contribute more effectively to research.
Through this new capability, we will develop tools and workflows to enable effective searching across these collections using both semantic and keyword-based search methods.
What to expect during the Workshop
Workshop participants are expected to actively engage in discussions. During the workshop, participants will be introduced to the problem to be addressed and to the capability we propose to develop as a solution for HASS and Indigenous research.
The workshop will be held virtually via Zoom. Participants will join breakout rooms and respond to questions using a Miro board. Our goal is to gather feedback on the value of the proposed solution and to assess whether we’re addressing the problem in the most effective way.
The proposed solution has been developed by ARDC in partnership with Melbourne Data Analytics Platform (MDAP), University of Melbourne.
After the workshop we will publish a report that captures the input that we have received. Your insights will help us refine our approach, ensuring that we are on the right track.
If you have difficulty accessing Zoom or Miro, please let us know ahead of the workshop and we will arrange alternative ways for you to provide your feedback.
Learn more about the ARDC Community Data Lab and the co-design process for phase 2 by watching the recording of our recent webinar, or viewing the slides.
Who will be speaking?
- Ellen Lyrtzis, Skills Development Lead (NCRIS), ARDC
- A/Prof Nic Geard, Computing and Information Systems, University of Melbourne
Who should attend?
- Researchers working with digitised documents whose text information is missing, low-quality or difficult to process (such as handwritten notes attached to collections)
- Research infrastructure providers, digital skills trainers and others who support researchers working with digitised documents
- Researchers working with historical textual documents commonly published in large institutional repositories like Trove and the National and State Archives, as well as international repositories like Gallica
What will participants gain from the session?
- Contribute your use cases and experience relevant to the capability
- Discuss the capability and the challenges it addresses with your peers
- Help shape how the capability will be delivered for HASS and Indigenous research
Join More Co-design Workshops for the ARDC Community Data Lab
The second phase of the CDL is focusing on co-designing 5 new capabilities to support research and research translation. Join other co-design sessions relevant to your research and work:
- Public Interest Documents: help co-design an easy-to-use curated national Hansard dataset for research. Tuesday 25 Feb, 12 noon AEDT – register now
- Curated Collections for Enduring HASS and Indigenous Data: help co-design a new national service to publish HASS and Indigenous data collections as websites. Tuesday 4 March, 12 noon AEDT – register now
- Framework For Research Software Engineers: help co-design a framework for recommended patterns in software engineering when working with HASS and Indigenous data. Friday 14 March, 12 noon AEDT – register now
- Tools and Workflows for Visualising and Analysing Data Stored in RO-Crates: help co-design research tools and workflows for visualising and analysing data stored in RO-Crates. Tuesday 18 March, 12 noon AEDT – register now
Further HASS and Indigenous Research Data Commons resources
- ARDC Community Data Lab Project
- HASS and Indigenous Research Data Commons
- Resources for HASS and Indigenous research
Will the session be recorded?
To ensure participants’ privacy, we will not publish a recording of the workshop.
Have questions?
Email [email protected]
Please note that this event may be recorded. This may include your contributions during the session. ARDC respects the privacy of individuals. Information collected is in accordance with the ARDC Privacy Policy.