The Challenge
Australia is a massively multilingual country, in one of the world’s most linguistically diverse regions. Significant collections of this intangible cultural heritage have been amassed, including collections of Australian Indigenous languages, regional languages of the Pacific, and Australian English.
There are also language collections important for cybersecurity (AusTalk, Australian National Corpus, corpora of regional languages), for gauging popular sentiment (Australian Twitter Corpus), and for emergency communication (languages of the region and some Indigenous languages).
However, much of Australia’s language data is scattered, hard to find, and in danger of being lost. Many collections remain under-used and researchers lack the tools and skills to exploit their research potential.
The Response
We’ve established the Language Data Commons of Australia (LDaCA), an integrated national infrastructure that supports language research. It enables researchers and communities to access and use nationally significant collections of written, spoken, multi-modal and signed text.
The project is:
- improving researchers’ digital skills and raising awareness of best practice in digital research
- rendering valuable collections of national significance more findable, accessible, interoperable and reusable (FAIR) while adhering to CARE principles
- developing the integrated national technical infrastructure to analyse language collections at scale.
It supports researchers to deliver innovative research outcomes, and opens up the social and economic possibilities of Australia’s language data for translational research in the national interest.
LDaCA:
- addresses the challenge of balancing research needs with respect for community rights over language and cultural collections
- highlights contributions that language research and HASS disciplines can make to STEM research and non-academic applications
- positions Australia internationally as a leading contributor of language collections and digital infrastructure.
LDaCA has not only built an integrated national technical infrastructure for language data but is also contributing to the success and impact of the HASS and Indigenous RDC by creating foundational infrastructure.
The Australian Text Analytics Platform (ATAP) is also part of the Language Data Commons of Australia.
Target Outcomes
LDaCA is a sustainable long-term repository for language data collections of national significance. This has implications for the development of Australia’s economy, national security and social and cultural well-being. Visit the LDaCA website and access the LDaCA data portal.
The work of LDaCA to date has been focused on the sustainability of data as well as offering tools and training for the collection and analysis of language data. Our achievements towards this goal include:
- developing policies and governance structures for long-term data storage and access
- developing a technology stack which enables secure storage and provides a basis for tools and services now and in the future
- establishing relationships with various communities to encourage sustainable data management and data (re)use practices
- developing notebooks that enable researchers to learn how to apply text analytics to their own data or collections held in LDaCA.
To date, LDaCA has:
- given 17 conference presentations
- presented over 40 workshops, reaching nearly 1000 people
- secured 25 datasets and built 24 data migration tools
- created 75 software repositories, including public tools such as an RO-Crate profile, a metadata vocabulary, and Crate-O, a GUI tool for working with those resources
- engaged with 8 Indigenous communities/organisations in the development process.
Who Will Benefit
LDaCA gives researchers more widespread access to Australia’s rich language resources, accelerating the development of language data analysis capability in Australian research and industry.
The Partners
LDaCA is part of the ARDC’s HASS and Indigenous Research Data Commons. It previously received support from the ARDC.
Our partners are:
- The University of Queensland (lead)
- ARDC
- Australian National University
- Monash University
- The University of Melbourne
- The University of Sydney
- AARNet
- First Languages Australia
- Australian Institute for Aboriginal and Torres Strait Islander Studies
- PARADISEC
- ARC Centre of Excellence for the Dynamics of Language
- Digital Observatory (QUT)
- CLARIN
Further Resources
- Read the report on the LDaCA event, Bringing Data to Life: Co-Designing a Language Data Commons.
- Watch the initial project plan webinar.
- Read the revised project plan.
- Read the response to project plan feedback.
- Explore UQ School of Languages co-investment projects with the ARDC.
Related Articles
- “Bringing Data to Life: Co-Designing a Language Data Commons” Recap
- Announcing Successful Projects for the ARDC HASS Research Data Commons and Indigenous Research Capability Program
- A National Language Data Commons for Australia
- Australian Text Analytics Platform Launches
- Advancing HASS and Indigenous Research Infrastructure: A Symposium
- Empowering HASS and Indigenous Researchers with Essential Computational Skills
- Implementing Indigenous Data Licensing and Access: Empowering Communities and Upholding Cultural Rights
- Collections as Data in Australia
- Summer School Shares Computational Skills for HASS and Indigenous Research
- Draft Project Plans for the HASS and Indigenous Research Data Commons Now Open for Feedback