Australia is a massively multilingual country, in one of the world’s most linguistically diverse regions. Significant collections of this intangible cultural heritage have been amassed, including collections of Australian Indigenous languages, regional languages of the Pacific, and Australian English.
There are also language collections important for cybersecurity (AusTalk, Australian National Corpus, corpora of regional languages), for gauging popular sentiment (Australian Twitter Corpus), and for emergency communication (languages of the region and some Indigenous languages).
However, much of Australia’s language data is scattered, hard to find, and in danger of being lost. Many collections remain under-used and researchers lack the tools and skills to exploit their research potential.
We’re establishing the Language Data Commons of Australia (LDaCA), an integrated national infrastructure that supports language research. It will enable researchers and communities to access and use nationally significant collections of written, spoken, multi-modal and signed text.
The project will:
- improve researchers’ digital skills and raise awareness of best practice in digital research
- render valuable collections of national significance more findable, accessible, interoperable and reusable (FAIR) while adhering to CARE principles
- develop the integrated national technical infrastructure to analyse language collections at scale
- address the challenge of balancing research needs while respecting community rights for language and cultural collections
- highlight contributions that language research and HASS disciplines can make to STEM research and non-academic applications
- position Australia internationally as a leading contributor of language collections and digital infrastructure.
It will support researchers to deliver innovative research outcomes, and will open up the social and economic possibilities of Australia’s language data for translational research in the national interest.
Who Will Benefit
Establishing the LDaCA will give researchers wider access to Australia’s rich language resources, accelerating the development of language data analysis capability in Australian research and industry.
The LDaCA is supported by 3 ARDC programs:
Our partners are:
- The University of Queensland (lead)
- Australian National University
- Monash University
- The University of Melbourne
- The University of Sydney
- First Languages Australia
- Australian Institute for Aboriginal and Torres Strait Islander Studies
- ARC Centre of Excellence for the Dynamics of Language
- Digital Observatory (QUT)
The LDaCA will be a sustainable long-term repository for language data collections of national significance. This has implications for the development of Australia’s economy, national security and social and cultural well-being.
- Read the report on the LDaCA event, Bringing Data to Life: Co-Designing a Language Data Commons.
- Watch the initial project plan webinar
- Read the revised project plan
- Read the response to project plan feedback
- View the Language Data Commons of Australia (LDaCA) website
- Explore UQ School of Languages co-investment projects with the ARDC.
Related Case Studies
- “Bringing Data to Life: Co-Designing a Language Data Commons” Recap
- Announcing Successful Projects for the ARDC HASS Research Data Commons and Indigenous Research Capability Program
- A National Language Data Commons for Australia
- Australian Text Analytics Platform Launches
- Advancing HASS and Indigenous Research Infrastructure: A Symposium
- Empowering HASS and Indigenous Researchers with Essential Computational Skills
- Implementing Indigenous Data Licensing and Access: Empowering Communities and Upholding Cultural Rights
- Collections as Data in Australia