Large collections of language data have been amassed in Australia, but many remain under-utilised or at risk.

Establishing a Language Data Commons of Australia (LDaCA) will federate these efforts into nationally integrated research infrastructure for collections of high strategic importance for the Australian research community, and for translational research related to the national interest. These collections include intangible cultural heritage of the languages of some of the world’s longest continuous cultures in one of the world’s most linguistically diverse regions (Australian Indigenous languages and regional languages of the Pacific), and data which is important for cyber-security (AusTalk, Australian National Corpus, corpora of regional languages), for gauging popular opinions and sentiment (Australian Twitter Corpus), and for emergency communication (languages of the region and some Indigenous languages). The Language Data Commons of Australia will be a sustainable long-term repository for ingesting and curating existing language data collections of national significance.

Start date: 1 January 2021
Expected completion date: 30 June 2023
Investment by ARDC: $500,000
Lead node
1. Language data access policy framework
A policy framework for culturally, ethically and legally appropriate access to language data will be developed.
2. Language data standards
Language data standards will be developed and shared.
3. National language data portal
A common data portal for accessing, aggregating and harvesting language data will be established.
4. Outreach program
An outreach and training program will be developed for researchers.

Core features

Data access
A dedicated portal for discovering and accessing the national language data asset will be developed, supported by a comprehensive data access policy.
Technical infrastructure and language data standards will be developed and shared across institutions.

Who is this project for?

  • Research organisations
  • Government

What does this project enable?

The establishment of LDaCA will enable more widespread access to Australia’s rich language resources and accelerate the development of language data analysis capability in Australian research and industry through the development of impact pathways. This has implications for the development of Australia’s economy, national security and social and cultural well-being.

Monash University
University of Melbourne
University of Queensland