Large collections of language data have been amassed in Australia, but many remain under-utilised or at risk.
Establishing a Language Data Commons of Australia (LDaCA) will federate these efforts into nationally integrated research infrastructure for collections of high strategic importance for the Australian research community and for translational research related to the national interest. These collections include the intangible cultural heritage of the languages of some of the world’s longest continuous cultures, in one of the world’s most linguistically diverse regions (Australian Indigenous languages and regional languages of the Pacific). They also include data that is important for cyber-security (AusTalk, Australian National Corpus, corpora of regional languages), for gauging popular opinion and sentiment (Australian Twitter Corpus), and for emergency communication (languages of the region and some Indigenous languages). LDaCA will be a sustainable long-term repository for ingesting and curating existing language data collections of national significance.
Who is this project for?
- Research organisations
What does this project enable?
Establishing LDaCA will widen access to Australia’s rich language resources and, through the development of impact pathways, accelerate the growth of language data analysis capability in Australian research and industry. This has implications for the development of Australia’s economy, national security, and social and cultural well-being.
- Australian National Corpus
- Project id: https://doi.org/10.47486/DP768
- Project stakeholders: