Language Data Commons of Australia

Rescuing vulnerable language collections.
Project
Language Data Commons of Australia
Project lead
Professor Michael Haugh, School of Languages and Cultures, The University of Queensland
Who will benefit
HASS and Indigenous researchers

Timeframe

November 2022 to June 2023

Current Phase

In progress

ARDC Co-investment

$1,933,000

The Challenge

Australia is a massively multilingual country, in one of the world’s most linguistically diverse regions. Significant collections of this intangible cultural heritage have been amassed, including collections of Australian Indigenous languages, regional languages of the Pacific, and Australian English.

There are also language collections important for cybersecurity (AusTalk, Australian National Corpus, corpora of regional languages), for gauging popular sentiment (Australian Twitter Corpus), and for emergency communication (languages of the region and some Indigenous languages).

However, much of Australia’s language data is scattered, hard to find, and in danger of being lost. Many collections remain under-used, and researchers lack the tools and skills to exploit the collections’ research potential.

The Response

We’re establishing the Language Data Commons of Australia (LDaCA), an integrated national infrastructure that supports language research. It will enable researchers and communities to access and use nationally significant collections of written, spoken, multi-modal and signed text.

The project will:

  • improve researchers’ digital skills and raise awareness of best practice in digital research
  • render valuable collections of national significance more findable, accessible, interoperable and reusable (FAIR) while adhering to CARE principles
  • develop the integrated national technical infrastructure to analyse language collections at scale.

It will support researchers to deliver innovative research outcomes, and will open up the social and economic possibilities of Australia’s language data for translational research in the national interest.

We will:

  • address the challenge of balancing research needs with respect for community rights to language and cultural collections
  • highlight contributions that language research and HASS disciplines can make to STEM research and non-academic applications
  • position Australia internationally as a leading contributor of language collections and digital infrastructure.

Who Will Benefit

Establishing the LDaCA will give researchers wider access to Australia’s rich language resources, accelerating the development of language data analysis capability in Australian research and industry.

The Partners

The LDaCA is supported by 3 ARDC programs.

Our partners are:

  • The University of Queensland (lead)
  • Australian National University
  • Monash University
  • The University of Melbourne
  • The University of Sydney
  • AARNet
  • First Languages Australia
  • Australian Institute for Aboriginal and Torres Strait Islander Studies
  • PARADISEC
  • ARC Centre of Excellence for the Dynamics of Language
  • Digital Observatory (QUT)
  • CLARIN

Target Outcomes

The LDaCA will be a sustainable long-term repository for language data collections of national significance. This has implications for the development of Australia’s economy, national security and social and cultural well-being.
