Language Data Commons of Australia (LDaCA)

Who will benefit

HASS and Indigenous research community

DOI

https://doi.org/10.47486/HIR001

Program

HASS and Indigenous Research Data Commons


The Challenge

Australia is a massively multilingual country, in one of the world’s most linguistically diverse regions. Significant collections of this intangible cultural heritage have been amassed, including collections of Australian Indigenous languages, regional languages of the Pacific, and Australian English.

There are also language collections important for cybersecurity (AusTalk, Australian National Corpus, corpora of regional languages), for gauging popular sentiment (Australian Twitter Corpus), and for emergency communication (languages of the region and some Indigenous languages).

However, much of Australia’s language data is scattered, hard to find, and in danger of being lost. Many collections remain under-used and researchers lack the tools and skills to exploit their research potential.

The Response

We’ve established the Language Data Commons of Australia (LDaCA), an integrated national infrastructure that supports language research. It enables researchers and communities to access and use nationally significant collections of written, spoken, multi-modal and signed text.

The project is:

improving researchers’ digital skills and raising awareness of best practice in digital research
rendering valuable collections of national significance more findable, accessible, interoperable and reusable (FAIR) while adhering to CARE principles
developing the integrated national technical infrastructure to analyse language collections at scale.

It supports researchers to deliver innovative research outcomes, and opens up the social and economic possibilities of Australia’s language data for translational research in the national interest.

LDaCA:

addresses the challenge of balancing research needs with respect for community rights over language and cultural collections
highlights contributions that language research and HASS disciplines can make to STEM research and non-academic applications
positions Australia internationally as a leading contributor of language collections and digital infrastructure.

LDaCA has not only built an integrated national technical infrastructure for language data; it is also contributing to the success and impact of the HASS and Indigenous RDC by creating foundational infrastructure.

The Australian Text Analytics Platform (ATAP) is also part of the Language Data Commons of Australia.

Target Outcomes

LDaCA is a sustainable long-term repository for language data collections of national significance. This has implications for the development of Australia’s economy, national security and social and cultural well-being. Visit the LDaCA website and access the LDaCA data portal.

To date, LDaCA’s work has focused on the sustainability of data and on offering tools and training for the collection and analysis of language data. Our achievements towards these goals include:

developing policies and governance structures for long-term data storage and access
developing a technology stack which enables secure storage and provides a basis for tools and services now and in the future
establishing relationships with various communities to encourage sustainable data management and data (re)use practices
developing notebooks that enable researchers to learn how to apply text analytics to their own data or collections held in LDaCA.

To date, LDaCA has:

given 17 conference presentations
presented over 40 workshops, reaching nearly 1000 people
secured 25 datasets and built 24 data migration tools
created 75 software repositories, including public tools such as an RO-Crate profile, a metadata vocabulary, and Crate-O, a GUI tool for working with those resources
engaged with 8 Indigenous communities/organisations in the development process.

Who Will Benefit

LDaCA gives researchers more widespread access to Australia’s rich language resources, accelerating the development of language data analysis capability in Australian research and industry.

The Partners

LDaCA is part of the ARDC’s HASS and Indigenous Research Data Commons. It previously received support from the ARDC through the:

Our partners are:

The University of Queensland (lead)
ARDC
Australian National University
Monash University
The University of Melbourne
The University of Sydney
AARNet
First Languages Australia
Australian Institute for Aboriginal and Torres Strait Islander Studies
PARADISEC
ARC Centre of Excellence for the Dynamics of Language
Digital Observatory (QUT)
CLARIN

Further Resources

Read the report on the LDaCA event, Bringing Data to Life: Co-Designing a Language Data Commons.
Watch the initial project plan webinar.
Read the revised project plan.
Read the response to project plan feedback.
Explore UQ School of Languages co-investment projects with the ARDC.


Timeframe

Ongoing

Current Phase

In progress

ARDC Co-investment

$3,794,101

Project lead

Professor Michael Haugh, School of Languages and Cultures, The University of Queensland

Research Topic

Humanities, Arts and Social Sciences (HASS), Indigenous Studies
