China, curve, deaths, distancing, economic, lockdown, quarantine, schools, spread… These were the top words mentioned on Twitter in Australia about the pandemic from January to April 2020.
Following the declaration of the COVID-19 pandemic, researchers began to study what people were sharing on Twitter to gain insight into society’s response to a global disaster, one that is still unfolding today.
Software engineers were mining the text to look for trends on social media, but their analyses lacked nuance in determining what conversations were dominant at different stages of the pandemic.
Enter the linguists. Dr Martin Schweinberger is a Lecturer in Applied Linguistics at the University of Queensland and Director of the Language Technology and Data Analysis Laboratory (LADAL).
“As linguists, we know that you cannot view language as one big lump of words,” he said. “The discourse on COVID-19 in Australia evolved over time through different topics and different layers of discussion. And when you want to have something meaningful come out of the analysis, you need to separate these layers.”
Analysing Unstructured Text
To unravel the early discourse, Dr Schweinberger and Dr Sam Hames, a Postdoctoral Research Fellow in Computational Humanities at the University of Queensland, used the Australian Text Analytics Platform (ATAP), which received ARDC co-investment, to analyse over 41,000 COVID-related tweets posted between January and April 2020. The Twitter dataset was obtained from another ARDC co-investment project, the Australian Digital Observatory.
The linguistically informed text analysis showed the dominant words shared on Twitter in the early stages of the pandemic. “Discourse began by focusing on China and the coronavirus, and later Australians were more concerned about toilet paper, lockdown, casual contacts, jobkeeper, and school closures,” said Dr Schweinberger.
The team also identified five main topics of discussion on Twitter: medical, international, restrictions/home, spread, and economy.
“By combining linguistics and text mining on a large dataset, we enhanced understanding of the public’s response to social events in a way that is not possible when relying on only engineering approaches,” said Dr Schweinberger.
It’s not only linguistics that’s harnessing large datasets to gain insights on Australian society. More and more, researchers from the humanities, arts and social sciences (HASS) are using computational approaches to understand society and culture. The ARDC’s HASS and Indigenous Research Data Commons is creating the infrastructure needed to support them in taking data-driven approaches.
Tools and Training in Text Analysis for Australian Researchers
ATAP, which is part of the Language Data Commons of Australia, is also one of the projects within the HASS and Indigenous Research Data Commons. It’s an open-source platform with tools and training for researchers to analyse, process and explore text. Australian researchers can use it to access an ecosystem of data and code repositories, online workspaces, scripts, and training in text analytics.
Using text analytics, researchers can extract and analyse information from unstructured text, enabling data-driven research. With ever-larger amounts of unstructured text available, not only from Twitter but also from other digital media platforms, such techniques are becoming increasingly important across diverse research disciplines.
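To make the idea concrete, here is a minimal sketch of one basic text-analytics step: counting the most frequent words in short texts, grouped by time period. It is not the ATAP or LADAL implementation. The sample records, the stop-word list and the function names are all hypothetical and chosen only for illustration, and a real analysis would use a far larger dataset and linguistically informed preprocessing.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records as (period, text) pairs.
# These are invented for illustration and are NOT from the study's dataset.
SAMPLE_TWEETS = [
    ("2020-01", "New coronavirus cases reported in China"),
    ("2020-01", "China confirms coronavirus spread"),
    ("2020-03", "Lockdown starts and schools close"),
    ("2020-03", "No toilet paper anywhere during lockdown"),
]

# A deliberately tiny stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"in", "and", "no", "the", "during", "new"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms_by_period(records, n=3):
    """Return the n most frequent tokens for each period in the records."""
    counts = defaultdict(Counter)
    for period, text in records:
        counts[period].update(tokenize(text))
    return {period: counter.most_common(n) for period, counter in counts.items()}


if __name__ == "__main__":
    for period, terms in sorted(top_terms_by_period(SAMPLE_TWEETS).items()):
        print(period, terms)
```

Grouping the counts by period, rather than counting across the whole collection at once, is what allows shifts in vocabulary over time to become visible.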
ATAP can also work with existing archives, such as the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) and the Australian National Corpus, making it easier for researchers to access the content in their collections.
A powerful toolbox within ATAP – the Language Technology and Data Analysis Laboratory (LADAL) – offers a multitude of resources on methods used to analyse textual data. It includes basic research showcases, self-paced tutorials for everyone from novices to experts, and ready-made interactive notebooks that let users try out a method, including on their own data.
ATAP is a technical platform, but training is an important component – digital research methods are vital skills for early-career researchers in HASS and Indigenous studies. In 2022, the ATAP team trained over 400 researchers through hands-on workshops, online training modules and online office hours, and advised and collaborated with partners.
Learn more about ATAP.
The Australian Text Analytics Platform project received investment (doi.org/10.47486/PL074) from the ARDC. It is led by The University of Queensland, with support from AARNet and The University of Sydney.
Case study written by Jo Savill, ARDC. Edited by Mary O’Callaghan. Reviewed by Dr Martin Schweinberger and Dr Simon Musgrave (University of Queensland) and Keith Russell (ARDC).