Text analytics is the process of enabling data-driven research by extracting and analysing machine-readable information from unstructured text. As large volumes of unstructured text become increasingly available, techniques for analysing it are growing in importance across research disciplines. Applications range from extracting social and cultural information from texts in the humanities and social sciences to extracting machine-readable information from technical texts in engineering and the sciences, where it can assist with developing hypotheses and projections.

Text analytics in research tends to happen either at a basic, generic level (handled with standard packages) or through custom code developed specifically for a particular project. The aim of the Australian Text Analytics Platform (ATAP) is to provide researchers with a toolset that is more powerful and customisable than standard packages, while remaining accessible to the many researchers who do not have strong coding skills.

ATAP will transform and accelerate data-driven research across disciplines by giving Australian researchers access to an online platform for processing and analysing unstructured text. The platform will include self-service training in text analytics techniques and promote greater flexibility and transparency in research workflows. The project also aims to foster a community that brings together developers and users of text analytics in an accessible, collaborative environment.

This project is supported through the ARDC Research Platforms program.

Start date: 17 March 2021
Expected completion date: 30 June 2023
Investment by ARDC: $759,510
Lead node
1 Text analytics notebooks for data processing and analysis
A library of Jupyter Notebooks incorporating open-source scripts for cleaning, transforming, analysing, and visualising text data. The ready-to-use notebooks will provide core functionality that can be extended and customised for more complex text analysis.
2 Text analysis workbench
The workbench is a web-based, authenticated environment that enables researchers to import their own text datasets, such as text scraped from websites, collections of journal articles, or transcripts of media files, into an analytics sandbox. The workbench and its support services will allow researchers to customise text analysis notebooks without needing a strong background in coding.
3 Online text analytics training environment
Web-based training in text analytics, together with community development initiatives (such as hacky hours and user groups), will support the growing community of new users of text analysis tools. The training environment will present a selection of case studies demonstrating the entire text analytics process across a range of domains and applications. This will be complemented by a series of education and training workshops for researchers ranging from beginners to experienced practitioners.

Core features

Powerful, accessible tools
Jupyter Notebooks containing ready-to-use, customisable scripts from simple processing tasks through to complex text analyses.
Online training
Training will enable researchers who do not know how to code to undertake text analytics. Case studies across a range of disciplines will demonstrate how to use the available notebooks to produce research outputs.

Who is this project for?

  • Researchers
  • Research organisations
  • Higher degree research candidates
  • Coursework students undertaking text analytics as part of their studies

What does this project enable?

ATAP will support the FAIR data management principles by creating tools that automatically generate transparent and replicable text analytics outputs. Exploring the large or complex datasets used in text analytics would otherwise require high-performance computing resources; ATAP will make such analysis possible in a web-based environment with easy access to coding tools and training resources that enable individuals to complete complex text analyses.

The University of Queensland
The University of Sydney