Australian Text Analytics Platform (ATAP)

A powerful toolset for processing and analysing unstructured texts.
Project

Australian Text Analytics Platform (ATAP)

Project lead

The University of Queensland

Who will benefit

Researchers, research organisations, higher-degree research candidates, text analytics coursework students

Timeframe

March 2021 to June 2023

Current Phase

In progress

ARDC Co-investment

$759,510

The Challenge

Text analytics enables data-driven research by extracting and analysing machine-readable information from unstructured text.

As large amounts of unstructured text become increasingly available, techniques for analysing it are becoming more important across research disciplines.

Applications range from extracting social and cultural information from texts in the humanities and social sciences (HASS) to extracting machine-readable information from technical texts in engineering and the sciences to help develop hypotheses and projections.

The Response

Text analytics in research tends to happen either at a basic, generic level (handled with standard packages) or through custom code developed for a particular project.

The Australian Text Analytics Platform (ATAP) will provide researchers with a toolset that is more powerful and customisable than those contained in the standard packages, while being accessible to a large number of researchers who do not have strong coding skills.

ATAP will transform and accelerate the data-driven research possibilities across disciplines by providing Australian researchers with access to an online platform for processing and analysing unstructured texts. 

The platform will include self-service training in text analytics techniques and promote greater flexibility and transparency in research workflows. This project aims to foster a community that brings together developers and users of text analytics in an accessible and collaborative environment.

This project involves the following elements:

  • Text analytics notebooks for data processing and analysis – A library of Jupyter Notebooks incorporating open source scripts for cleaning, transforming, analysing, and visualising text data. The ready-to-use notebooks will contain core functionalities that can also be further built upon and customised for more complex text analysis.
  • Text analysis workbench – The workbench is a web-based, authenticated environment that enables researchers to import their own text datasets into an analytics sandbox, such as text scraped from websites, collections of journal articles, or transcripts of media files. The workbench and its support services will allow researchers to customise text analysis notebooks without needing a strong background in coding.
  • Online text analytics training environment – Web-based training in text analytics and community development initiatives (such as hacky hours and user groups) will support the needs of the community of emerging users of text analysis tools. The training environment will present a selection of case studies demonstrating the entire process of text analytics across a range of domains and applications. This will be complemented by a series of education and training workshops targeting researchers from beginner through to more experienced practitioners.
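As an illustration of the kind of cleaning, transforming, and analysing steps such a notebook might contain, here is a minimal sketch using only the Python standard library. It is not taken from the ATAP notebooks: the function names (`clean`, `tokenise`, `top_terms`), the small stopword list, and the sample documents are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; a real notebook
# would likely use a much fuller, configurable list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def clean(text):
    """Lowercase the text, replace non-letter characters with spaces,
    and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenise(text):
    """Split cleaned text into word tokens, dropping stopwords."""
    return [token for token in clean(text).split() if token not in STOPWORDS]


def top_terms(documents, n=3):
    """Return the n most frequent terms across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenise(doc))
    return counts.most_common(n)


# Made-up sample documents.
docs = [
    "Text analytics is the analysis of unstructured text.",
    "Researchers analyse text to develop hypotheses.",
]
print(top_terms(docs))
```

Each step is a small function, so a researcher could replace one stage (for example, swapping in a different stopword list) without rewriting the rest, which is the kind of customisation the notebooks are described as supporting.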

Who Will Benefit

Researchers, research organisations, higher-degree research candidates, and text analytics coursework students will benefit from the project’s core features:

  • Powerful, accessible tools – Jupyter Notebooks containing ready-to-use, customisable scripts from simple processing tasks through to complex text analyses.
  • Online training – Training that enables researchers who do not know how to code to undertake text analytics. Case studies across a range of disciplines will demonstrate how to use available notebooks to produce research outputs.

Target Outcomes

ATAP will support FAIR data management principles by building tools that automate the production of transparent and replicable text analytics outputs.
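One way a tool could make its output transparent and replicable is to store a provenance record alongside the results. The sketch below is an assumption about how that might look, not a description of ATAP's implementation: the `run_analysis` function, the parameter names, and the output layout are hypothetical.

```python
import hashlib
import json
from collections import Counter


def run_analysis(text, params):
    """Count word frequencies and return the results together with a
    provenance record (input checksum and the parameters used)."""
    tokens = text.lower().split() if params["lowercase"] else text.split()
    counts = Counter(tokens).most_common(params["top_n"])
    return {
        "provenance": {
            # The checksum lets a reader confirm which input produced the results.
            "input_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "parameters": params,
        },
        "results": [{"term": term, "count": count} for term, count in counts],
    }


# Hypothetical parameters and sample input.
params = {"lowercase": True, "top_n": 2}
output = run_analysis("Text data and more text data", params)
print(json.dumps(output, indent=2, sort_keys=True))
```

Because the record holds both the input checksum and the parameters, rerunning the function on the same input with the same parameters yields an identical output that can be compared directly.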

Exploring the large or complex datasets used for text analytics would otherwise require high-performance computing resources.

ATAP will make this possible in a web-based analysis environment with easy access to coding tools and training resources that enable individuals to complete complex text analyses.
