The Challenge
Text analytics is the process of enabling data-driven research by extracting and analysing machine-readable information from within unstructured text.
As large amounts of unstructured text become more widely available, techniques for analysing it are becoming increasingly important across research disciplines.
Applications range from extracting social and cultural information from texts in the humanities, arts and social sciences (HASS) to extracting machine-readable information from technical texts in engineering and the sciences to help with developing hypotheses and projections.
The Response
Text analytics in research tends to happen either at a basic, generic level (handled with standard packages) or with custom code developed specifically for a particular project.
The Australian Text Analytics Platform (ATAP) provides researchers with a toolset that is more powerful and customisable than those contained in the standard packages, while being accessible to a large number of researchers who do not have strong coding skills.
ATAP is transforming and accelerating the data-driven research possibilities across disciplines by providing Australian researchers with access to an online platform for processing and analysing unstructured texts.
The platform includes self-service training in text analytics techniques and promotes greater flexibility and transparency in research workflows. This project aims to foster a community that brings together developers and users of text analytics in an accessible and collaborative environment.
This project involves the following elements:
Text analytics notebooks for data processing and analysis
A library of Jupyter Notebooks incorporates open-source scripts for cleaning, transforming, analysing and visualising text data. The ready-to-use notebooks contain core functionalities that can also be further built upon and customised for more complex text analysis.
Text analysis workbench
The workbench is a web-based, authenticated environment that enables researchers to import their own text datasets into an analytics sandbox. Examples include text data scraped from websites, collections of journal articles, or transcripts of media files. The workbench and its support services allow researchers to customise text analysis notebooks without needing a strong background in coding.
Online text analytics training environment
Web-based training in text analytics and community development initiatives (such as hacky hours and user groups) will support the needs of the community of emerging users of text analysis tools. The training environment presents a selection of case studies demonstrating the entire process of text analytics across a range of domains and applications. This is complemented by a series of education and training workshops targeting researchers from beginners through to more experienced practitioners.
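To give a concrete sense of the cleaning, transforming and analysing steps that the text analytics notebooks cover, here is a minimal Python sketch using only the standard library. It is an illustration and not code taken from ATAP: the function names, the stop-word list and the sample documents are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def clean_text(text):
    """Lowercase the text, extract word tokens and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def top_terms(documents, n=2):
    """Count cleaned tokens across all documents and return the n most frequent."""
    counts = Counter()
    for document in documents:
        counts.update(clean_text(document))
    return counts.most_common(n)


# Made-up sample documents, used only to show the workflow end to end.
docs = [
    "The platform supports text analysis in the humanities.",
    "Text analysis notebooks help researchers analyse text.",
]

print(top_terms(docs))  # → [('text', 3), ('analysis', 2)]
```

A real notebook would typically replace the regular-expression tokeniser and the short stop-word list with a dedicated library, and would add a visualisation step, but the overall shape (clean, transform, count, inspect) stays the same.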
The Outcomes
Access the Australian Text Analytics Platform (ATAP).
ATAP supports FAIR (findable, accessible, interoperable, reusable) data management principles by providing tools that automate the production of transparent and replicable text analytics outputs.
Exploring the large or complex datasets used for text analytics would otherwise require the use of high-performance computing resources. ATAP makes this possible in a web-based analysis environment with easy access to coding tools and training resources that enable individuals to complete complex text analyses.
Who Will Benefit
Researchers, research organisations, higher-degree research (HDR) candidates and text analytics coursework students will benefit from the project’s core features: