Storing Metadata
Metadata storage ensures that the metadata associated with research data collections is properly managed so that it can be harvested and exposed to search engines, as well as made available to researchers and research administrators. Metadata stores are a key component of this infrastructure.
Types of Metadata Stores
Metadata stores can be distinguished by their coverage, the granularity of the data that they describe and the specialisation of their descriptions. In some cases the metadata store function is embedded in the data storage solution (for example, in the repository software); in other cases it is a separate system.
Based on coverage, types of stores include:
- local metadata stores (such as for a research group)
- institutional metadata stores (for data produced across an institution)
- national metadata stores (like the ARDC’s Research Data Australia)
- discipline-specific metadata stores (collating data produced within a discipline).
Metadata about research collections is best created and managed in local metadata stores, so that it is tightly integrated with research groups and their activities. This metadata should be easily accessible and relevant to researcher needs.
Metadata stores with broader coverage are essential for data collections to be discovered, tracked and used outside the immediate context of the research.
Based on granularity (the level of detail in a dataset), types of stores include:
- collection-level metadata stores (describing collections and datasets)
- object-level metadata stores (describing individual data objects like files, database rows, spreadsheets or physical objects)
- integrated metadata stores (combining both of the above).
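The difference between the three granularity types can be illustrated with a small data structure. This is a minimal sketch, not a prescribed schema: the class names `ObjectRecord` and `CollectionRecord`, their fields, and the helper `collection_view` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectRecord:
    """Object-level metadata: describes one data object (hypothetical fields)."""
    identifier: str
    kind: str  # e.g. "file", "database row", "spreadsheet", "physical object"
    description: str = ""


@dataclass
class CollectionRecord:
    """Collection-level metadata, optionally carrying its object-level records."""
    identifier: str
    title: str
    objects: List[ObjectRecord] = field(default_factory=list)


def collection_view(record: CollectionRecord) -> dict:
    """Return only the collection-level description of a record."""
    return {
        "identifier": record.identifier,
        "title": record.title,
        "object_count": len(record.objects),
    }
```

In this sketch an integrated store would hold the full `CollectionRecord`, including its `objects`, while a collection-level store would hold only what `collection_view` returns.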
Institutions have different needs and approaches to storing metadata. There’s no single solution that fits all, but often existing solutions can be used or adapted.
Local Metadata Stores
Local metadata stores are crucial both to good research data management and to populating higher-level metadata stores. Researchers should consider the following requirements for their local stores:
- store metadata that has keywords and supports discovery and evaluation of data
- store metadata in a format which is commonly used in the discipline
- store metadata that supports reuse of data and has clear information on areas like access rights
- export metadata to other formats commonly used in describing metadata, especially as used by metadata aggregators
- support aggregation of metadata (harvesting and/or syndication) to discovery services like Research Data Australia and Google Dataset Search
- support automated gathering of metadata from instruments and related metadata from other databases (like HR systems, Data Management Planning tools and grants programs)
- integrate into researcher workflows with minimal disruption
- allow error checking, validation, and use of controlled vocabularies
- allow metadata to describe both collections and the objects within them
- allow hierarchical organisation of metadata (if needed), such as ordering metadata by project or experiment.
Not all metadata store solutions will satisfy all requirements, so prioritise the features that are most critical to meet your needs. The highest priorities are likely to be commonly used formats, hierarchical organisation and aggregation support.
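Two of the requirements listed above, validation against a controlled vocabulary and export to a format used by aggregators, can be sketched in a few lines. This is an illustrative sketch only: the field names, the `ACCESS_RIGHTS` vocabulary and the functions `validate` and `to_schema_org` are assumptions, not part of any particular metadata store. The export targets schema.org `Dataset` JSON-LD, which is the markup Google Dataset Search reads; other aggregators may expect a different format, which is not shown here.

```python
import json

# Hypothetical controlled vocabulary of access-rights terms.
ACCESS_RIGHTS = {"open", "conditional", "restricted"}
# Hypothetical set of fields a local store might require.
REQUIRED_FIELDS = ("title", "description", "keywords", "access_rights")


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not record.get(name)
    ]
    rights = record.get("access_rights")
    if rights and rights not in ACCESS_RIGHTS:
        problems.append(f"access_rights not in vocabulary: {rights}")
    return problems


def to_schema_org(record: dict) -> str:
    """Export a validated record as schema.org Dataset JSON-LD."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
        "keywords": record["keywords"],
        "conditionsOfAccess": record["access_rights"],
    }
    return json.dumps(doc, indent=2)
```

A store would typically run `validate` when a record is saved and call `to_schema_org` only for records that return no problems.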
Solution integration
Descriptions of data collections should not be treated as information islands. They need to be connected to other kinds of information, which may be stored and managed in different data stores. For example, the authoritative source of truth for information about people is often the HR and research office systems. A metadata store should reuse that metadata rather than create its own records: a characteristic of high-quality metadata is that it is created once and then reused as needed. It is also helpful to use common descriptions of grants, projects and researchers, so that users can navigate between data collections held by different institutions.
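The "create once, reuse as needed" idea can be sketched as a collection record that stores only identifiers for people and resolves the details from the authoritative system when needed. Everything here is hypothetical: `HR_SYSTEM` is an in-memory stand-in for a real HR or research office system, and the identifier format and the function `resolve_creators` are invented for illustration.

```python
# Hypothetical stand-in for an authoritative HR or research office system.
HR_SYSTEM = {
    "staff-0001": {"name": "A. Researcher", "department": "Chemistry"},
}

# The collection record holds references to people, not copies of their details.
collection = {
    "title": "Example collection",
    "creator_ids": ["staff-0001"],
}


def resolve_creators(record: dict, people: dict) -> list:
    """Look up creator details at read time from the authoritative source."""
    return [people[person_id] for person_id in record["creator_ids"]]
```

Because the metadata store keeps only the reference, a change made in the source system (for example, a researcher moving department) appears the next time the record is resolved, with no duplicate record to correct.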
Software solutions
Different institution-wide solutions for the discovery and reuse of research data collections are available, including: