Our current working groups are listed below.
Modern Earth sciences produce a continuously increasing amount of data. These data consist of measurements or observations together with descriptive information (metadata), including semantic classifications (semantics). Depending on the geoscientific parameter, metadata are stored in a variety of databases, standards, and semantics, which hampers interoperability by limiting data access and exchange, searchability, and comparability. Examples of common data types with very different structures and metadata needs are maps, geochemical data derived from field samples, and time series measured with a sensor at a point, such as precipitation or soil moisture.
So far, there is a large gap between the capabilities of databases to capture metadata and their practical use. ALAMEDA is designed as a modular, structured metadata management platform for the curation, compilation, administration, visualization, storage, and sharing of meta information of lab, field, and modelling datasets. As a pilot application for stable isotope and soil moisture data, ALAMEDA will enable searching, accessing, and comparing meta information across organization, system, and domain boundaries.
ALAMEDA covers five major categories: observations & measurements, sample & data history, sensors & devices, methods & processing, and environmental characteristics (spatial & temporal). These categories are hierarchically structured, interlinkable, and filled with specific metadata attributes (e.g. name, date, location, and methods for sample preparation, measurement, and data processing). For the pilot, all meta information will be provided by existing and well-established data management tools (e.g. mDIS, Medusa).
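To make this structure concrete, the following sketch shows how the metadata of a single soil-moisture time series might be laid out across the five categories. All attribute names and values are illustrative assumptions, not ALAMEDA's actual schema.

```python
# Hypothetical sketch of an ALAMEDA-style metadata record for a
# soil-moisture time series; all attribute names are illustrative only.
record = {
    "observations_and_measurements": {
        "name": "soil_moisture_plot_A",
        "parameter": "volumetric water content",
        "unit": "m^3/m^3",
    },
    "sample_and_data_history": {
        "created": "2023-04-01T08:00:00Z",
        "processing_level": "raw",
    },
    "sensors_and_devices": {
        "device": "TDR probe",          # invented device entry
        "serial_number": "SN-0042",
    },
    "methods_and_processing": {
        "measurement_method": "time-domain reflectometry",
        "processing_steps": ["despiking", "gap filling"],
    },
    "environmental_characteristics": {
        "location": {"lat": 51.35, "lon": 12.43},
        "period": ["2023-04-01", "2023-10-31"],
    },
}
```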
In ALAMEDA, all information is brought together and made available via web interfaces. Furthermore, the project focuses on features such as metadata curation with intuitive graphical user interfaces, the adoption of well-established standards, the use of domain-controlled vocabularies, and the provision of interfaces for standards-based dissemination of aggregated information. Finally, ALAMEDA is planned to be integrated into the DataHub (Hub-Terra).
Metadata can be recorded, stored, and published very efficiently with electronic lab notebooks (ELNs), which are a key prerequisite for comprehensive documentation of research processes. However, the interdisciplinarity of modern research groups creates the need to use different ELNs, or to use data interoperably across ELNs. Despite the many ELNs on the market, an interface between ELNs has not yet been achieved, owing to missing (metadata) standards as well as missing collaboration efforts.
In this project, an interface for metadata transfer between the open-source ELNs Chemotion (development lead at KIT) and Herbie (development at Hereon) will be developed. The implemented methods will aim at generalizing the process through a guideline and the use of available standards. The development will be demonstrated for the use case of polymer membrane research. This tool will improve the interoperability and reusability of metadata and thus support all ELN users relying on Chemotion and Herbie by enriching their datasets with data from the complementary ELN.
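At its core, such an interface is a mapping between the metadata schemas of the two ELNs. The sketch below illustrates this idea; the field names on both sides are invented for the example and do not reflect the actual Chemotion or Herbie schemas.

```python
# Illustrative sketch of a one-way metadata mapping between two ELNs.
# The field names on both sides are invented and do not reflect the
# real Chemotion or Herbie schemas.

FIELD_MAP = {
    "sample_name": "specimen/label",
    "molar_mass": "material/molar_mass_g_mol",
    "solvent": "preparation/solvent",
}

def map_metadata(source: dict) -> dict:
    """Translate a flat source record into a nested target record."""
    target: dict = {}
    for src_key, dst_path in FIELD_MAP.items():
        if src_key not in source:
            continue  # skip fields absent in the source record
        node = target
        *parents, leaf = dst_path.split("/")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = source[src_key]
    return target

print(map_metadata({"sample_name": "PM-17", "solvent": "water"}))
# -> {'specimen': {'label': 'PM-17'}, 'preparation': {'solvent': 'water'}}
```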
The developments will directly benefit the growing community of scientists using the two selected ELNs, as the ELN extensions and their metadata schemas will be adapted interactively with the scientific community.
Project Partners: AWI, GEOMAR
Biomolecular data, e.g. DNA and RNA sequences, provide insights into the structure and functioning of marine communities in space and time. The associated metadata have great internal diversity and complexity, and to date biomolecular (meta)data management is not well integrated and harmonised across environmentally focused Helmholtz Centers.
As part of the HMC project HARMONise, we aim to develop sustainable solutions and digital cultures that enable high-quality, standards-compliant curation and management of marine biomolecular metadata at AWI and GEOMAR, to better embed biomolecular science into broader digital ecosystems and research domains. Our approach builds on a relational database that aligns metadata with community standards such as the Minimum Information about any (x) Sequence (MIxS), supported by the International Nucleotide Sequence Database Collaboration (INSDC), and with associated ontology content (e.g. the Environment Ontology, ENVO).
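For illustration, a minimal MIxS-aligned record for a hypothetical seawater sample could look like the following sketch. The field names follow the MIxS checklist, while the values are invented and the ENVO identifiers shown should be verified against the ontology.

```python
# A minimal, MIxS-aligned metadata record for a hypothetical seawater
# sample. Field names follow the MIxS checklist; values are invented,
# and the ENVO IDs should be checked against the current ontology.
mixs_record = {
    "samp_name": "AWI-2023-W07",          # invented sample identifier
    "collection_date": "2023-07-15",
    "geo_loc_name": "North Sea",
    "lat_lon": "54.18 N 7.90 E",
    "env_broad_scale": "marine biome [ENVO:00000447]",
    "env_local_scale": "coastal sea water [ENVO:00002150]",
    "env_medium": "sea water [ENVO:00002149]",
}
```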
At the same time, we ensure the harmonization of metadata with existing Helmholtz repositories (e.g. PANGAEA). A web portal for metadata upload and harvesting will enable sustainable data stewardship, support researchers in delivering high-quality metadata to national and global repositories, and improve the accessibility of the metadata.
Metadata subsets will be harvested by the Marine Data Portal (https://marine-data.de), increasing findability across research domains and promoting reuse of biomolecular research data. Alignment of the recorded metadata with community standards and relevant data exchange formats will support Helmholtz and global interoperability.
The importance of metadata when sharing FAIR data cannot be overstated. The MetaCook project was initially envisioned as a cookbook-like set of instructions for developing high-quality vocabularies and, subsequently, metadata. However, we decided to turn the cookbook into an interactive software tool called VocPopuli, which supports the collaborative development of controlled vocabularies. In this way, instead of reading lists of instructions, users of any background and level of experience can easily navigate VocPopuli and receive interactive guidance. Using VocPopuli ensures that the developed vocabularies are themselves FAIR.
VocPopuli offers the capability to immediately apply the vocabulary to datasets stored in electronic lab notebooks. As an example, an integration with Kadi4Mat is already established, and we are currently implementing the same for Herbie.
Functionally, VocPopuli has a few main features:
- GitLab login is used to associate user contributions with term versions and vocabularies
- Each term's provenance can be tracked both via VocPopuli's user interface and via GitLab's commit history and branches
- Every term contains: label, synonyms, translations, expected data type, broader terms, related internal/external terms, up/downvotes, a collaborative discussion board, and a visual history graph
- Each vocabulary contains: metadata about its contents, a hierarchical structure of terms, a set of allowed named relationships, and a git repository
- In the backend, VocPopuli's data is stored in a graph database
- SKOS and PROV (in progress) are available as optional exports (see the sketch after this list)
- Any resource can be digitalized: Lab Procedure, Lab Specimen, ELN, ELN Export, Data Analysis
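As a rough illustration of the SKOS export, the following sketch builds a single concept with rdflib. The vocabulary namespace and the term itself are invented and do not come from a real VocPopuli vocabulary.

```python
# Sketch of what a SKOS export of a single vocabulary term could look
# like, built with rdflib; namespace and term are invented examples.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

VOC = Namespace("https://example.org/vocpopuli/polymer/")  # hypothetical

g = Graph()
g.bind("skos", SKOS)

term = VOC["membrane_thickness"]
g.add((term, RDF.type, SKOS.Concept))
g.add((term, SKOS.prefLabel, Literal("membrane thickness", lang="en")))
g.add((term, SKOS.altLabel, Literal("film thickness", lang="en")))
g.add((term, SKOS.broader, VOC["membrane_property"]))
g.add((term, SKOS.note, Literal("Expected data type: float (µm)")))

print(g.serialize(format="turtle"))
```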
FAIR vocabularies hold enough information to enable their transformation into ontologies. We verify this with a prototype of a second piece of software called OntoFAIRCook. At the time of writing, OntoFAIRCook is available as a command-line tool and is being turned into an easy-to-use web interface.
This data project contains data and software related to Metamorphoses, a joint project in the framework of the Helmholtz Metadata Collaboration (HMC) with contributions from KIT-IMK and FZJ-IEK7.
Currently, the amount and diversity of high-quality satellite-based atmospheric observations are increasing quickly, and their synergetic use offers unprecedented opportunities for gaining knowledge. However, this kind of interoperability and reusability of remote sensing data requires the storage-intensive averaging kernels and error covariances of each individual observation.
This project will develop enhanced standards for storage-efficient decomposed arrays, thus enabling the advanced reuse of very large remote sensing datasets. The synergetic data merging will be further supported by Lagrange trajectory metadata. For this purpose, the project will develop tools for the automated generation of standardised trajectory data files.
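To illustrate the underlying idea (not the project's actual implementation), an averaging kernel matrix can be stored as a truncated factorization, trading a small, controlled reconstruction error for a large reduction in storage:

```python
# Illustration of storage-efficient decomposition of an averaging
# kernel matrix via truncated SVD; size and tolerance are invented.
import numpy as np

# A smooth surrogate for an averaging kernel matrix (60 levels).
x = np.linspace(0, 1, 60)
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.01)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Keep enough singular values to capture 99.9% of the energy.
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.999)) + 1

# Store only the leading k factors instead of the full matrix.
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_hat = (U_k * s_k) @ Vt_k  # reconstruction when the data is reused

full = A.size
compressed = U_k.size + s_k.size + Vt_k.size
print(f"rank {k}: {compressed}/{full} stored values, "
      f"max reconstruction error {np.abs(A - A_hat).max():.2e}")
```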
A case study will demonstrate the scientific impact of multi-sensor atmospheric observational data generated with the support of Lagrange trajectory calculations. The project will also actively support data merging activities in other research fields.
The Sample Environment Communication Protocol (SECoP) provides a generalized way of controlling measurement equipment, with a special focus on sample environment (SE) equipment. In addition, SECoP offers the possibility to transport SE metadata in a well-defined way.
Within the project SECoP@HMC, we are developing and implementing metadata standards for typical SE equipment at large-scale facilities (photons, neutrons, high magnetic fields). A second focus is the mapping of the SECoP metadata standards to a unified SE vocabulary for standardized metadata storage. Thus, a complete standardized system for controlling SE equipment and for collecting and saving SE metadata will be available and usable in the experimental control systems (ECS) of the participating facilities. This approach can be applied to other research areas as well.
The project SECoP@HMC is organised in four work packages:
- Standards for Sample Environment metadata in SECoP (WP1)
- Standards for storage of Sample Environment metadata (WP2)
- Implementation into experimental control systems (WP3)
- Outreach, Dissemination & Training (WP4)
The objectives of WP1 and WP2 are to standardize the provision and storage of metadata for SE equipment for FAIR-compatible reuse and interoperability of the data. WP3 establishes SECoP as a common standard for SE communication at the involved centers by integrating the protocol into experiment control systems, easing the integration of new and user-built SE equipment into experiments while providing sufficient SE metadata. In WP4, we reach out to the metadata and experimental controls community, e.g. by organising workshops and presenting SECoP@HMC at conferences (see e.g. figure 1).
Some background on SECoP:
SECoP is developed in cooperation with the International Society for Sample Environment (ISSE) as an international standard for the communication between SE equipment and ECS. It is intended to ease the integration of sample environment equipment supplied by external research groups and by industrial manufacturers.
SECoP is designed to be:
- simple
- inclusive
- self-explaining
- providing metadata
Inclusive means that different facilities can use the protocol without having to change their workflows, e.g. rewrite drivers completely or organize and handle hardware in a specific way to fulfil SECoP requirements. Simple means it should be easy to integrate and use, even for the non-expert programmer. Self-explaining means that SECoP provides a complete human- and machine-readable description of the whole experimental equipment, including how to control it and what the equipment represents. With respect to metadata, SECoP greatly facilitates and structures the provision of metadata associated with SE equipment.
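As a rough illustration of this self-description, the sketch below requests a SECoP node's description over its line-based, JSON-carrying TCP protocol. The host, port, and module/parameter names are invented, and the message keywords follow our reading of the published specification; consult the specification on GitHub for the authoritative message formats.

```python
# Minimal sketch of querying a SECoP node. Host, port, and the
# module/parameter names are invented; message keywords follow our
# reading of the published SECoP specification and should be checked
# against https://github.com/SampleEnvironment/SECoP.
import json
import socket

def request(sock_file, line: str) -> str:
    """Send one SECoP request line and return the reply line."""
    sock_file.write(line + "\n")
    sock_file.flush()
    return sock_file.readline().strip()

with socket.create_connection(("secop-node.example.org", 10767)) as s:
    f = s.makefile("rw", newline="\n")
    print(request(f, "*IDN?"))        # identify the node
    reply = request(f, "describe")    # fetch the self-description
    keyword, specifier, payload = reply.split(" ", 2)
    description = json.loads(payload) # machine-readable equipment description
    print(sorted(description.get("modules", {})))
    # Read a (hypothetical) temperature parameter:
    print(request(f, "read cryo:value"))
```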
Several implementations of SECoP have been developed and support the design of SECoP-compatible sample environment control software. The complete specifications of SECoP are available on GitHub.