Projects

Helmholtz-Zentrum Potsdam, Deutsches GeoForschungsZentrum GFZ (GFZ)
ALAMEDA proposes a standards-based metadata management system that prioritizes the end-user perspective. This will make it possible to access, search, and compare meta-information across databases, with automated metadata enrichment using available DataHub tools, implemented exemplarily for soil moisture and stable isotope geochemistry.

Modern Earth sciences produce a continuously increasing amount of data. These data consist of measurements/observations, descriptive information (metadata), and semantic classifications (semantics). Depending on the geoscientific parameter, metadata are stored in a variety of different databases, standards, and semantics, which obstructs interoperability by limiting data access and exchange, searchability, and comparability. Examples of common data types with very different structures and metadata needs are maps, geochemical data derived from field samples, or time series measured with a sensor at a point, such as precipitation or soil moisture.

So far, there is a large gap between the capabilities of databases to capture metadata and their practical use. ALAMEDA is designed as a modularly structured metadata management platform for the curation, compilation, administration, visualization, storage, and sharing of meta-information from lab, field, and modelling datasets. As a pilot application for stable isotope and soil moisture data, ALAMEDA will enable searching, accessing, and comparing meta-information across organizational, system, and domain boundaries.

ALAMEDA covers five major categories: observation & measurements, sample & data history, sensors & devices, methods & processing, and environmental characteristics (spatio-temporal). These categories are hierarchically structured, interlinkable, and filled with specific metadata attributes (e.g. name, date, location, and methods for sample preparation, measurement, and data processing). For the pilot, all meta-information will be provided by existing and well-established data management tools (e.g. mDIS, Medusa).
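To make this structure concrete, the sketch below models the five categories as a nested, interlinkable hierarchy in Python. All class and attribute names are illustrative assumptions, not ALAMEDA's actual data model.

    # A minimal sketch of hierarchical, interlinkable metadata categories.
    # Class and attribute names are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class MetadataAttribute:
        name: str            # e.g. "sample preparation method"
        value: str

    @dataclass
    class MetadataCategory:
        name: str
        attributes: list[MetadataAttribute] = field(default_factory=list)
        subcategories: list["MetadataCategory"] = field(default_factory=list)
        links: list["MetadataCategory"] = field(default_factory=list)  # interlinks

    # The five major categories named above:
    record = [
        MetadataCategory("observation & measurements"),
        MetadataCategory("sample & data history"),
        MetadataCategory("sensors & devices"),
        MetadataCategory("methods & processing"),
        MetadataCategory("environmental characteristics (spatio-temporal)"),
    ]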

In ALAMEDA, all of this information is brought together and made available via web interfaces. Furthermore, the project focuses on features such as metadata curation with intuitive graphical user interfaces, the adoption of well-established standards, the use of domain-controlled vocabularies, and the provision of interfaces for standards-based dissemination of aggregated information. Finally, ALAMEDA is to be integrated into the DataHub (Hub-Terra).

Karlsruhe Institute of Technology (KIT)
In this project, a method for metadata and data transfer between the open-source ELNs Chemotion and Herbie will be developed. The project aims to generalize this specific process by describing a general guideline and implementing available standards. The concept will be demonstrated for the use case of polymer membrane research.

Metadata can be recorded, stored, and published very efficiently with electronic lab notebooks (ELNs), a key prerequisite for comprehensive documentation of research processes. However, the interdisciplinarity of modern research groups creates the need to use different ELNs, or to use data interoperably across different ELNs. Despite the many ELNs on the market, an interface between ELNs has yet to be achieved, owing to missing (metadata) standards as well as missing collaboration efforts.

In this project, an interface for metadata transfer between the open-source ELNs Chemotion (development led at KIT) and Herbie (developed at Hereon) will be created. The implemented methods will aim for a generalization of the process via a guideline and will include available standards. The development will be demonstrated for the use case of polymer membrane research. This tool will improve the interoperability and reusability of metadata and thus support all ELN users relying on Chemotion and Herbie by enriching their datasets with data from the complementary ELN.
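To illustrate the kind of mapping such an interface has to perform, the following sketch translates a flat, Chemotion-style export into a nested, Herbie-style record. The field names and both record layouts are hypothetical; neither ELN's actual schema is reproduced here.

    # Hypothetical field mapping between two ELN metadata layouts.
    CHEMOTION_TO_HERBIE = {
        "sample_name": "specimen/label",
        "molecular_mass": "material/molar_mass",
        "created_at": "provenance/date_created",
    }

    def transfer_metadata(chemotion_record: dict) -> dict:
        """Translate a flat (hypothetical) Chemotion export into a nested,
        Herbie-style record; unmapped keys are kept aside for curation."""
        herbie_record, unmapped = {}, {}
        for key, value in chemotion_record.items():
            target = CHEMOTION_TO_HERBIE.get(key)
            if target is None:
                unmapped[key] = value
                continue
            section, field_name = target.split("/")
            herbie_record.setdefault(section, {})[field_name] = value
        if unmapped:
            herbie_record["unmapped"] = unmapped
        return herbie_record

    print(transfer_metadata({"sample_name": "PM-17", "solvent": "THF"}))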

The developments will generate a direct benefit for the growing community of scientists using the two selected ELNs, as the ELN extensions and their metadata schemas will be adapted interactively with the scientific community.

Alfred-Wegener-Institut - Helmholtz-Zentrum für Polar- und Meeresforschung (AWI)
This collaborative project will develop sustainable solutions and digital cultures to enable high-quality, standards-compliant curation and management of marine biomolecular metadata to better embed biomolecular science in broader digital ecosystems and research domains.

Project Partners: AWI, GEOMAR

Biomolecular data, e.g. DNA and RNA sequences, provides insights into the structure and functioning of marine communities in space and time. The associated metadata has great internal diversity and complexity, and to date biomolecular (meta)data management is not well integrated and harmonised across environmentally focused Helmholtz Centers.

As part of the HMC Project HARMONise, we aim to develop sustainable solutions and digital cultures to enable high-quality, standards-compliant curation and management of marine biomolecular metadata at AWI and GEOMAR, to better embed biomolecular science into broader digital ecosystems and research domains. Our approach builds on a relational database that aligns metadata with community standards such as the Minimum Information about any (x) sequence (MIxS) supported by the International Nucleotide Sequence Database Collaboration (INSDC), and with associated ontology content (e.g. The Environment Ontology - ENVO).
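As an illustration of what such alignment can look like in practice, the sketch below assembles a minimal sample record around a few MIxS core fields (collection_date, lat_lon, env_broad_scale, env_local_scale, env_medium). The identifier and values are invented, and the ENVO labels/IDs should be verified against the ontology.

    # A minimal, MIxS-aligned sample record (values are illustrative).
    sample_metadata = {
        "sample_id": "AWI-2023-0042",        # internal identifier (hypothetical)
        "collection_date": "2023-06-14",      # MIxS: ISO 8601 date
        "lat_lon": "54.18 N 7.90 E",          # MIxS: geographic position
        "env_broad_scale": "marine biome [ENVO:00000447]",  # ENVO ID to verify
        "env_local_scale": "sea surface layer",             # ENVO term to assign
        "env_medium": "sea water [ENVO:00002149]",          # ENVO ID to verify
    }

    # A simple completeness check before submission to a repository:
    REQUIRED_MIXS_FIELDS = {"collection_date", "lat_lon", "env_broad_scale",
                            "env_local_scale", "env_medium"}
    missing = REQUIRED_MIXS_FIELDS - sample_metadata.keys()
    assert not missing, f"record is missing MIxS fields: {missing}"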

At the same time, we ensure the harmonization of metadata with existing Helmholtz repositories (e.g. PANGAEA). A web portal for metadata upload and harvesting will enable sustainable data stewardship, support researchers in delivering high-quality metadata to national and global repositories, and improve the accessibility of the metadata.

Metadata subsets will be harvested by the Marine Data Portal (https://marine-data.de), increasing findability across research domains and promoting reuse of biomolecular research data. Alignment of the recorded metadata with community standards and relevant data exchange formats will support Helmholtz-wide and global interoperability.

Karlsruhe Institute of Technology (KIT)
In summary, MetaCook creates a framework for the preparation of (1) FAIR vocabularies and (2) FAIR ontologies. The VocPopuli software engages users in the Git-based collaborative composition of controlled vocabularies, which are then converted into ontologies via semi-supervised machine learning with the help of OntoFAIRCook.

The importance of metadata when sharing FAIR data cannot be overstated. The MetaCook project was initially envisioned as a cookbook-like set of instructions for developing high-quality vocabularies and, subsequently, metadata. However, we decided to turn the cookbook into an interactive software tool called VocPopuli, which supports the collaborative development of controlled vocabularies. In this way, instead of reading lists of instructions, users from any background and level of experience can easily navigate VocPopuli and receive interactive guidance. Using VocPopuli ensures that the developed vocabularies are themselves FAIR.

VocPopuli offers the capability to immediately apply the vocabulary to datasets stored in electronic lab notebooks. As an example, an integration with Kadi4Mat is already established, and we are currently implementing the same for Herbie.

Functionally, VocPopuli has a few main features:

  • GitLab login is used to associate user contributions with term versions and vocabularies

  • Each term’s provenance can be tracked both via VocPopuli’s user interface and GitLab’s commit history and branches

  • Every term contains: label, synonyms, translations, expected data type, broader terms, related internal/external terms, up/downvotes, collaborative discussion board, visual history graph

  • Each vocabulary contains: metadata about its contents, a hierarchical structure of terms, a set of allowed named relationships, a git repository

  • In the backend, VocPopuli’s data is stored in a graph database

  • SKOS and PROV (in progress) are available as optional exports (see the sketch after this list)

  • Any resource can be digitalized: Lab Procedure, Lab Specimen, ELN, ELN Export, Data Analysis
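As a rough illustration of the SKOS export mentioned above, the following sketch serializes a single hypothetical term with rdflib, covering a few of the listed term fields (label, synonyms, translations, broader terms). The namespace and term data are invented, and VocPopuli's actual export may differ.

    # Exporting one (hypothetical) vocabulary term to SKOS with rdflib.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import SKOS

    VOC = Namespace("https://example.org/vocpopuli/")   # hypothetical namespace

    g = Graph()
    term = VOC["membrane_thickness"]
    g.add((term, SKOS.prefLabel, Literal("membrane thickness", lang="en")))
    g.add((term, SKOS.altLabel, Literal("film thickness", lang="en")))  # synonym
    g.add((term, SKOS.prefLabel, Literal("Membrandicke", lang="de")))   # translation
    g.add((term, SKOS.broader, VOC["geometry"]))                        # broader term

    print(g.serialize(format="turtle"))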

FAIR vocabularies hold enough information to enable their transformation into ontologies. We verify this with a prototype of a second piece of software called OntoFAIRCook. At the time of writing, OntoFAIRCook is available as a command line tool and is being turned into an easy-to-use web interface.

Karlsruhe Institute of Technology (KIT)
This project will develop enhanced standards for storage-efficient decomposed arrays and tools for the automated generation of standardised Lagrange trajectory data files, thus enabling optimised and efficient synergetic merging of large remote sensing data sets.

This data project contains data and software related to Metamorphoses, a joint project in the framework of the Helmholtz Metadata Collaboration (HMC) with contributions from KIT-IMK and FZJ-IEK7.

Currently, the amount and diversity of high-quality satellite-based atmospheric observations is increasing quickly, and their synergetic use offers unprecedented opportunities for gaining knowledge. However, this kind of interoperability and reusability of remote sensing data requires the storage-intensive averaging kernels and error covariances of each individual observation.

This project will develop enhanced standards for storage-efficient decomposed arrays, thus enabling the advanced reuse of very large remote sensing data sets. The synergetic data merging will be further supported by Lagrange trajectory metadata. For this purpose, the project will develop tools for the automated generation of standardised trajectory data files.
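The sketch below illustrates the basic idea behind storage-efficient decomposed arrays: an averaging-kernel matrix is stored as a truncated singular value decomposition and reconstructed on demand. The shapes, the retained rank, and the NumPy-based layout are assumptions for illustration, not the project's actual standard.

    # Storing an averaging kernel as a truncated SVD (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 60))       # stand-in averaging kernel; real
                                            # kernels are far smoother, so the
                                            # truncation loses much less there
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 10                                  # retained rank (assumption)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]   # store these instead of A

    A_approx = U_k @ np.diag(s_k) @ Vt_k    # reconstruction on demand
    stored = U_k.size + s_k.size + Vt_k.size   # 600 + 10 + 600 = 1210 values
    print(f"full: {A.size} values, decomposed: {stored} values")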

A case study will demonstrate the scientific impact of multi-sensor atmospheric observational data generated with the support of Lagrange trajectory calculations. The project will actively support data merging activities in other research fields.

Helmholtz-Zentrum Berlin (HZB)
The Sample Environment Communication Protocol (SECoP) provides a generalized way for controlling measurement equipment – with a special focus on sample environment (SE) equipment. In addition, SECoP holds the possibility to transport SE metadata in a well-defined way.

Within the project SECoP@HMC, we are developing and implementing metadata standards for typical SE equipment at large-scale facilities (photons, neutrons, high magnetic fields). A second focus is the mapping of the SECoP metadata standards to a unified SE vocabulary for standardized metadata storage. Thus, a complete standardized system for controlling SE equipment and for collecting and saving SE metadata will be available and usable in the experimental control systems (ECS) of the participating facilities. This approach can be applied to other research areas as well.

The project SECoP@HMC is organised in four work packages:

  • Standards for Sample Environment metadata in SECoP (WP1)

  • Standards for storage of Sample Environment metadata (WP2)

  • Implementation into experimental control systems (WP3)

  • Outreach, Dissemination & Training (WP4)

The objectives of WP1 and WP2 are to standardize the provision and storage of metadata for SE equipment, enabling FAIR-compatible reuse and interoperability of the data. WP3 establishes SECoP as a common standard for SE communication at the involved centers by integrating the protocol into experiment control systems, which eases the integration of new and user-built SE equipment into experiments while providing sufficient SE metadata. In WP4, we reach out to the metadata and experimental controls communities, e.g. by organising workshops and presenting SECoP@HMC at conferences.

Some background on SECoP:

SECoP is developed in cooperation with the International Society for Sample Environment (ISSE) as an international standard for the communication between SE equipment and ECS. It is intended to ease the integration of sample environment equipment supplied by external research groups and by industrial manufacturers.

SECoP is designed to be:

  • simple

  • inclusive

  • self-explaining

  • providing metadata

Simple means it should be easy to integrate and use, even for the non-expert programmer. Inclusive means that different facilities can use the protocol without having to change their workflows, e.g. completely rewriting drivers or organising and handling hardware in a specific way to fulfil SECoP requirements. Self-explaining means that SECoP provides a complete human- and machine-readable description of the whole experimental equipment, including how to control it and what the equipment represents. With respect to metadata, SECoP greatly facilitates and structures the provision of metadata associated with SE equipment.
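As a rough illustration of this self-description, the hypothetical client below connects to a SECoP node and requests its machine-readable description. The host, port, and reply handling are assumptions based on a reading of the public specification; consult the specification on GitHub for the authoritative message syntax.

    # Hypothetical sketch: ask a SECoP node for its self-description.
    import json
    import socket

    HOST, PORT = "localhost", 10767         # assumed address of an SE node

    with socket.create_connection((HOST, PORT)) as sock:
        f = sock.makefile("rw", encoding="utf-8", newline="\n")
        f.write("describe\n")               # request the node description
        f.flush()
        reply = f.readline().strip()        # e.g. "describing . {...}"
        action, specifier, payload = reply.split(" ", 2)
        description = json.loads(payload)   # JSON description of the modules
        print(sorted(description.get("modules", {})))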

Several implementations of SECoP have been developed and support the design of SECoP-compatible sample environment control software. The complete specification of SECoP is available on GitHub.