Projects


Helmholtz-Zentrum Potsdam, Deutsches GeoForschungsZentrum GFZ (GFZ)
ALAMEDA proposes a standards-based metadata management system that prioritizes the end-user perspective. It will allow users to access, search and compare meta-information across databases, with automated metadata enrichment utilizing available DataHub tools, implemented exemplarily for soil moisture and stable isotope geochemistry.

Modern Earth sciences produce a continuously increasing amount of data. These data consist of measurements/observations and descriptive information (metadata) and include semantic classifications (semantics). Depending on the geoscientific parameter, metadata are stored in a variety of different databases, standards and semantics, which hinders interoperability by limiting data access and exchange, searchability and comparability. Examples of common data types with very different structures and metadata needs are maps, geochemical data derived from field samples, or time series measured with a sensor at a point, such as precipitation or soil moisture.

So far, there is a large gap between the capabilities of databases to capture metadata and their practical use. ALAMEDA is designed as a modularly structured metadata management platform for the curation, compilation, administration, visualization, storage and sharing of meta-information from lab, field and modelling datasets. As a pilot application for stable isotope and soil moisture data, ALAMEDA will enable searching, accessing and comparing meta-information across organization, system and domain boundaries.

ALAMEDA covers five major categories: observation & measurements, sample & data history, sensor & devices, methods & processing, and environmental characteristics (spatial & temporal). These categories are hierarchically structured, interlinkable and filled with specific metadata attributes (e.g. name, data, location, methods for sample preparation, measuring and data processing). For the pilot, all meta-information will be provided by existing and well-established data management tools (e.g. mDIS, Medusa).
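As a rough illustration of how such a hierarchically structured, interlinkable record could be represented, the sketch below groups hypothetical attributes under ALAMEDA's five categories; the attribute names and values are invented for illustration and do not anticipate the project's actual schema.

```python
# Illustrative only: a metadata record organized by ALAMEDA's five major
# categories. Attribute names and values are hypothetical examples.
soil_moisture_record = {
    "observation_and_measurements": {
        "parameter": "soil moisture",
        "unit": "m^3/m^3",
    },
    "sample_and_data_history": {
        "collected_by": "example field team",          # hypothetical
        "processing_steps": ["raw export", "gap filling"],
    },
    "sensor_and_devices": {
        "sensor_type": "TDR probe",                    # hypothetical
        "serial_number": "SN-0000",
    },
    "methods_and_processing": {
        "calibration_method": "manufacturer default",
    },
    "environmental_characteristics": {
        "location": {"lat": 52.38, "lon": 13.06},      # hypothetical coordinates
        "period": {"start": "2023-04-01", "end": "2023-10-31"},
    },
}
```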

In ALAMEDA, all information is brought together and will be available via web interfaces. Furthermore, the project focuses on features such as metadata curation with intuitive graphical user interfaces, the adoption of well-established standards, the use of domain-controlled vocabularies and the provision of interfaces for a standards-based dissemination of aggregated information. Finally, ALAMEDA should be integrated into the DataHub (Hub-Terra).

Deutsches Krebsforschungszentrum (DKFZ)
CellTrack aims to revolutionize single-cell genomics by developing robust metadata standards for capturing complex multi-layered data associated with individual cells. Coordinated by Oliver Stegle and Fabian Theis, this project leverages existing research and infrastructures to create a comprehensive toolchain for managing, manipulating, and visualizing metadata, thus enabling more effective clinical trials and health research.

Single-cell genomics has had a transformative impact on basic biology and biomedical research (Regev et al., 2017). What is missing to enable robust solutions in clinical trials, health research and translation is to comprehensively capture all metadata associated with individual cells (Puntambekar et al., 2021). Metadata in this context are highly multi-layered and complex, tightly interweaving technical and biological (sample-level) metadata. Addressing these requirements calls for new standards and technical solutions to document, annotate and query properties across scales – from cellular identity, state and behavior under disease perturbation, as well as technical covariates such as sample location and sequencing depth, to tissue state and patient information, including clinical covariates, other genomics and imaging modalities, and disease progression.

CellTrack builds on the track record of the Stegle and Theis labs who have pioneered computational methods to analyze single-cell data in biomedical settings and have contributed to major international consortia such as the Human Cell Atlas (HCA). We will also leverage existing research and infrastructures established at HMGU/DKFZ, which allow for managing, processing and sharing genomic data. The activities in this project will directly feed into highly visible national and international infrastructure activities, most notably the German Human Genome-Phenome Archive – a national genomics platform funded by the NFDI, and scVerse – a community platform to derive core infrastructure and interoperable software for key analytics tasks in single-cell genomics.

While single-cell genomics is quickly approaching routine biological and biomedical use, the field still lacks consistent integration with data management beyond count matrices, and even more so metadata management – arguably due to differences in scale (cell vs. patient) and scope (research vs. clinics). To address these issues, we propose (1) a metadata schema, (2) an implementation, and (3) use cases for robustly tracking, storing and managing metadata in single-cell genomics.

The overall goal of CellTrack is to provide a consistent encoding of genomic metadata, thereby reducing many of the common errors related to identifier mapping.

Karlsruhe Institute of Technology (KIT)
In this project, a method for metadata and data transfer between the open-source ELNs Chemotion and Herbie will be developed. The project will aim for a generalization of the specific process via the description of a general guideline and the implementation of available standards. The concept will be demonstrated for the use case of polymer membrane research.

Metadata can be recorded, stored, and published very efficiently with electronic lab notebooks (ELNs), which are a key prerequisite for comprehensive documentation of research processes. However, the interdisciplinarity of modern research groups creates the need to use different ELNs or to use data interoperably across ELNs. Despite the manifold ELNs on the market, an interface between ELNs has not yet been achieved, owing to missing (metadata) standards but also missing collaboration efforts.

In this project, an interface for metadata transfer between the open-source ELNs Chemotion (development led at KIT) and Herbie (developed at Hereon) will be developed. The implemented methods will aim for a generalization of the specific process via a general guideline and the implementation of available standards. The development will be demonstrated for the use case of polymer membrane research. This tool will improve the interoperability and reusability of metadata and thus support all ELN users relying on Chemotion and Herbie by enriching their datasets with data from the complementary ELN.

The developments will generate a direct benefit for the growing community of scientists using the two selected ELNs, as the ELN extensions and their metadata schemas will be adapted interactively with the scientific community.

Deutsches Zentrum für Luft- und Raumfahrt (DLR)
The aim of this project is to develop interoperable metadata recommendations in the form of FAIR digital objects (FDOs) for 5D (i.e. x, y, z, time, spatial reference) imagery of Earth and other planets. The main expected benefit is greater effectiveness and efficiency in managing, publishing and interpreting imaging data for both domains, from exploring the ocean floor to planetary surfaces and everything in between.

Imaging the environment is an essential and crucial component of spatial science. This concerns nearly everything between the exploration of the ocean floor and the investigation of planetary surfaces. In and between both domains, imaging is applied at various scales – from microscopy through ambient imaging to remote sensing – and provides rich information for science. Due to the recent increase in data acquisition technologies, advances in imaging capabilities, and the growing number of platforms that provide imagery and related research data, data volume in the natural sciences, and thus also in ocean and planetary research, continues to increase at an exponential rate. Although many datasets have already been collected and analyzed, the systematic, comparable, and transferable description of research data through metadata is still a big challenge in and for both fields. These descriptive elements are crucial to enable efficient (re)use of valuable research data, to prepare the scientific domains for data-analytical tasks such as machine learning and big data analytics, and to improve interdisciplinary use by research groups not directly involved in the data collection.

In order to achieve more effectiveness and efficiency in managing, interpreting, reusing and publishing imaging data, we here present a project to develop interoperable metadata recommendations in the form of FAIR digital objects (FDOs) for 5D (i.e. x, y, z, time, spatial reference) imagery of Earth and other planet(s). An FDO is a human and machine-readable file format for an entire image set, although it does not contain the actual image data, only references to it through persistent identifiers (FAIR marine images). In addition to these core metadata, further descriptive elements are required to describe and quantify the semantic content of imaging research data. Such semantic components are similarly domain-specific but again synergies are expected between Earth and planetary research.
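As a rough illustration of this idea, the sketch below shows what a machine-readable FDO-style record for an image set might contain: core acquisition metadata and persistent-identifier references to the images rather than the image data itself. All identifiers and field names are hypothetical and do not represent the recommendations the project will develop.

```python
# Hypothetical sketch of an FDO-style record for a 5D image set: it carries core
# metadata plus persistent-identifier references, not the image data itself.
image_set_fdo = {
    "pid": "https://hdl.handle.net/21.T00000/example-image-set",   # placeholder handle
    "title": "Example seafloor photo transect",
    "acquisition": {
        "x": 7.513, "y": 54.180, "z": -1240.0,     # position (lon, lat, depth in m)
        "time": "2022-08-15T10:42:00Z",
        "spatial_reference": "EPSG:4326",
    },
    "images": [
        {"pid": "https://hdl.handle.net/21.T00000/img-0001"},      # references only
        {"pid": "https://hdl.handle.net/21.T00000/img-0002"},
    ],
    "license": "CC-BY-4.0",
}
```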

Alfred-Wegener-Institut - Helmholtz-Zentrum für Polar- und Meeresforschung (AWI)
This collaborative project will develop sustainable solutions and digital cultures to enable high-quality, standards-compliant curation and management of marine biomolecular metadata to better embed biomolecular science in broader digital ecosystems and research domains.

Project Partners: AWI, GEOMAR

Biomolecular data, e.g. DNA and RNA sequences, provides insights into the structure and functioning of marine communities in space and time. The associated metadata has great internal diversity and complexity, and to date biomolecular (meta)data management is not well integrated and harmonised across environmentally focused Helmholtz Centers.

As part of the HMC Project HARMONise, we aim to develop sustainable solutions and digital cultures to enable high-quality, standards-compliant curation and management of marine biomolecular metadata at AWI and GEOMAR, to better embed biomolecular science into broader digital ecosystems and research domains. Our approach builds on a relational database that aligns metadata with community standards such as the Minimum Information about any (x) sequence (MIxS) supported by the International Nucleotide Sequence Database Collaboration (INSDC), and with associated ontology content (e.g. The Environment Ontology - ENVO).
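To illustrate what alignment with MIxS and ENVO can look like at the level of a single sample record, here is a minimal sketch; the field names follow the public MIxS checklists and the ENVO labels are taken from the ontology, but the values are invented and the IDs should be verified against the released vocabularies.

```python
# Illustrative MIxS-aligned sample metadata; values are invented, field names
# follow the MIxS checklists, and ENVO labels/IDs should be verified.
sample_metadata = {
    "sample_name": "EXAMPLE_CTD_042",                 # hypothetical
    "collection_date": "2023-06-12",
    "geo_loc_name": "North Sea",                      # hypothetical sampling region
    "lat_lon": "54.53 N 10.04 E",
    "env_broad_scale": "marine biome [ENVO:00000447]",
    "env_medium": "sea water [ENVO:00002149]",
    "seq_meth": "Illumina MiSeq",                     # hypothetical sequencing method
}
```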

At the same time, we ensure the harmonization of metadata with existing Helmholtz repositories (e.g. PANGAEA). A web-portal for metadata upload and harvest will enable sustainable data stewardship and support researchers in delivering high-quality metadata to national and global repositories, and improve accessibility of the metadata.

Metadata subsets will be harvested by the Marine Data Portal (https://marine-data.de), increasing findability across research domains and promoting reuse of biomolecular research data. Alignment of the recorded metadata with community standards and relevant data exchange formats will support Helmholtz and global interoperability.

Helmholtz-Zentrum Dresden Rossendorf (HZDR)
HELPMI will develop a metadata standard for experimental data of the global laser-plasma community. To date, this community widely uses openPMD, an open meta-standard established for the domain of simulations.

At most laser-plasma research laboratories, the type and format of experimental data and metadata are heterogeneous and complex. The data originate from several distinct sources and occur at various levels and at different times during an experimental campaign. Some metadata, such as project names and proposal IDs, usually appear quite early in an experimental project, while exact time stamps of experimental data or acquisition parameters of diagnostics are generated at runtime. Similarly, data can originate as early as target pre-characterization, during the experiment’s setup phase and diagnostic calibration runs, or ultimately from actual laser shots. Furthermore, the configuration and status of deployed diagnostics, and therefore the experimental arrangement, are often subject to change during a campaign, sometimes at very short notice and without prior planning, as a consequence of experimental results. Beyond that, there is a strong need for better data integration and enrichment in the field of high-intensity laser-plasma physics in an international context. This has become clear during several online events, e.g. the LPA Online Workshop on Machine Learning and Control Systems, the Laserlab-Europe – ELI – CASUS workshop, and NFDI events. Setting out from this status quo and given their leading expertise in laser-driven experiments, HZDR, HI Jena and GSI will develop a metadata standard for the high-intensity laser-plasma community, with an initial emphasis on ion facilities during this project.

Proposed Work

  • Glossary and Ontology: A significant effort is required to conceive a sensible, widely applicable dictionary of concepts for data and metadata associated with the field. A reasonably close exchange with the worldwide community is planned via project observers and dedicated workshops. Lead: HI Jena

  • Technical preparation: Prepare the openPMD standard and its API for custom hierarchies and datasets in general and demonstrate interoperability between NeXus and openPMD in particular (a minimal sketch of such a combined file follows this list). openPMD is a meta-standard originally developed as a data format for a high-performance simulation code and has recently been adopted by other simulation codes, enabling interoperability and easing e.g. analysis efforts thanks to existing software accompanying openPMD. There is a strong interest in increasing and facilitating the exchange between simulations and experiments within the laser-plasma community. NeXus, on the other hand, is a metadata standard for experiments in the photon and neutron science community. We plan to overcome the present boundaries of the two standards. Lead: HZDR

  • Application of the new metadata standard to concrete cases at participating centers for fast examination.

    GSI: There is no widely-used metadata format yet. Extend the PHELIX database (PSDB) towards the new standard; generation of data and metadata.

    HI Jena: Conduct a pilot beamtime and generate experimental data with metadata.

    HZDR: Apply the new standard to research data at HZDR and generate a data object on RODARE for demonstration of FAIR access.
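As a minimal sketch of the NeXus–openPMD coexistence idea referenced in the "Technical preparation" item above, the snippet below writes a single HDF5 file with a small subset of openPMD-style root attributes next to a NeXus-style entry group, using h5py; the attribute selection is simplified and the layout is an assumption, not HELPMI's final mapping.

```python
import h5py

# Illustrative sketch: one HDF5 file carrying both openPMD-style root attributes
# and a NeXus-style entry group. The attribute subset is simplified and the
# layout is an assumption, not the interoperability layer HELPMI will define.
with h5py.File("laser_shot_example.h5", "w") as f:
    # openPMD-like root attributes (subset)
    f.attrs["openPMD"] = "1.1.0"
    f.attrs["iterationEncoding"] = "groupBased"

    # NeXus-like experiment entry (subset)
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    instrument = entry.create_group("instrument")
    instrument.attrs["NX_class"] = "NXinstrument"
    instrument.create_dataset("laser_energy_J", data=12.5)   # hypothetical shot metadata
```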

Forschungszentrum Jülich (FZJ)
M³eta aims to establish an extensible and sustainable metadata scheme for momentum microscopy, which will be stored together with the measured data voxels. This will be the basis for a standardized workflow that interprets the stored metadata and reconstructs views of the multidimensional electronic structure of a material.

Photoelectron emission spectroscopy (PES) has matured into a versatile tool for characterizing the electronic properties of novel quantum materials. While historically PES was used to access the density of states of materials in one-dimensional energy scans, nowadays data sets provide detailed views of band dispersions and topologies, making them indispensable for all fields of modern materials-based science. The latest innovation in this field – photoelectron momentum microscopy (MM) – applies the principles of high-resolution imaging to record tomographic sections of the electronic structure in a high-dimensional parameter space. Despite the rapid worldwide adoption of MM as a universal materials characterization tool, currently no universal scheme exists to describe the highly diverse set of parameters – for example, but not limited to, the momentum vector, energy, electron spin and light polarization states – that describe an MM experiment. Implementing the findable, accessible, interoperable and reusable (FAIR) principles in momentum microscopy mandates new metadata schemes that describe the abundance of experimental parameters and thus link measured data voxels to the electronic properties of a material. The aim of M³eta is to establish such an extensible and sustainable metadata scheme for momentum microscopy, stored in a structured file together with the measured data voxels. This will be the basis for an automated and interactive tool-chain that interprets the stored metadata and uses this information to reconstruct views of the multi-dimensional electronic structure of a material.
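To make concrete the kind of parameter set such a scheme has to capture, here is a small, hypothetical sketch of a per-measurement metadata container; the field names are invented for illustration and do not anticipate the schema M³eta will define.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of a per-measurement metadata container for momentum
# microscopy; field names are illustrative, not the M³eta schema.
@dataclass
class MMAcquisitionMetadata:
    kinetic_energy_eV: float                   # photoelectron kinetic energy
    momentum_range_inv_angstrom: tuple         # (k_min, k_max) of the imaged momentum window
    spin_resolved: bool
    photon_energy_eV: float
    light_polarization: str                    # e.g. "linear horizontal", "circular plus"
    sample_temperature_K: Optional[float] = None
    extra: dict = field(default_factory=dict)  # room for facility-specific parameters

example = MMAcquisitionMetadata(
    kinetic_energy_eV=16.8,
    momentum_range_inv_angstrom=(-2.0, 2.0),
    spin_resolved=False,
    photon_energy_eV=21.2,
    light_polarization="linear horizontal",
)
```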

Deutsches Zentrum für Luft- und Raumfahrt (DLR)
Manufacturing of composite parts involves multiple process steps, each of which generates large datasets from processing machines or surrounding sensors. With the help of the recently developed data management system shepard, the project MEMAS aims at storing and connecting these manufacturing data in a persistent way.

Manufacturing of composite parts involves multiple process steps, from the production of semi-finished materials to their processing and assembly. At each level of production, large datasets can be produced to trace back each state of the composite material and relate it to the final quality of the manufactured structure. With the help of the recently developed data management system shepard, the project MEMAS aims at storing and connecting these manufacturing data in a persistent way. It focuses particularly on the standardization, collection and annotation of metadata and on their automatic transfer into a simulation environment to estimate the actual structural performance. The consideration of potential defects resulting from the manufacturing techniques or induced by the surrounding environment will allow for the improvement of finite-element methods and of their accuracy. Furthermore, the developed tools will support the manufacturing field and highlight the consequences of manufacturing parameters on the structural behaviour, enabling adjustments of the process parameters after each produced part. Finally, the persistent and structured storage of research data and their metadata in the form of FAIR Digital Objects or with the help of DataCrates will support long-term data analysis and the further understanding of manufacturing techniques.

To this end, software solutions will be developed for the two exemplary manufacturing processes, tape laying and additive manufacturing, and combined into a general toolchain. The potential of the developed methodology will be tested by performing mechanical tests on representative parts and by comparing the results with the numerically predicted behaviour. All acquired experimental, manufacturing and simulation data, metadata formats and scientific results will be shared with the HMC community via open-source solutions like the Zenodo platform.

Helmholtz Centre for Environmental Research (UFZ)
Metadata frameworks for facilitating interoperability in systems toxicology and pharmacology

In toxicology and pharmacology, data from chemistry, biology, informatics, and human or ecosystem health science merge, and toxicological metadata need to become interoperable and compliant with the existing ontology-based data infrastructures of these fields.

A team from three Helmholtz programs (Earth and Environment, Information, and Health) will review existing metadata standards and ontologies across fields and generate an integrative, suitable ontology for the annotation of toxicological/pharmacological data and workflows, from experimental design to data deposit in repositories.

We will establish a metadata framework for the FAIR description of exposure and experimental settings interlinked with chemical IDs and data processing workflows using ‘omics data, which will be implemented into the community-based “Galaxy” project. This will enable interoperability between disciplines to address the grand challenges of chemical pollution and human and ecosystem health.

Karlsruhe Institute of Technology (KIT)
In summary, MetaCook creates a framework for the preparation of (1) FAIR Vocabularies, and (2) FAIR Ontologies. The VocPopuli software engages users in a Git-based collaborative composition of controlled vocabularies which are then converted via semi-supervised machine learning into ontologies with the help of OntoFAIRCook.

The importance of metadata when sharing FAIR data cannot be overstated. The MetaCook project was initially envisioned as a cookbook-like set of instructions for developing high-quality vocabularies and subsequently metadata. However, we decided to turn the cookbook into an interactive software called VocPopuli, which includes the collaborative development of controlled vocabularies. In this way, instead of reading lists of instructions, users from any background and level of experience can easily navigate VocPopuli and receive interactive guidance. Using VocPopuli ensures that the developed vocabularies will be FAIR themselves.

VocPopuli offers the capability to immediately apply the vocabulary to datasets stored in electronic lab notebooks. As an example, an integration with Kadi4Mat is already established, and we are currently implementing the same for Herbie.

Functionally, VocPopuli has a few main features:

  • GitLab login is used to associate user contributions with term versions and vocabularies

  • Each term’s provenance can be tracked both via VocPopuli’s user interface and GitLab’s commit history and branches

  • Every term contains: label, synonyms, translations, expected data type, broader terms, related internal/external terms, up/downvotes, collaborative discussion board, visual history graph

  • Each vocabulary contains: metadata about its contents, a hierarchical structure of terms, a set of allowed named relationships, a git repository

  • In the backend VocPopuli’s data is stored in a graph database

  • SKOS and PROV (in progress) are optional exports

  • Any resource can be digitalized: Lab Procedure, Lab Specimen, ELN, ELN Export, Data Analysis

FAIR vocabularies hold enough information to enable their transformation into ontologies. We verify this with a prototype of a second piece of software called OntoFAIRCook. At the time of writing, OntoFAIRCook is available as a command-line tool and is being turned into an easy-to-use web interface.
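As a rough illustration of the SKOS export mentioned in the feature list, the sketch below maps a single VocPopuli-style term (label, a synonym, a broader term, a definition) to SKOS properties using rdflib; the vocabulary namespace and term are invented, and the exact property mapping is an assumption rather than VocPopuli's actual output.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Illustrative mapping of a single vocabulary term to SKOS; the namespace, term
# and property mapping are assumptions, not VocPopuli's actual export.
VOCAB = Namespace("https://example.org/vocab/membranes/")

g = Graph()
g.bind("skos", SKOS)

term = VOCAB["casting_solution"]
g.add((term, RDF.type, SKOS.Concept))
g.add((term, SKOS.prefLabel, Literal("casting solution", lang="en")))
g.add((term, SKOS.altLabel, Literal("dope solution", lang="en")))       # synonym
g.add((term, SKOS.broader, VOCAB["solution"]))                          # broader term
g.add((term, SKOS.definition, Literal("Polymer solution used to cast a membrane film.")))

print(g.serialize(format="turtle"))
```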

Karlsruhe Institute of Technology (KIT)
This project will develop enhanced standards for storage efficient decomposed arrays and tools for an automated generation of standardised Lagrange trajectory data files thus enabling an optimised and efficient synergetic merging of large remote sensing data sets.

This data project contains data and software related to Metamorphoses, a joint project in the framework of the Helmholtz Metadata Collaboration (HMC) with contributions from KIT-IMK and FZJ-IEK7.

Currently, the amount and diversity of high-quality satellite-based atmospheric observations are quickly increasing, and their synergetic use offers unprecedented opportunities for gaining knowledge. However, this kind of interoperability and reusability of remote sensing data requires the storage-intensive averaging kernels and error covariances for each individual observation.

This project will develop enhanced standards for storage efficient decomposed arrays thus enabling the advanced reuse of very large remote sensing data sets. The synergetic data merging will be further supported by Lagrange trajectory metadata. For this purpose, the project will develop tools for an automated generation of standardised trajectory data files.
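To illustrate the storage-saving idea behind decomposed arrays, the sketch below factorizes a synthetic averaging-kernel matrix and keeps only its leading components; the actual decomposition scheme and file layout that the project will standardize may differ.

```python
import numpy as np

# Illustrative only: store a per-observation averaging-kernel matrix in a
# decomposed (low-rank) form. The decomposition and file layout standardized
# by the project may differ; the kernel below is synthetic.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(50, 50))            # synthetic stand-in for an averaging kernel

U, s, Vt = np.linalg.svd(kernel, full_matrices=False)
rank = 10                                     # keep only the leading components
U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]

reconstructed = (U_r * s_r) @ Vt_r            # approximate kernel, rebuilt on demand
stored_values = U_r.size + s_r.size + Vt_r.size
print(f"stored {stored_values} values instead of {kernel.size}")
```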

A case study will demonstrate the impact on science of the multi-sensor atmospheric observational data generated with the support of Lagrange trajectory calculations. The project will actively support data merging activities in other research fields.

Alfred-Wegener-Institut - Helmholtz-Zentrum für Polar- und Meeresforschung (AWI)
Within MetaSeis, we will develop a unifying data infrastructure and prepare for future archival of 3D reflection seismic data and active OBS data from recent and future research cruises. We aim to adopt and extend existing standards and interoperable vocabularies including metadata quality and validation checks, and to establish a distributed infrastructure for data curation by harmonized data workflows with connections to international data repositories.

Reflection seismic data, 2D as well as 3D, and refraction data such as active OBS data are the paramount source of information on the deep subsurface structure, as they provide by far the highest resolution of any comparable geophysical technique. To this date, they have been used for a large variety of academic and commercial purposes. For many decades, reflection and refraction seismic data were the largest data sets in the earth sciences, which created significant storage and archival problems. This fact and the lack of metadata standards hamper all new scientific projects that would like to use present-day and legacy data. However, GEOMAR has already initiated the implementation of FAIR standards for 2D seismic data within the NFDI4Earth Pilot “German Marine Seismic Data Access”, running until February 2023 in cooperation with the University of Hamburg and the University of Bremen.

Within MetaSeis, we will develop a unifying data infrastructure and prepare for future archival of 3D reflection seismic data and active OBS data from recent and future research cruises. We aim to adopt and extend existing standards and interoperable vocabularies for seismic metadata, including metadata quality and validation checks. To ensure long-term archival according to the FAIR principles, a workflow for the integration of future and legacy data sets will be established, along with best practices developed within previous projects (Mehrtens and Springer, 2019).

With this initiative, HMC will serve Germany’s marine geophysics community, as represented by AGMAR of the Fachkollegium Physik der Erde of the DFG, and also contribute to the efforts of NFDI4Earth/DAM/DataHUB and the involved Helmholtz centres to establish a distributed infrastructure for data curation through harmonized data workflows with connections to international data repositories such as MGDS (Marine Geoscience Data System), IEDA (Interdisciplinary Earth Data Alliance), SNAP (Seismic data Network Access Point) and SeaDataNet (Pan-European Infrastructure for Ocean and Marine Data Management). International cooperation will benefit from synergy with the industrial seismic standard Open Subsurface Data Universe (OSDU) to ensure future cooperation between industry and academic research.

Karlsruhe Institute of Technology (KIT)
Interoperability for Surface Data – diving from Surface to Structure

Capitalizing on advancements in Large Language Models (LLMs), MetaSupra aspires to expedite the process of metadata enrichment in FAIR-compliant repositories.

Specifically, MetaSupra will enhance SupraBank, a platform that provides machine-readable physicochemical parameters and metadata of intermolecular interactions in solution. By utilizing LLMs, we aim to develop data crawlers capable of extracting context-specific information from the chemical literature, simplifying data acquisition and curating FAIR repositories. This crawler software will be made accessible, inviting potential adoption by other Helmholtz centers. In addition, MetaSupra will illustrate how the utility of repositories for correlation studies, machine learning, and educational purposes can be substantially amplified through the integration of quantum-chemically computed molecular parameters, positioning it as a model for other chemical repositories and moving forward with its integration into IUPAC activities.

The MetaSurf project aims to apply FAIR principles to surface science data, enhancing its management and accessibility. By integrating advanced computational tools into Kadi4Mat, it seeks to automate processing and create a standardized public data repository. This initiative focuses on developing metadata-centric tools and workflows to support efficient data exchange and management.

The MetaSurf project is a comprehensive initiative aimed at transforming how data is managed, shared, and utilized in the field of surface science. It seeks to implement the FAIR (Findable, Accessible, Interoperable, and Reusable) principles across a broad spectrum of experimental and simulation data. The project's central objectives include:

  1. Extension of Existing Infrastructure: Enhancing the Kadi4Mat platform by integrating advanced simulation and modeling workflows, GitLab, and JupyterLab. This extension aims to facilitate automated processing steps and streamline the data management process.

  2. Development of a Public Data Repository: Establishing a centralized repository for surface science data, accessible to the global research community. This repository will serve as a hub for data exchange, fostering collaboration and accelerating scientific discovery.

  3. Metadata-Driven Approach: Emphasizing the use of metadata, electronic lab notebooks, and data repositories to promote reproducibility and transparency in research. By developing tools, workflows, and templates that leverage metadata, the project intends to enable a more structured approach to data management, ensuring that data from diverse sources can be easily integrated and analyzed.

  4. Community Engagement and Standardization: Working closely with the surface science community to develop standards for data exchange and processing. The project aims to cultivate a culture of data sharing and collaboration, encouraging researchers to adopt these standards in their work.

  5. Innovation in Data Processing: Introducing new processing tools and techniques designed to handle the complexities of surface science data. These innovations will address the specific needs of the community, such as data visualization, analysis, and interpretation, enhancing the overall quality and impact of research in this field.


By achieving these goals, the MetaSurf project aspires to create a more cohesive, efficient, and innovative research environment in surface science, where data can be easily accessed, shared, and leveraged to drive new discoveries and advancements.

Helmholtz-Zentrum Potsdam, Deutsches GeoForschungsZentrum GFZ (GFZ)
The project aims to develop metadata standards for the interoperability of spaceborne data from different satellites and communities.

Currently, data collection is mostly done by the instrument teams of various ESA, NASA, JAXA and other agencies’ missions. Different data products, often even from the same satellite mission, use different formats and rarely follow the standard practices accepted for metadata in more coordinated communities such as the atmospheric and oceanic sciences. Moreover, data versions and attributes, instrument PIDs and workflows are not properly recorded, which makes reproduction of the results practically impossible. As a consequence of this lack of standardization in both data access and format, the accessibility and reusability of data provided by satellite missions with budgets of up to several hundred million Euros is substantially limited. As an example, NASA’s flagship Van Allen Probes mission included a number of instruments, and each of the instrument teams utilized different metadata standards as well as different data formats. Reconstruction of the historical behavior of the radiation belts is even more complicated, as most of the historical data are written in binary codes, sometimes with little documentation.

Similarly, the quantification of precipitating fluxes from radiation measurements, needed as input for atmospheric models, is often difficult, as the properties relevant for estimating precipitation quantities are either not provided or difficult to obtain. The situation is somewhat similar for ionospheric observational data, which are growing exponentially. Numerous ionospheric measurements from GNSS satellites, science missions such as COSMIC I and COSMIC II, and now commercial fleets such as Spire provide a vast amount of data described in various different metadata formats.

Initial efforts have been made to introduce standardization for radiation belt physics. The Committee on Space Research (COSPAR) Panel on Radiation Belt Environment Modeling (PRBEM) developed the “Standard file format guidelines for particle count rate” for data stored in CDF format. NASA’s Space Physics Data Facility (SPDF) makes use of these guidelines for several products but uses different formats for different communities of data providers and stakeholders. The format contains attributes that can hold metadata describing the data content, but it does not hold information about workflows, nor does it make use of persistent identifiers. For ionospheric sciences, DLR Neustrelitz pioneered the introduction of formats for the ionospheric community during its involvement in the CHAMP and GRACE satellite missions as an operator for the generation and distribution of ionospheric products. Later, DLR’s involvement in several national (SWACI, IMPC) and EU projects such as ESPAS and PITHIA-NRF led to the development of first preparatory standards for ionospheric products. The increasing use of data assimilation and machine learning, which require vast amounts of data from different sources, makes this project most timely.


The collection and usage of sensor data are crucial in science, enabling the evaluation of experiments and the validation of numerical simulations. This includes sensor maintenance metadata, e.g. calibration parameters and maintenance time windows. Enriched sensor data allow scientists to assess data accuracy, reliability, and consistency through Quality Assurance and Quality Control (QA/QC) processes. Today, maintenance metadata is often collected but not readily accessible due to its lack of digitalization. Such audit logs are commonly stored in analogue notebooks, which poses challenges regarding accessibility, efficiency, and potential transcription errors.

In MOIN4Herbie (Maintenance Ontology and audit log INtegration for Herbie), we will address the obvious lack of digitized maintenance metadata in Helmholtz’s research areas Information and Earth and Environment.

To this end, MOIN4Herbie will extend the electronic lab notebook Herbie – developed at the Hereon Institute of Metallic Biomaterials – with ontology-based forms to deliver digital records of sensor maintenance metadata for two pilot cases: for both the redeployed Boknis Eck underwater observatory and the already implemented Tesperhude Research Platform, we will establish a digital workflow from scratch.

This will lead to a unified and enhanced audit of sensor maintenance metadata, and thus to more efficient data recording, the empowerment of technicians to collect important metadata for scientific purposes, and, not least, improved and easier scientific evaluation and use of sensor data.
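As an illustration of the kind of record the ontology-based forms are meant to produce, here is a small, hypothetical example of a digitized maintenance log entry; all field names and values are invented.

```python
# Hypothetical example of a digitized sensor-maintenance record of the kind
# MOIN4Herbie aims to capture via ontology-based forms; fields are invented.
maintenance_entry = {
    "sensor_id": "example-ctd-007",                  # hypothetical identifier
    "maintenance_window": {
        "start": "2024-03-05T08:00:00Z",
        "end": "2024-03-05T11:30:00Z",
    },
    "activity": "calibration",
    "calibration_parameters": {
        "offset": 0.012,
        "slope": 0.998,
        "reference_standard": "lab reference bath",  # hypothetical
    },
    "performed_by": "technician on duty",
    "notes": "Sensor cleaned before calibration.",
}
```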

Deutsches Elektronen-Synchrotron (DESY)
The focus of PATOF is on making the data of the A4 experiment (and in future those of ALPS II, PRIMA, P2, and LUXE) publicly available and usable. In addition, a “cookbook” will be provided that captures the methodology for making individual experiment-specific metadata schemas FAIR, and a “FAIR Metadata Factory” will be described, i.e. a process to create a naturally evolved metadata schema.



The main goal of PATOF is to make the data of a number of participating experiments fully publicly available and FAIR as far as possible. The work is building on first experience gained at the Mainz A4 nucleon structure experiment (1999-2012).

Here, the analysis, reorganisation, preparation and subsequent publication of the A4 data and the A4 analysis environment according to FAIR principles shall be achieved. The lessons learned at A4 are then to be applied to other experiments at DESY (ALPS II, LUXE) and Mainz (PRIMA, P2), collectively called the APPLe experiments. In the process, the project also aims to produce a general and living cookbook – or at least a first collection of recipes – on how to create metadata for PUNCH experiments and how to make their data FAIR.

The cookbook will capture the methodology for making individual experiment-specific metadata schemas FAIR. Another output is the “FAIR metadata factory”, i.e. a process to create a naturally evolved metadata schema for different experiments by extending the DataCite schema without discarding the original metadata concepts.
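To illustrate the "FAIR metadata factory" idea of extending the DataCite schema without discarding experiment-specific concepts, here is a simplified sketch; the core field names loosely follow the DataCite schema, while the extension block and its contents are hypothetical.

```python
# Sketch of a DataCite-style record extended with experiment-specific metadata.
# Core fields loosely follow the DataCite schema; the extension block and its
# contents are hypothetical illustrations of the "FAIR metadata factory" idea.
dataset_record = {
    "identifier": {"identifier": "10.xxxx/example-a4-run", "identifierType": "DOI"},  # placeholder DOI
    "creators": [{"name": "A4 Collaboration"}],
    "titles": [{"title": "Example A4 asymmetry run"}],
    "publisher": "Example repository",
    "publicationYear": "2025",
    "types": {"resourceTypeGeneral": "Dataset"},
    # experiment-specific extension kept alongside the standard fields
    "a4_extension": {
        "beam_energy_MeV": 855,              # hypothetical run parameters
        "target": "liquid hydrogen",
        "polarization": "longitudinal",
    },
}
```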

Helmholtz-Zentrum Berlin (HZB)
The Sample Environment Communication Protocol (SECoP) provides a generalized way for controlling measurement equipment – with a special focus on sample environment (SE) equipment. In addition, SECoP holds the possibility to transport SE metadata in a well-defined way.


Within the project SECoP@HMC, we are developing and implementing metadata standards for typical SE equipment at large scale facilities (photons, neutrons, high magnetic fields). A second focus is the mapping of the SECoP metadata standards to a unified SE vocabulary for a standardized metadata storage. Thus, a complete standardized system for controlling SE equipment and collecting and saving SE metadata will be available and usable in the experimental control systems (ECS) of the participating facilities. This approach can be applied to other research areas as well.

The project SECoP@HMC is organised in four work packages:

  • Standards for Sample Environment metadata in SECoP (WP1)

  • Standards for storage of Sample Environment metadata (WP2)

  • Implementation into experimental control systems (WP3)

  • Outreach, Dissemination & Training (WP4)

The objectives of WP1 and WP2 are to standardize the provision and storage of metadata for SE equipment for FAIR-compatible reuse and interoperability of the data. WP3 establishes SECoP as a common standard for SE communication at the involved centers by integrating the protocol into experiment control systems, easing the integration of new and user-built SE equipment into experiments while providing sufficient SE metadata. In WP4, we reach out to the metadata and experimental controls communities, e.g. by organising workshops and presenting SECoP@HMC at conferences.

Some background on SECoP:

SECoP is developed in cooperation with the International Society for Sample Environment (ISSE) as an international standard for the communication between SE equipment and ECS. It is intended to ease the integration of sample environment equipment supplied by external research groups and by industrial manufacturers.

SECoP is designed to be:

  • simple

  • inclusive

  • self-explaining

  • providing metadata

Inclusive means that different facilities can use this protocol and don't have to change their workflow, e.g. rewrite drivers completely or organize and handle hardware in a specific way to fulfil SECoP requirements. Simple means it should be easy to integrate and to use – even for the non-expert programmer. Self-explaining means that SECoP provides a complete human- and machine-readable description of the whole experimental equipment, including how to control it and what the equipment represents. With respect to metadata, SECoP greatly facilitates and structures the provision of metadata associated with SE equipment.

Several implementations of SECoP are developed and support the design of SECoP-compatible sample environment control software. The complete specifications of SECoP are available on GitHub.
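For a feel of what this machine-readable self-description enables, the deliberately simplified sketch below asks a SECoP node to describe itself over a plain TCP connection; the host, port and exact message grammar (keywords, framing, reply parsing) are assumptions simplified from the SECoP specification on GitHub.

```python
import json
import socket

# Deliberately simplified sketch of querying a SECoP node for its self-description.
# Host, port and the exact request/reply grammar are assumptions; see the SECoP
# specification on GitHub for the real message format.
HOST, PORT = "localhost", 10767            # hypothetical SECoP node

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(b"describe\n")            # ask the node to describe itself
    reply = b""
    while not reply.endswith(b"\n"):
        reply += sock.recv(4096)

# Assuming a reply of the form "describing <id> <json>", the JSON part carries
# the structured description of all modules, parameters and their metadata.
_, _, payload = reply.decode().strip().split(" ", 2)
description = json.loads(payload)
print(sorted(description.get("modules", {}).keys()))
```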

Helmholtz-Zentrum Dresden Rossendorf (HZDR)
A curation and reporting dashboard for compliant FAIR software publications

Research software should be published in repositories that assign persistent identifiers and make metadata accessible. Metadata must be correct and rich to support the FAIR4RS principles. Their curation safeguards quality and compliance with institutional software policies. Furthermore, software metadata can be enriched with usage and development metadata for evaluation and academic reporting. Metadata curation, publication approval and evaluation processes require human interaction and should be supported by graphical user interfaces.

We create "Software CaRD" (Software Curation and Reporting Dashboard), an open source application that presents software publication metadata for curation. Preprocessed metadata from automated pipelines are made accessible in a structured graphical view, with highlighted issues and conflicts. Software CaRD also assesses metadata for compliance with configurable policies, and lets users track and visualize relevant metadata for evaluation and reporting.
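A minimal sketch of what such a configurable policy check could look like; the policy format and the CodeMeta-like field names are assumptions for illustration, not Software CaRD's actual configuration or code.

```python
# Illustrative sketch of a configurable policy check on software publication
# metadata. The policy format and CodeMeta-like field names are assumptions,
# not Software CaRD's actual implementation.
policy = {
    "required_fields": ["name", "version", "license", "author", "identifier"],
    "allowed_licenses": ["MIT", "Apache-2.0", "GPL-3.0-or-later"],
}

def check_compliance(metadata: dict, policy: dict) -> list:
    """Return a list of human-readable issues for the curation dashboard."""
    issues = [f"missing field: {f}" for f in policy["required_fields"] if not metadata.get(f)]
    license_id = metadata.get("license")
    if license_id and license_id not in policy["allowed_licenses"]:
        issues.append(f"license not covered by institutional policy: {license_id}")
    return issues

record = {"name": "example-tool", "version": "1.2.0", "license": "MIT", "author": ["Jane Doe"]}
print(check_compliance(record, policy))    # -> ['missing field: identifier']
```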

Karlsruhe Institute of Technology (KIT)
Helmholtz-Zentrum Potsdam, Deutsches GeoForschungsZentrum GFZ (GFZ)

The HMC-funded STAMPLATE project aims to implement and establish the SensorThings API (STA) of the Open Geospatial Consortium (OGC) as a consistent, modern and lightweight data interface for time-series data. Using representative use cases from all seven research centers of the Helmholtz Research Field Earth & Environment, we ensure the transferability and applicability of our solutions to a wide range of measurement systems. Our project thus makes a decisive contribution towards a digital ecosystem and an interlinked, consistent, and FAIR research data infrastructure tailored to time-series data from the environmental sciences.


Time-series data are crucial sources of reference information in all environmental sciences. Beyond research applications, the consistent and timely publication of such data is increasingly important for monitoring and issuing warnings, especially given the growing frequency of climatic extreme events. In this context, the seven Centers of the Helmholtz Research Field Earth and Environment (E&E) operate some of the largest environmental measurement infrastructures worldwide. These infrastructures range from terrestrial observation systems in the TERENO observatories and ship-borne sensors to airborne and space-based systems, such as those integrated into the IAGOS infrastructures.

In order to streamline and standardize the use of the huge amounts of data from these infrastructures, the seven Centers have jointly initiated the STAMPLATE project. This initiative aims to adopt the Open Geospatial Consortium (OGC) SensorThings API (STA) as a consistent and modern interface tailored to time-series data. We evaluate STA for representative use cases from the environmental sciences and enhance the core data model with additional crucial metadata such as data quality, data provenance and extended sensor metadata. We further integrate STA as a central data interface into community-based tools for, e.g., data visualization, data access, QA/QC and the management of observation systems. By connecting the different STA endpoints of the participating research Centers, we establish an interlinked research data infrastructure (RDI) and a digital ecosystem around the OGC SensorThings API tailored to environmental time-series data.
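To give a sense of the standardized access that an STA endpoint provides, the sketch below lists a few Things with their Datastreams and fetches the latest Observation for each; the endpoint URL is a placeholder, while the query options ($expand, $top, $orderby) and entity names follow the OGC SensorThings API specification.

```python
import requests

# Example of the standardized access STA enables; the endpoint URL is a
# placeholder, the query syntax ($expand, $top, $orderby) and entity names
# follow the OGC SensorThings API specification.
BASE = "https://sensorthings.example.org/v1.1"      # hypothetical STA endpoint

things = requests.get(
    f"{BASE}/Things",
    params={"$expand": "Datastreams", "$top": 5},
    timeout=30,
).json()

for thing in things.get("value", []):
    for ds in thing.get("Datastreams", []):
        obs = requests.get(
            f"{BASE}/Datastreams({ds['@iot.id']})/Observations",
            params={"$orderby": "phenomenonTime desc", "$top": 1},
            timeout=30,
        ).json()
        latest = obs.get("value", [])
        print(thing["name"], ds["name"], latest[0]["result"] if latest else "no data")
```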

With our project, we further want to promote STA for similar applications and communities beyond our research field. Ultimately, our goal is to provide an important building block towards fostering a more open, FAIR (Findable, Accessible, Interoperable, and Reusable), and harmonized research data landscape in the field of environmental sciences.