Single-cell genomics has had a transformative impact on basic biology and biomedical research (Regev et al., 2017). What is still missing to enable robust applications in clinical trials, health research and translation is the comprehensive capture of all metadata associated with individual cells (Puntambekar et al., 2021). Metadata in this context is multi-layered and complex, tightly interweaving technical and biological (sample-level) information. Meeting these requirements calls for new standards and technical solutions to document, annotate and query properties across scales. These range from cellular identity, state and behavior under disease perturbation, together with technical covariates such as sample location and sequencing depth, up to tissue state and patient information, including clinical covariates, additional genomics and imaging modalities, and disease progression.
CellTrack builds on the track record of the Stegle and Theis labs, who have pioneered computational methods for analyzing single-cell data in biomedical settings and have contributed to major international consortia such as the Human Cell Atlas (HCA). We will also leverage existing research and infrastructure established at HMGU and DKFZ for managing, processing and sharing genomic data. The activities in this project will feed directly into highly visible national and international infrastructure efforts, most notably the German Human Genome-Phenome Archive, a national genomics platform funded by the NFDI, and scverse, a community platform that develops core infrastructure and interoperable software for key analysis tasks in single-cell genomics.
While single-cell genomics is quickly approaching routine biological and biomedical use, the field still lacks consistent integration with data management beyond count matrices, and with metadata management in particular. This is arguably due to differences in scale (cell vs. patient) and scope (research vs. clinic). To address these issues, we propose to (1) build a metadata schema, (2) implement it and (3) develop use cases for robustly tracking, storing and managing metadata in single-cell genomics.
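To make the cell-versus-patient gap concrete, the following is a minimal sketch of how a two-level schema could link cell-level records to sample-level records. It is an illustration under our own assumptions, not the CellTrack schema: the record types `SampleRecord` and `CellRecord`, their fields and the helper `resolve_cell_metadata` are all hypothetical, and plain Python dataclasses are used instead of any specific scverse data structure.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Sample-level metadata (hypothetical fields)."""
    sample_id: str
    patient_id: str       # link to patient-level data, e.g. clinical covariates
    tissue: str
    disease_status: str


@dataclass(frozen=True)
class CellRecord:
    """Cell-level metadata (hypothetical fields)."""
    cell_id: str
    sample_id: str        # foreign key into the sample table
    cell_type: str        # biological annotation of the cell
    total_counts: int     # technical covariate, a proxy for sequencing depth


def resolve_cell_metadata(cells, samples):
    """Attach sample-level metadata to every cell.

    Fails loudly on duplicate sample IDs or on cells that reference an
    unknown sample, instead of silently producing incomplete metadata.
    """
    by_id = {}
    for sample in samples:
        if sample.sample_id in by_id:
            raise ValueError(f"duplicate sample_id {sample.sample_id!r}")
        by_id[sample.sample_id] = sample

    rows = []
    for cell in cells:
        sample = by_id.get(cell.sample_id)
        if sample is None:
            raise KeyError(
                f"cell {cell.cell_id!r} references unknown sample {cell.sample_id!r}"
            )
        # Cell fields are merged last; the shared sample_id value is identical.
        rows.append({**asdict(sample), **asdict(cell)})
    return rows


# Hypothetical example data.
samples = [SampleRecord("S1", "P01", "lung", "tumor")]
cells = [
    CellRecord("AAACCTG-1", "S1", "T cell", 5210),
    CellRecord("AAACGGG-1", "S1", "B cell", 3874),
]
rows = resolve_cell_metadata(cells, samples)
```

Referencing samples by identifier, rather than copying patient fields into every cell, keeps each fact in one place and turns broken links into explicit errors at the point where metadata is stored.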
The overall goal of CellTrack is to provide a consistent encoding of genomic metadata, thereby reducing many of the common errors related to identifier mapping.
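As an illustration of the identifier-mapping errors mentioned above, one familiar case is gene identifiers that differ only by an Ensembl version suffix (for example `.17`) and therefore fail to match across data sets. The sketch below is a hypothetical example of consistent encoding, not part of CellTrack: it assumes human Ensembl gene IDs (`ENSG` followed by 11 digits), and the helpers `normalize_gene_id` and `harmonize` as well as the reference set are invented for illustration.

```python
import re

# Assumption: human Ensembl gene IDs ("ENSG" + 11 digits), optionally
# followed by a version suffix such as ".17".
_ENSEMBL_GENE = re.compile(r"^(ENSG\d{11})(?:\.\d+)?$")


def normalize_gene_id(raw):
    """Return the unversioned Ensembl gene ID, or None if raw is not one."""
    match = _ENSEMBL_GENE.match(raw.strip())
    return match.group(1) if match else None


def harmonize(gene_ids, reference):
    """Map raw IDs onto a reference ID set.

    Returns (mapped, unmapped): normalized IDs found in the reference, and the
    raw inputs that could not be mapped, so that they are reported instead of
    being dropped silently.
    """
    mapped, unmapped = [], []
    for raw in gene_ids:
        norm = normalize_gene_id(raw)
        if norm is not None and norm in reference:
            mapped.append(norm)
        else:
            unmapped.append(raw)
    return mapped, unmapped


# Hypothetical reference set and input IDs.
reference = {"ENSG00000141510", "ENSG00000012048"}
mapped, unmapped = harmonize(
    ["ENSG00000141510.17", " ENSG00000012048", "ENSG00000000003.15", "TP53"],
    reference,
)
```

Returning the unmapped inputs, rather than discarding them, makes mapping failures such as gene symbols in an ID column visible to the user.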
Florian Heyl, “Metadata schema for the HCA|organoid data portal”; https://portal.hca-organoid.eu/.
Isaac Virshup, “Access to bioconductor’s EnsDB web resources for Python”; https://github.com/scverse/genomic-features.
Florian Heyl, “heylf/scmulti: Single-cell multiome quality control workflow”; https://github.com/heylf/scmulti.