Data structures
Data structures are the foundational building blocks for all scverse packages. Building upon common data structures ensures interoperability.
Modality-specific extensions
In addition to these packages, we define standards on how to represent certain data types in these data structures. For now, such a specification is available for Adaptive Immune Receptor Repertoire (AIRR) data. Representations for other data types (e.g. scATAC-seq) will follow.
Packages maintained by core team
These packages are considered foundational in that many other packages build upon them. Joint maintenance by the core team guarantees long-term stability.
Ecosystem packages maintained by scverse community
Many popular packages rely on scverse functionality. For instance, they take advantage of established data format standards such as AnnData and MuData, or are designed to be integrated into the workflow of analysis frameworks. Here, we list ecosystem packages that follow development best practices (continuous testing, documentation, availability through standard distribution tools).
This listing is a work in progress. See scverse/ecosystem-packages for inclusion criteria, and to submit more packages.
Package | Description |
---|---|
CellCharter | CellCharter is a framework to identify, characterize and compare spatial domains from spatial omics and multi-omics data. |
CellOracle | A computational tool that integrates single-cell transcriptome and epigenome profiles to infer gene regulatory networks (GRNs), critical regulators of cell identity. |
CellRank | CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities and estimators generate hypotheses based on these. |
Cell_BLAST | Cell BLAST is a cell querying tool for single-cell transcriptomics data. |
CellphoneDB | CellphoneDB is a publicly available repository of HUMAN curated receptors, ligands and their interactions paired with a tool to interrogate your own single-cell transcriptomics data (or even bulk transcriptomics data if your samples represent pure populations!). A distinctive feature of CellphoneDB is that the subunit architecture of both ligands and receptors is taken into account, representing heteromeric complexes accurately. This is crucial, as cell communication relies on multi-subunit protein complexes that go beyond the binary representation used in most databases and studies. CellphoneDB also incorporates biosynthetic pathways in which we use the last representative enzyme as a proxy of ligand abundance; by doing so, we include interactions involving non-peptidic molecules. CellphoneDB includes only manually curated and reviewed molecular interactions with an evidenced role in cellular communication. |
Cirrocumulus | Cirrocumulus is an interactive visualization tool for large-scale single-cell genomics data. |
DoubletDetection | DoubletDetection is a Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices. |
GPTBioInsightor | GPTBioInsightor is a tool designed for single-cell data analysis, particularly beneficial for newcomers to a biological field or those in interdisciplinary areas who may lack sufficient biological background knowledge. GPTBioInsightor utilizes the powerful capabilities of large language models to help people quickly gain knowledge and insight, enhancing their work efficiency. |
GRnnData | An overload of anndata to more easily work with gene networks. Allows easy conversion between anndata and grnndata and provides many useful utility functions. |
Mowgli | Paired single-cell multi-omics data integration with Optimal Transport-flavored Nonnegative Matrix Factorization |
Multivelo | A mechanistic model of gene expression that extends the popular RNA velocity framework by incorporating epigenomic data. |
PILOT | PILOT is a Python library for Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport. |
PathML | An open-source toolkit for computational pathology and machine learning. |
PyDESeq2 | PyDESeq2 is a python package for bulk RNA-seq differential expression analysis. It is a re-implementation from scratch of the main features of the R package DESeq2 (Love et al. 2014). |
Rectangle | Rectangle is a python package for computational deconvolution. Rectangle presents a novel approach to second-generation deconvolution, characterized by hierarchical processing, an estimation of unknown cellular content and a significant reduction in data volume during signature matrix computation. |
SC2Spa | SC2Spa is a deep learning-based tool for predicting the spatial coordinates of single cells based on their transcriptomes. Two paired single-cell and spatial transcriptomic datasets are required to run SC2Spa. SC2Spa is trained on an ST reference dataset to learn the relationship between gene expression and spatial coordinates. The trained fully-connected neural network can be used to predict the location of a single cell with only the transcriptomic profile as input. The predicted locations of single cells can be further used to study communication between the cells. |
SCALEX | SCALEX is an integration and projection tool for atlas-level single-cell RNA-seq and ATAC-seq data. |
SnapATAC2 | SnapATAC2 is the successor of the SnapATAC R package. |
anndata for R | A ‘reticulate’ wrapper for the Python package ‘anndata’. Provides a scalable way of keeping track of data and learned annotations. Used to read from and write to the h5ad file format. |
benGRN | Benchmarking tool for gene network inference from single cell RNAseq methods. It uses the grnndata/anndata modality and only contains biological ground truth networks |
bento-tools | A Python toolkit for subcellular analysis of spatial transcriptomics data |
biolord | biolord (biological representation disentanglement) is a deep generative framework for disentangling known and unknown attributes in single-cell data. |
cell2location | Cell2location is a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. |
cellxgene | CZ CELLxGENE Annotate (pronounced “cell-by-gene”) is an interactive data explorer for single-cell datasets, such as those coming from the Human Cell Atlas. |
dandelion | dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5’ data. It streamlines the pre-processing, leveraging some tools from immcantation suite, and integrates with scanpy/anndata for single-cell BCR/TCR analysis. It also includes a couple of functions for visualization. |
decoupler | Python package to infer biological activities from omics data using a collection of methods. |
dynamo-release | Inclusive model of expression dynamics with metabolic labeling based scRNA-seq / multiomics, vector field reconstruction, potential landscape mapping, differential geometry analyses, and most probable paths / in silico perturbation predictions. |
epiScanpy | EpiScanpy is a toolkit to analyse single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data. |
fava | FAVA uses Variational Autoencoders to infer functional associations from large-scale scRNA-seq (and proteomics) data. |
flowsom | The complete FlowSOM package known from R, now available in Python! Analyze high-dimensional cytometry data using FlowSOM, a clustering and visualization algorithm based on a self-organizing map (SOM). FlowSOM is used to distinguish cell populations from cytometry data in an unsupervised way and can help to gain deeper insights in fields such as immunology and oncology. |
gssnng | Single-cell gene set scoring with nearest neighbor graph smoothed data. |
hotspot | Hotspot is a tool for identifying informative genes (and gene modules) in a single-cell dataset. |
infercnvpy | Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy. |
liana | Python package to infer cell-cell communication events from omics data using a collection of methods. |
maxspin | An information theoretic approach to detecting spatially varying genes |
moscot | moscot is a scalable toolbox for multiomics single-cell optimal transport applications. |
omicverse | OmicVerse is the fundamental package for multi-omics, including bulk and single-cell analysis, with Python. The original name of omicverse was Pyomic, but we wanted to address a whole universe of transcriptomics, so we changed the name to OmicVerse. It aims to solve all tasks in RNA-seq. |
Palantir | Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process where stem cells differentiate to terminally differentiated cells by a series of steps through a low-dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination. Palantir has been designed to work with multidimensional single-cell data from diverse technologies such as mass cytometry and single-cell RNA-seq. |
Panpipes | A pipeline for multiomic single-cell and spatial transcriptomic data analysis |
pegasus | Pegasus is a tool for analyzing transcriptomes of millions of single cells. |
pertpy | pertpy is a framework for the analysis of multi-condition omics data. |
pyLemur | Python implementation of the LEMUR algorithm for analyzing multi-condition single-cell RNA-seq data. |
pySCENIC | pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data. |
pychromVAR | A Python package for chromVAR. |
pytximport | A Python port of the tximport R package for importing transcript-level quantification data from various RNA-seq quantification tools such as salmon and kallisto and summarizing it to the gene level. |
rapids-singlecell | A GPU-accelerated Python package for single-cell data analysis |
scDataLoader | A dataloader for large single-cell databases like cellxgene. Does weighted random sampling, downloading and preprocessing. Works with anndata, zarr, and h5ad files. |
scFates | A scalable python package for tree inference and advanced pseudotime analysis from scRNAseq data. |
scGen | scGen is a generative model to predict single-cell perturbation response across cell types, studies and species. |
scPRINT | A single-cell foundation model for gene network inference and more… |
scanpro | Robust cell proportion analysis for single-cell data |
schist | schist applies Stochastic Block Models (SBM) to the analysis of single cell data, in particular to identify cell populations |
scib | Evaluating single-cell data integration methods |
scTriangulate | Python package to mix-and-match conflicting clustering results in single cell analysis and generate reconciled clustering solutions |
scVelo | scVelo is a scalable toolkit for RNA velocity analysis in single cells, based on Bergen et al., Nature Biotech, 2020. |
scyan | Biology-driven deep generative model for cell-type annotation in cytometry. Scyan is an interpretable model that also corrects batch-effect and can be used for debarcoding or population discovery. |
sfaira | sfaira is a model and a data repository in a single python package. |
sift-sc | SiFT is a computational framework which aims to uncover the underlying structure by filtering out previously exposed biological signals. SiFT can be applied to a wide range of tasks, from (i) the removal of unwanted variation as a pre-processing step, through (ii) revealing hidden biological structure by utilizing prior knowledge with respect to existing signal, to (iii) uncovering trajectories of interest using reference data to remove unwanted variation. |
Sobolev Alignment | Sobolev alignment of deep probabilistic models for comparing single cell profiles from pre-clinical models and patients |
sopa | Technology-invariant pipeline for spatial-omics analysis that scales to millions of cells. It includes segmentation, annotation, spatial statistics, and efficient visualization. |
spatial-eggplant | Python package designed to transfer information from multiple spatial-transcriptomics data sets to a single reference representing a Common Coordinate Framework (CCF). |
Symphonypy | Symphonypy is a pure Python port of the Symphony label transfer algorithm for reference-based cell type annotation. |
tangram | Spatial alignment and gene expression mapping of single cell transcriptomic data. |
vitessce | Vitessce consists of reusable interactive views including a scatterplot, spatial+imaging plot, genome browser tracks, statistical plots, and control views, built on web technologies such as WebGL. |