Packages

Data structures

Data structures are the foundational building block for all scverse packages. Building upon common data structures ensures interoperability.

anndata AnnData is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.

GitHub Documentation PyPI Conda
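
To make the idea of an "annotated data matrix" concrete, here is a toy sketch in plain Python. It is not anndata's API: the `AnnotatedMatrix` class, its `subset` method, and the example gene and cell-type labels are all hypothetical, chosen only to show the core invariant that per-observation and per-variable annotations stay aligned with the matrix when it is subset.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedMatrix:
    """Toy stand-in for the annotated-matrix idea (hypothetical, not anndata's API)."""

    X: list[list[float]]  # rows = observations (cells), columns = variables (genes)
    obs: dict[str, list] = field(default_factory=dict)  # one value per row
    var: dict[str, list] = field(default_factory=dict)  # one value per column

    def __post_init__(self) -> None:
        n_obs, n_vars = self.shape
        if any(len(row) != n_vars for row in self.X):
            raise ValueError("all rows of X must have the same length")
        for name, values in self.obs.items():
            if len(values) != n_obs:
                raise ValueError(f"obs[{name!r}] must have {n_obs} entries")
        # Toy limitation: with zero rows the column count cannot be inferred,
        # so column annotations are only checked when X is non-empty.
        if self.X:
            for name, values in self.var.items():
                if len(values) != n_vars:
                    raise ValueError(f"var[{name!r}] must have {n_vars} entries")

    @property
    def shape(self) -> tuple[int, int]:
        """(number of observations, number of variables)."""
        return len(self.X), (len(self.X[0]) if self.X else 0)

    def subset(self, rows: list[int], cols: list[int]) -> "AnnotatedMatrix":
        """Select rows and columns; annotations are sliced together with X."""
        return AnnotatedMatrix(
            X=[[self.X[r][c] for c in cols] for r in rows],
            obs={k: [v[r] for r in rows] for k, v in self.obs.items()},
            var={k: [v[c] for c in cols] for k, v in self.var.items()},
        )


# Three cells by three genes, with one annotation per cell and per gene.
cells = AnnotatedMatrix(
    X=[[1.0, 0.0, 3.0], [0.0, 2.0, 0.0], [4.0, 0.0, 0.0]],
    obs={"cell_type": ["T", "B", "T"]},
    var={"gene": ["GeneA", "GeneB", "GeneC"]},
)

# Keep only the T cells and the first and third genes.
t_rows = [i for i, ct in enumerate(cells.obs["cell_type"]) if ct == "T"]
t_cells = cells.subset(rows=t_rows, cols=[0, 2])
print(t_cells.shape, t_cells.obs, t_cells.var)
```

The point of the sketch is the design choice rather than the implementation: because the matrix and its annotations travel together, filtering cells or genes cannot silently desynchronize labels from data. The real package adds the features listed above, such as sparse and on-disk storage.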

mudata MuData is a format for annotated multimodal datasets where each modality is represented by an AnnData object. MuData’s reference implementation is in Python, and the cross-language functionality is achieved via HDF5-based .h5mu files with libraries in R and Julia.

GitHub Documentation PyPI Conda Muon.jl

spatialdata SpatialData is a data framework that comprises a FAIR storage format and a collection of Python libraries for performant access, alignment, and processing of uni- and multi-modal spatial omics datasets. This repository contains the core spatialdata library. See the links below to learn more about other packages in the SpatialData ecosystem.

GitHub Documentation PyPI spatialdata-io

Modality-specific extensions

In addition to these packages, we define standards on how to represent certain data types in these data structures. For now, such a specification is available for Adaptive Immune Receptor Repertoire (AIRR) data. Representations for other data types (e.g. scATAC-seq) will follow.

Packages maintained by core team

These packages are considered foundational in that many other packages build upon them. Joint maintenance by the core team guarantees long-term stability.

scanpy Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

GitHub Documentation and tutorials PyPI Conda

muon muon is a Python framework for multimodal omics analysis. While there are many features that muon brings to the table, there are three key areas that its functionality is focused on.

GitHub Documentation Tutorials PyPI Website

scvi-tools scvi-tools is a library for developing and deploying machine learning models based on PyTorch and AnnData. With an emphasis on probabilistic models, scvi-tools streamlines the development process via training, data management, and user interface abstractions. scvi-tools also contains easy-to-use implementations of more than 14 state-of-the-art probabilistic models in the field.

GitHub Documentation and tutorials PyPI Website

scirpy Scirpy is a scalable toolkit to analyse T-cell receptor or B-cell receptor repertoires from single-cell RNA sequencing data. It seamlessly integrates with scanpy and provides various modules for data import, analysis and visualization.

GitHub Documentation and tutorials PyPI Conda

squidpy Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.

GitHub Documentation and tutorials PyPI

Ecosystem packages maintained by scverse community

Many popular packages rely on scverse functionality. For instance, they take advantage of established data format standards such as AnnData and MuData, or are designed to be integrated into the workflow of analysis frameworks. Here, we list ecosystem packages that follow development best practices (continuous testing, documentation, and availability through standard distribution tools).

This listing is a work in progress. See scverse/ecosystem-packages for inclusion criteria, and to submit more packages.

Package	Description
CellCharter	CellCharter is a framework to identify, characterize and compare spatial domains from spatial omics and multi-omics data.
CellOracle	A computational tool that integrates single-cell transcriptome and epigenome profiles to infer gene regulatory networks (GRNs), critical regulators of cell identity.
CellRank	CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities, and estimators generate hypotheses based on these.
Cell_BLAST	Cell BLAST is a cell querying tool for single-cell transcriptomics data.
CellphoneDB	CellphoneDB is a publicly available repository of HUMAN curated receptors, ligands and their interactions paired with a tool to interrogate your own single-cell transcriptomics data (or even bulk transcriptomics data if your samples represent pure populations!). A distinctive feature of CellphoneDB is that the subunit architecture of both ligands and receptors is taken into account, representing heteromeric complexes accurately. This is crucial, as cell communication relies on multi-subunit protein complexes that go beyond the binary representation used in most databases and studies. CellphoneDB also incorporates biosynthetic pathways, using the last representative enzyme as a proxy of ligand abundance; by doing so, it includes interactions involving non-peptidic molecules. CellphoneDB includes only manually curated and reviewed molecular interactions with an evidenced role in cellular communication.
Cirrocumulus	Cirrocumulus is an interactive visualization tool for large-scale single-cell genomics data.
DoubletDetection	DoubletDetection is a Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices.
GPTBioInsightor	GPTBioInsightor is a tool designed for single-cell data analysis, particularly beneficial for newcomers to a biological field or those in interdisciplinary areas who may lack sufficient biological background knowledge. GPTBioInsightor utilizes the powerful capabilities of large language models to help people quickly gain knowledge and insight, enhancing their work efficiency.
Mowgli	Paired single-cell multi-omics data integration with Optimal Transport-flavored Nonnegative Matrix Factorization
Multivelo	A mechanistic model of gene expression that extends the popular RNA velocity framework by incorporating epigenomic data.
PILOT	PILOT is a Python library for Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport.
PathML	An open-source toolkit for computational pathology and machine learning.
PyDESeq2	PyDESeq2 is a python package for bulk RNA-seq differential expression analysis. It is a re-implementation from scratch of the main features of the R package DESeq2 (Love et al. 2014).
Rectangle	Rectangle is a python package for computational deconvolution. Rectangle presents a novel approach to second-generation deconvolution, characterized by hierarchical processing, an estimation of unknown cellular content and a significant reduction in data volume during signature matrix computation.
SC2Spa	SC2Spa is a deep learning-based tool for predicting the spatial coordinates of single cells based on transcriptome. Two paired single cell and spatial transcriptomic datasets are required to run SC2Spa. SC2Spa is trained on a ST reference dataset to learn the relationship of gene expression and spatial coordinates. The trained fully-connected neural network can be used to predict the locations of a single cell with only the transcriptomic profile as input. The predicted locations of single cells can be further used to study the communication of the single cells.
SCALEX	SCALEX is an integration and projection tool for atlas-level single-cell RNA-seq and ATAC-seq data.
SnapATAC2	SnapATAC2 is the successor of the SnapATAC R package, featuring faster computation and lower memory usage (scaling to >1M cells) as well as improved dimension reduction and sampling algorithms.
anndata for R	A ‘reticulate’ wrapper for the Python package ‘anndata’. Provides a scalable way of keeping track of data and learned annotations. Used to read from and write to the h5ad file format.
bento-tools	A Python toolkit for subcellular analysis of spatial transcriptomics data
biolord	biolord (biological representation disentanglement) is a deep generative framework for disentangling known and unknown attributes in single-cell data.
cell2location	Cell2location is a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools.
cellxgene	CZ CELLxGENE Annotate (pronounced “cell-by-gene”) is an interactive data explorer for single-cell datasets, such as those coming from the Human Cell Atlas.
dandelion	dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5’ data. It streamlines the pre-processing, leveraging some tools from immcantation suite, and integrates with scanpy/anndata for single-cell BCR/TCR analysis. It also includes a couple of functions for visualization.
decoupler	Python package to infer biological activities from omics data using a collection of methods.
dynamo-release	Inclusive model of expression dynamics with metabolic labeling based scRNA-seq / multiomics, vector field reconstruction, potential landscape mapping, differential geometry analyses, and most probable paths / in silico perturbation predictions.
epiScanpy	EpiScanpy is a toolkit to analyse single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data.
fava	FAVA uses Variational Autoencoders to infer functional associations from large-scale scRNA-seq (and proteomics) data.
flowsom	The complete FlowSOM package known from R, now available in Python! Analyze high-dimensional cytometry data using FlowSOM, a clustering and visualization algorithm based on a self-organizing map (SOM). FlowSOM is used to distinguish cell populations from cytometry data in an unsupervised way and can help to gain deeper insights in fields such as immunology and oncology.
gssnng	Single-cell gene set scoring with nearest neighbor graph smoothed data.
hotspot	Hotspot is a tool for identifying informative genes (and gene modules) in a single-cell dataset.
infercnvpy	Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.
liana	Python package to infer cell-cell communication events from omics data using a collection of methods.
maxspin	An information theoretic approach to detecting spatially varying genes
moscot	moscot is a scalable toolbox for multiomics single-cell optimal transport applications.
omicverse	OmicVerse is the fundamental package for multi-omics analysis, including bulk and single-cell analysis, with Python. The original name of omicverse was Pyomic, but we wanted to address a whole universe of transcriptomics, so we changed the name to OmicVerse. It aims to solve all tasks in RNA-seq.
Palantir	Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process where stem cells differentiate to terminally differentiated cells by a series of steps through a low dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination. Palantir has been designed to work with multidimensional single cell data from diverse technologies such as Mass cytometry and single cell RNA-seq.
Panpipes	A pipeline for multiomic single-cell and spatial transcriptomic data analysis
pegasus	Pegasus is a tool for analyzing transcriptomes of millions of single cells.
pertpy	pertpy is a framework for the analysis of multi-condition omics data.
pyLemur	Python implementation of the LEMUR algorithm for analyzing multi-condition single-cell RNA-seq data.
pySCENIC	pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
pychromVAR	A Python package for chromVAR.
pytximport	A Python port of the `tximport` R package for importing transcript-level quantification data from various RNA-seq quantification tools such as `salmon` and `kallisto` and summarizing it to the gene level.
rapids-singlecell	A GPU-accelerated Python package for single-cell data analysis
scFates	A scalable python package for tree inference and advanced pseudotime analysis from scRNAseq data.
scGen	scGen is a generative model to predict single-cell perturbation response across cell types, studies and species.
scanpro	robust cell proportion analysis for single cell data
schist	schist applies Stochastic Block Models (SBM) to the analysis of single cell data, in particular to identify cell populations
scib	Evaluating single-cell data integration methods
scTriangulate	Python package to mix-and-match conflicting clustering results in single cell analysis and generate reconciled clustering solutions
scVelo	scVelo is a scalable toolkit for RNA velocity analysis in single cells, based on Bergen et al., Nature Biotech, 2020.
scyan	Biology-driven deep generative model for cell-type annotation in cytometry. Scyan is an interpretable model that also corrects batch-effect and can be used for debarcoding or population discovery.
sfaira	sfaira is a model and a data repository in a single python package.
sift-sc	SiFT is a computational framework which aims to uncover the underlying structure by filtering out previously exposed biological signals. SiFT can be applied to a wide range of tasks, from (i) the removal of unwanted variation as a pre-processing step, through (ii) revealing hidden biological structure by utilizing prior knowledge with respect to existing signal, to (iii) uncovering trajectories of interest using reference data to remove unwanted variation.
Sobolev Alignment	Sobolev alignment of deep probabilistic models for comparing single cell profiles from pre-clinical models and patients
sopa	Technology-invariant pipeline for spatial-omics analysis that scales to millions of cells. It includes segmentation, annotation, spatial statistics, and efficient visualization.
spatial-eggplant	Python package designed to transfer information from multiple spatial-transcriptomics data sets to a single reference representing a Common Coordinate Framework (CCF).
Symphonypy	Symphonypy is a pure Python port of the Symphony label transfer algorithm for reference-based cell type annotation.
tangram	Spatial alignment and gene expression mapping of single cell transcriptomic data.
vitessce	Vitessce consists of reusable interactive views including a scatterplot, spatial+imaging plot, genome browser tracks, statistical plots, and control views, built on web technologies such as WebGL.