Packages

Packages maintained by core team

These packages are considered foundational in that many other packages build upon them. Joint maintenance by the core team guarantees long-term stability.

Data structures

Data structures are the foundational building blocks for all scverse packages. Building upon common data structures ensures interoperability.

anndata

AnnData is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.

GitHub Documentation PyPI Conda
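
To make the idea of an "annotated data matrix" concrete, here is a minimal sketch in plain Python. It is not the anndata API: the `AnnotatedMatrix` class, its `select_obs` method, and the example values are hypothetical and exist only for illustration. The point it shows is that the matrix and its per-row and per-column annotations travel together, so subsetting rows keeps the annotations aligned.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AnnotatedMatrix:
    """Hypothetical stand-in for an annotated matrix; NOT the anndata API."""

    X: list[list[float]]  # values, observations (rows) x variables (columns)
    obs: list[dict]       # one annotation record per row
    var: list[dict]       # one annotation record per column

    def __post_init__(self):
        # Annotations must stay aligned with the matrix dimensions.
        if len(self.X) != len(self.obs):
            raise ValueError("need one obs record per row of X")
        if any(len(row) != len(self.var) for row in self.X):
            raise ValueError("need one var record per column of X")

    @property
    def shape(self):
        return (len(self.obs), len(self.var))

    def select_obs(self, predicate):
        """Keep rows whose annotation satisfies `predicate`, with their annotations."""
        keep = [i for i, record in enumerate(self.obs) if predicate(record)]
        return AnnotatedMatrix(
            X=[self.X[i] for i in keep],
            obs=[self.obs[i] for i in keep],
            var=self.var,
        )


# Made-up toy data: three cells, three genes.
data = AnnotatedMatrix(
    X=[[1.0, 0.0, 3.0], [0.0, 2.0, 0.0], [4.0, 0.0, 0.0]],
    obs=[{"cell_type": "B"}, {"cell_type": "T"}, {"cell_type": "B"}],
    var=[{"symbol": "gene_a"}, {"symbol": "gene_b"}, {"symbol": "gene_c"}],
)
b_cells = data.select_obs(lambda record: record["cell_type"] == "B")
print(b_cells.shape)  # (2, 3)
```

The real package adds much more on top of this idea (for example the sparse and on-disk support mentioned above); see its documentation for the actual interface.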

mudata

MuData is a format for annotated multimodal datasets where each modality is represented by an AnnData object. MuData’s reference implementation is in Python, and cross-language functionality is achieved via HDF5-based .h5mu files, with libraries in R and Julia.

GitHub Documentation PyPI Conda Muon.jl

spatialdata

SpatialData is a data framework that comprises a FAIR storage format and a collection of Python libraries for performant access, alignment, and processing of uni- and multi-modal spatial omics datasets. This repository contains the core spatialdata library. See the links below to learn more about other packages in the SpatialData ecosystem.

GitHub Documentation PyPI spatialdata-io

Analysis task-specific extensions

In addition to these packages, we define standards on how to represent certain data types in these data structures. For now, such a specification is available for Adaptive Immune Receptor Repertoire (AIRR) data.

Frameworks

Frameworks provide essential algorithms and plotting functions for specific analysis steps, building on our data structures.

scanpy

Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

GitHub Documentation and tutorials PyPI Conda

muon

muon is a Python framework for multimodal omics analysis. Among its many features, muon focuses its functionality on three key areas.

GitHub Documentation Tutorials PyPI Website

squidpy

Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.

GitHub Documentation and tutorials PyPI

scvi-tools

scvi-tools is a library for developing and deploying machine learning models based on PyTorch and AnnData. With an emphasis on probabilistic models, scvi-tools streamlines the development process via training, data management, and user interface abstractions. scvi-tools also contains easy-to-use implementations of more than 14 state-of-the-art probabilistic models in the field.

GitHub Documentation and tutorials PyPI Website

scirpy

Scirpy is a scalable toolkit to analyse T-cell receptor or B-cell receptor repertoires from single-cell RNA sequencing data. It seamlessly integrates with scanpy and provides various modules for data import, analysis and visualization.

GitHub Documentation and tutorials PyPI Conda

SnapATAC2

SnapATAC2 is a scalable and modular pipeline for analyzing single-cell ATAC-seq data, enabling efficient preprocessing, dimensionality reduction, clustering, and integration with single-cell RNA-seq.

GitHub Documentation and tutorials PyPI Conda

rapids-singlecell

rapids-singlecell is a GPU-accelerated single-cell analysis library that serves as a drop-in replacement for scanpy, squidpy, and decoupler.

GitHub Documentation and tutorials PyPI

pertpy

Pertpy is a framework for analyzing large-scale single-cell perturbation experiments. It harmonizes datasets, automates metadata annotation, calculates perturbation distances, and analyzes cellular responses to genetic modifications, drugs, and environmental changes.

GitHub Documentation and tutorials PyPI Conda

decoupler

decoupler is a unified framework containing different statistical enrichment methods to extract biologically driven scores from omics data.

GitHub Documentation and tutorials PyPI Conda

Ecosystem packages maintained by scverse community

Many popular packages rely on scverse functionality. For instance, they take advantage of established data format standards such as AnnData and MuData, or are designed to be integrated into the workflow of analysis frameworks. Here, we list ecosystem packages that follow development best practices (continuous testing, documentation, availability through standard distribution tools).

This listing is a work in progress. See scverse/ecosystem-packages for inclusion criteria, and to submit more packages.

Package	Description
CellAnnotator	CellAnnotator is a lightweight tool to query large language models for cell type labels in scRNA-seq data. It can incorporate prior knowledge, and it creates consistent labels across samples in your study.
CellCharter	CellCharter is a framework to identify, characterize and compare spatial domains from spatial omics and multi-omics data.
CellMapper	CellMapper is a lightweight tool to transfer labels, expression values and embeddings from reference to query datasets using k-NN mapping. It’s fast and versatile, applicable to mapping scenarios in space, across modalities, or from an atlas to a new query dataset.
CellOracle	A computational tool that integrates single-cell transcriptome and epigenome profiles to infer gene regulatory networks (GRNs), critical regulators of cell identity.
CellRank	CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities, and estimators generate hypotheses based on these.
Cell_BLAST	Cell BLAST is a cell querying tool for single-cell transcriptomics data.
CellphoneDB	CellphoneDB is a publicly available repository of HUMAN curated receptors, ligands and their interactions, paired with a tool to interrogate your own single-cell transcriptomics data (or even bulk transcriptomics data if your samples represent pure populations!). A distinctive feature of CellphoneDB is that the subunit architecture of both ligands and receptors is taken into account, representing heteromeric complexes accurately. This is crucial, as cell communication relies on multi-subunit protein complexes that go beyond the binary representation used in most databases and studies. CellphoneDB also incorporates biosynthetic pathways, in which the last representative enzyme is used as a proxy for ligand abundance; this allows it to include interactions involving non-peptidic molecules. CellphoneDB includes only manually curated and reviewed molecular interactions with an evidenced role in cellular communication.
Cirrocumulus	Cirrocumulus is an interactive visualization tool for large-scale single-cell genomics data.
DoubletDetection	DoubletDetection is a Python 3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices.
GPTBioInsightor	GPTBioInsightor is a tool designed for single-cell data analysis, particularly beneficial for newcomers to a biological field or those in interdisciplinary areas who may lack sufficient biological background knowledge. GPTBioInsightor utilizes the powerful capabilities of large language models to help people quickly gain knowledge and insight, enhancing their work efficiency.
GRnnData	An overload of anndata that makes it easier to work with gene networks. Allows easy conversion between anndata and grnndata and provides many useful utility functions.
LazySlide	LazySlide is a Python library for whole slide image (WSI) analysis. It provides a simple interface to perform robust preprocessing and advanced analysis for WSI.
Mowgli	Paired single-cell multi-omics data integration with Optimal Transport-flavored Nonnegative Matrix Factorization
Multivelo	A mechanistic model of gene expression that extends the popular RNA velocity framework by incorporating epigenomic data.
PEAKQC	Periodicity evaluation in scATAC-seq data for quality assessment
PILOT	PILOT is a Python library for Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport.
ParTIpy	Implements Pareto task inference and archetypal analysis for analyzing functional trade-offs in single-cell and spatial omics data.
PathML	An open-source toolkit for computational pathology and machine learning.
PyDESeq2	PyDESeq2 is a Python package for bulk RNA-seq differential expression analysis. It is a re-implementation from scratch of the main features of the R package DESeq2 (Love et al. 2014).
Rectangle	Rectangle is a Python package for computational deconvolution. Rectangle presents a novel approach to second-generation deconvolution, characterized by hierarchical processing, an estimation of unknown cellular content and a significant reduction in data volume during signature matrix computation.
SC2Spa	SC2Spa is a deep learning-based tool for predicting the spatial coordinates of single cells based on their transcriptomes. Two paired single-cell and spatial transcriptomic datasets are required to run SC2Spa. SC2Spa is trained on an ST reference dataset to learn the relationship between gene expression and spatial coordinates. The trained fully-connected neural network can be used to predict the location of a single cell with only the transcriptomic profile as input. The predicted locations of single cells can be further used to study communication between the cells.
SCALEX	SCALEX is an integration and projection tool for atlas-level single-cell RNA-seq and ATAC-seq data.
anndata for R	A ‘reticulate’ wrapper for the Python package ‘anndata’. Provides a scalable way of keeping track of data and learned annotations. Used to read from and write to the h5ad file format.
annsel	Annsel is a user-friendly library that brings familiar dataframe-style operations, such as selection, filtering and group-by, to AnnData objects.
benGRN	Benchmarking tool for methods that infer gene networks from single-cell RNA-seq data. It uses the grnndata/anndata modality and contains only biological ground truth networks.
bento-tools	A Python toolkit for subcellular analysis of spatial transcriptomics data
biolord	biolord (biological representation disentanglement) is a deep generative framework for disentangling known and unknown attributes in single-cell data.
cell2location	Cell2location is a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools.
cellxgene	CZ CELLxGENE Annotate (pronounced “cell-by-gene”) is an interactive data explorer for single-cell datasets, such as those coming from the Human Cell Atlas.
dandelion	dandelion is a single-cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5’ data. It streamlines pre-processing, leveraging some tools from the Immcantation suite, and integrates with scanpy/anndata for single-cell BCR/TCR analysis. It also includes a couple of functions for visualization.
delnx	delnx is a Python package for differential expression analysis of (single-cell) genomics data. It enables scalable analyses of atlas-level datasets through GPU-accelerated regression models and statistical tests implemented in JAX and provides a consistent interface to perform DE analysis with other methods, such as statsmodels and PyDESeq2.
dynamo-release	Inclusive model of expression dynamics with metabolic labeling based scRNA-seq / multiomics, vector field reconstruction, potential landscape mapping, differential geometry analyses, and most probable paths / in silico perturbation predictions.
epiScanpy	EpiScanpy is a toolkit to analyse single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data.
eschr	ESCHR is an ensemble clustering method that provides hard clustering along with uncertainty scores and soft clustering outputs for enhanced interpretability.
fava	FAVA uses Variational Autoencoders to infer functional associations from large-scale scRNA-seq (and proteomics) data.
flowsom	The complete FlowSOM package known from R, now available in Python! Analyze high-dimensional cytometry data using FlowSOM, a clustering and visualization algorithm based on a self-organizing map (SOM). FlowSOM is used to distinguish cell populations from cytometry data in an unsupervised way and can help to gain deeper insights in fields such as immunology and oncology.
gssnng	Single-cell gene set scoring with nearest neighbor graph smoothed data.
hotspot	Hotspot is a tool for identifying informative genes (and gene modules) in a single-cell dataset.
infercnvpy	Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.
liana	Python package to infer cell-cell communication events from omics data using a collection of methods.
maxspin	An information theoretic approach to detecting spatially varying genes
moscot	moscot is a scalable toolbox for multiomics single-cell optimal transport applications.
novae	Graph-based foundation model for spatial transcriptomics data. Zero-shot spatial domain inference, batch-effect correction, and many other features.
omicverse	OmicVerse is the fundamental package for multi-omics analysis, including bulk and single-cell analysis, with Python. The package was originally named Pyomic, but we wanted to address a whole universe of transcriptomics, so we changed the name to OmicVerse. It aims to solve all tasks in RNA-seq.
Palantir	Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process where stem cells differentiate to terminally differentiated cells by a series of steps through a low-dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination. Palantir has been designed to work with multidimensional single-cell data from diverse technologies such as mass cytometry and single-cell RNA-seq.
Panpipes	A pipeline for multiomic single-cell and spatial transcriptomic data analysis
pegasus	Pegasus is a tool for analyzing transcriptomes of millions of single cells.
popV	p(opular)V(oting) is a consensus tool for transferring labels from an annotated reference dataset to an unannotated query dataset. Consensus calling allows interpretable scores that quantify certainty.
pyLemur	Python implementation of the LEMUR algorithm for analyzing multi-condition single-cell RNA-seq data.
pySCENIC	pySCENIC is a lightning-fast Python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
pychromVAR	A Python package for chromVAR.
pytximport	A Python port of the `tximport` R package for importing transcript-level quantification data from various RNA-seq quantification tools such as `salmon` and `kallisto` and summarizing it to the gene level.
scCellFie	scCellFie infers metabolic activities from single-cell and spatial transcriptomics and offers a variety of downstream analyses.
scDataLoader	A dataloader for large single-cell databases like cellxgene. Does weighted random sampling, downloading and preprocessing. Works with anndata, zarr, and h5ad files.
scFates	A scalable Python package for tree inference and advanced pseudotime analysis from scRNA-seq data.
scGen	scGen is a generative model to predict single-cell perturbation response across cell types, studies and species.
scLiTr	scLiTr (single-cell Lineage Tracing) is a Python package for exploratory analysis of barcoding-based scRNA-Seq lineage tracing experiments.
scPRINT	A single cell foundation model for Gene network inference and more…
scanpro	Robust cell proportion analysis for single-cell data
schist	schist applies Stochastic Block Models (SBM) to the analysis of single cell data, in particular to identify cell populations
scib	Evaluating single-cell data integration methods
scmcp	An MCP server hub for scRNA-Seq analysis software.
scTriangulate	Python package to mix-and-match conflicting clustering results in single cell analysis and generate reconciled clustering solutions
scVelo	scVelo is a scalable toolkit for RNA velocity analysis in single cells, based on Bergen et al., Nature Biotech, 2020.
scxmatch	Single-cell Cross Match (scxmatch) is a Python package that implements Rosenbaum’s cross-match test using distance-based matching to assess distribution shifts between two groups of high-dimensional data. This is particularly useful in analyzing multivariate distributions in structured data, such as single-cell RNA-seq or ATAC-seq.
scyan	Biology-driven deep generative model for cell-type annotation in cytometry. Scyan is an interpretable model that also corrects batch effects and can be used for debarcoding or population discovery.
sfaira	sfaira is a model and a data repository in a single python package.
sift-sc	SiFT is a computational framework which aims to uncover the underlying structure by filtering out previously exposed biological signals. SiFT can be applied to a wide range of tasks, from (i) the removal of unwanted variation as a pre-processing step, through (ii) revealing hidden biological structure by utilizing prior knowledge with respect to existing signal, to (iii) uncovering trajectories of interest using reference data to remove unwanted variation.
Sobolev Alignment	Sobolev alignment of deep probabilistic models for comparing single cell profiles from pre-clinical models and patients
sopa	Technology-invariant pipeline for spatial-omics analysis that scales to millions of cells. It includes segmentation, annotation, spatial statistics, and efficient visualization.
spatial-eggplant	Python package designed to transfer information from multiple spatial-transcriptomics data sets to a single reference representing a Common Coordinate Framework (CCF).
spatialproteomics	Spatialproteomics is an interoperable toolbox for analyzing highly multiplexed fluorescence image data. This analysis involves a sequence of steps, including segmentation, image processing, marker quantification, cell type classification, and neighborhood analysis.
Symphonypy	Symphonypy is a pure Python port of Symphony label transfer algorithm for reference-based cell type annotation.
tangram	Spatial alignment and gene expression mapping of single cell transcriptomic data.
vitessce	Vitessce is an integrative visualization framework for multimodal and 2D/3D spatially resolved single-cell data. It consists of reusable, interactive linked views including scatterplot embedding views, 2D/3D spatial image views, genome browser tracks, statistical plots, and control views, built on web technologies such as WebGL and WebXR.
wsidata	wsidata is a data structure for efficient I/O of whole slide images, based on spatialdata.
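
Several entries in the table (CellMapper, popV, Symphonypy) describe transferring labels from an annotated reference to a query dataset. As a rough illustration of the k-NN mapping idea only, the sketch below assigns each query point the majority label among its k nearest reference points. It uses only the Python standard library and does not reproduce the API or algorithm of any listed package; the function name `knn_transfer_labels` and the toy coordinates are made up for this example.

```python
import math
from collections import Counter


def knn_transfer_labels(reference, ref_labels, query, k=3):
    """Label each query point by majority vote among its k nearest reference points."""
    predicted = []
    for point in query:
        # Indices of the k reference points closest to this query point (Euclidean distance).
        nearest = sorted(
            range(len(reference)),
            key=lambda i: math.dist(reference[i], point),
        )[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


# Hypothetical 2-D embedding coordinates for six annotated reference cells.
reference = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query = [(0.05, 0.1), (5.0, 5.1)]

print(knn_transfer_labels(reference, ref_labels, query))  # ['T cell', 'B cell']
```

Real tools typically work in a shared embedding space and use approximate neighbor search to scale; the brute-force sort here is only meant for small examples.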