Packages maintained by core team
These packages are considered foundational in that many other packages build upon them. Joint maintenance by the core team guarantees long-term stability.
Data structures
Data structures are the foundational building block for all scverse packages. Building upon common data structures ensures interoperability.
anndata
AnnData is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
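To make the idea of an annotated data matrix concrete, the following is a minimal, library-free sketch: a matrix stored together with per-observation and per-variable annotations that stay aligned when rows are selected. It is not anndata's API. The class `AnnotatedMatrix`, its method `select_obs`, and the example cell and gene names are hypothetical and chosen only for illustration; the field names `X`, `obs`, and `var` mirror the terms anndata uses for the matrix, observation annotations, and variable annotations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AnnotatedMatrix:
    """Toy stand-in for an annotated matrix (hypothetical, not anndata's API)."""

    X: list[list[float]]            # observations x variables
    obs: dict[str, dict[str, str]]  # per-observation annotations, keyed by observation name
    var: dict[str, dict[str, str]]  # per-variable annotations, keyed by variable name

    def __post_init__(self) -> None:
        # Annotations must line up with the matrix dimensions.
        if len(self.X) != len(self.obs):
            raise ValueError("one obs entry is required per row")
        if any(len(row) != len(self.var) for row in self.X):
            raise ValueError("one var entry is required per column")

    @property
    def shape(self) -> tuple[int, int]:
        return len(self.obs), len(self.var)

    def select_obs(self, key: str, value: str) -> AnnotatedMatrix:
        """Keep rows whose annotation `key` equals `value`; annotations follow the rows."""
        names = list(self.obs)
        keep = [i for i, name in enumerate(names) if self.obs[name].get(key) == value]
        return AnnotatedMatrix(
            X=[self.X[i] for i in keep],
            obs={names[i]: self.obs[names[i]] for i in keep},
            var=dict(self.var),
        )


# Three made-up cells measured on two made-up genes.
cells = AnnotatedMatrix(
    X=[[1.0, 0.0], [0.0, 3.0], [2.0, 5.0]],
    obs={
        "cell_a": {"cell_type": "T"},
        "cell_b": {"cell_type": "B"},
        "cell_c": {"cell_type": "T"},
    },
    var={"gene_1": {"kind": "protein_coding"}, "gene_2": {"kind": "protein_coding"}},
)

t_cells = cells.select_obs("cell_type", "T")
print(t_cells.shape, list(t_cells.obs))
```

The point of the design is that the matrix and its annotations are subset in one operation, so they cannot drift out of sync. The real package adds the features described above, such as sparse data support and on-disk storage, which this sketch does not attempt.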
mudata
MuData is a format for annotated multimodal datasets where each modality is represented by an AnnData object. MuData’s reference implementation is in Python, and the cross-language functionality is achieved via HDF5-based .h5mu files with libraries in R and Julia.
spatialdata
SpatialData is a data framework that comprises a FAIR storage format and a collection of python libraries for performant access, alignment, and processing of uni- and multi-modal spatial omics datasets. This repository contains the core spatialdata library. See the links below to learn more about other packages in the SpatialData ecosystem.
Analysis task-specific extensions
In addition to these packages, we define standards on how to represent certain data types in these data structures. For now, such a specification is available for Adaptive Immune Receptor Repertoire (AIRR) data.
Frameworks
Frameworks provide essential algorithms and plotting functions for specific analysis steps, building on our data structures.
scanpy
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
muon
muon is a Python framework for multimodal omics analysis. While there are many features that muon brings to the table, there are three key areas that its functionality is focused on.
squidpy
Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.
scvi-tools
scvi-tools is a library for developing and deploying machine learning models based on PyTorch and AnnData. With an emphasis on probabilistic models, scvi-tools streamlines the development process via training, data management, and user interface abstractions. scvi-tools also contains easy-to-use implementations of more than 14 state-of-the-art probabilistic models in the field.
scirpy
Scirpy is a scalable toolkit to analyse T-cell receptor or B-cell receptor repertoires from single-cell RNA sequencing data. It seamlessly integrates with scanpy and provides various modules for data import, analysis and visualization.
SnapATAC2
SnapATAC2 is a scalable and modular pipeline for analyzing single-cell ATAC-seq data, enabling efficient preprocessing, dimensionality reduction, clustering, and integration with single-cell RNA-seq.
rapids-singlecell
rapids-singlecell is a GPU-accelerated single-cell analysis library that serves as a drop-in replacement for scanpy, squidpy, and decoupler.
pertpy
Pertpy is a framework for analyzing large-scale single-cell perturbation experiments. It harmonizes datasets, automates metadata annotation, calculates perturbation distances, and analyzes cellular responses to genetic modifications, drugs, and environmental changes.
decoupler
decoupler is a framework containing different enrichment statistical methods to extract biologically driven scores from omics data within a unified framework.
Ecosystem packages maintained by scverse community
Many popular packages rely on scverse functionality. For instance, they take advantage of established data format standards such as AnnData and MuData, or are designed to be integrated into the workflow of analysis frameworks. Here, we list ecosystem packages following development best practices (continuous testing, documented, available through standard distribution tools).
This listing is a work in progress. See scverse/ecosystem-packages for inclusion criteria, and to submit more packages.
Package | Description |
---|---|
CellAnnotator | CellAnnotator is a lightweight tool to query large language models for cell type labels in scRNA-seq data. It can incorporate prior knowledge, and it creates consistent labels across samples in your study. |
CellCharter | CellCharter is a framework to identify, characterize and compare spatial domains from spatial omics and multi-omics data. |
CellMapper | CellMapper is a lightweight tool to transfer labels, expression values and embeddings from reference to query datasets using k-NN mapping. It’s fast and versatile, applicable to mapping scenarios in space, across modalities, or from an atlas to a new query dataset. |
CellOracle | A computational tool that integrates single-cell transcriptome and epigenome profiles to infer gene regulatory networks (GRNs), critical regulators of cell identity. |
CellRank | CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities, and estimators generate hypotheses based on these. |
Cell_BLAST | Cell BLAST is a cell querying tool for single-cell transcriptomics data. |
CellphoneDB | CellphoneDB is a publicly available repository of HUMAN curated receptors, ligands and their interactions paired with a tool to interrogate your own single-cell transcriptomics data (or even bulk transcriptomics data if your samples represent pure populations!). A distinctive feature of CellphoneDB is that the subunit architecture of both ligands and receptors is taken into account, representing heteromeric complexes accurately. This is crucial, as cell communication relies on multi-subunit protein complexes that go beyond the binary representation used in most databases and studies. CellphoneDB also incorporates biosynthetic pathways in which we use the last representative enzyme as a proxy of ligand abundance; by doing so, we include interactions involving non-peptidic molecules. CellphoneDB includes only manually curated and reviewed molecular interactions with an evidenced role in cellular communication. |
Cirrocumulus | Cirrocumulus is an interactive visualization tool for large-scale single-cell genomics data. |
DoubletDetection | DoubletDetection is a Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices. |
GPTBioInsightor | GPTBioInsightor is a tool designed for single-cell data analysis, particularly beneficial for newcomers to a biological field or those in interdisciplinary areas who may lack sufficient biological background knowledge. GPTBioInsightor utilizes the powerful capabilities of large language models to help people quickly gain knowledge and insight, enhancing their work efficiency. |
GRnnData | An overload of anndata to more easily work with gene networks. Allows easy conversion between anndata and grnndata and provides many useful utility functions. |
LazySlide | LazySlide is a Python library for whole slide image (WSI) analysis. It provides a simple interface to perform robust preprocessing and advanced analysis for WSI. |
Mowgli | Paired single-cell multi-omics data integration with Optimal Transport-flavored Nonnegative Matrix Factorization |
Multivelo | A mechanistic model of gene expression that extends the popular RNA velocity framework by incorporating epigenomic data. |
PEAKQC | Periodicity evaluation in scATAC-seq data for quality assessment |
PILOT | PILOT is a Python library for Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport. |
PathML | An open-source toolkit for computational pathology and machine learning. |
PyDESeq2 | PyDESeq2 is a python package for bulk RNA-seq differential expression analysis. It is a re-implementation from scratch of the main features of the R package DESeq2 (Love et al. 2014). |
Rectangle | Rectangle is a python package for computational deconvolution. Rectangle presents a novel approach to second-generation deconvolution, characterized by hierarchical processing, an estimation of unknown cellular content and a significant reduction in data volume during signature matrix computation. |
SC2Spa | SC2Spa is a deep learning-based tool for predicting the spatial coordinates of single cells based on transcriptome. Two paired single cell and spatial transcriptomic datasets are required to run SC2Spa. SC2Spa is trained on a ST reference dataset to learn the relationship of gene expression and spatial coordinates. The trained fully-connected neural network can be used to predict the locations of a single cell with only the transcriptomic profile as input. The predicted locations of single cells can be further used to study the communication of the single cells. |
SCALEX | SCALEX is an integration and projection tool for atlas-level single-cell RNA-seq and ATAC-seq data. |
anndata for R | A ‘reticulate’ wrapper for the Python package ‘anndata’. Provides a scalable way of keeping track of data and learned annotations. Used to read from and write to the h5ad file format. |
annsel | Annsel is a user-friendly library that brings familiar dataframe-style operations to AnnData objects such as selection, filtering and group by’s. |
benGRN | Benchmarking tool for gene network inference from single cell RNAseq methods. It uses the grnndata/anndata modality and only contains biological ground truth networks |
bento-tools | A Python toolkit for subcellular analysis of spatial transcriptomics data |
biolord | biolord (biological representation disentanglement) is a deep generative framework for disentangling known and unknown attributes in single-cell data. |
cell2location | Cell2location is a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. |
cellxgene | CZ CELLxGENE Annotate (pronounced “cell-by-gene”) is an interactive data explorer for single-cell datasets, such as those coming from the Human Cell Atlas. |
dandelion | dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5’ data. It streamlines the pre-processing, leveraging some tools from immcantation suite, and integrates with scanpy/anndata for single-cell BCR/TCR analysis. It also includes a couple of functions for visualization. |
dynamo-release | Inclusive model of expression dynamics with metabolic labeling based scRNA-seq / multiomics, vector field reconstruction, potential landscape mapping, differential geometry analyses, and most probable paths / in silico perturbation predictions. |
epiScanpy | EpiScanpy is a toolkit to analyse single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data. |
eschr | ESCHR is an ensemble clustering method that provides hard clustering along with uncertainty scores and soft clustering outputs for enhanced interpretability. |
fava | FAVA uses Variational Autoencoders to infer functional associations from large-scale scRNA-seq (and proteomics) data. |
flowsom | The complete FlowSOM package known from R, now available in Python! Analyze high-dimensional cytometry data using FlowSOM, a clustering and visualization algorithm based on a self-organizing map (SOM). FlowSOM is used to distinguish cell populations from cytometry data in an unsupervised way and can help to gain deeper insights in fields such as immunology and oncology. |
gssnng | Single-cell gene set scoring with nearest neighbor graph smoothed data. |
hotspot | Hotspot is a tool for identifying informative genes (and gene modules) in a single-cell dataset. |
infercnvpy | Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy. |
liana | Python package to infer cell-cell communication events from omics data using a collection of methods. |
maxspin | An information theoretic approach to detecting spatially varying genes |
moscot | moscot is a scalable toolbox for multiomics single-cell optimal transport applications. |
novae | Graph-based foundation model for spatial transcriptomics data. Zero-shot spatial domain inference, batch-effect correction, and many other features. |
omicverse | OmicVerse is the fundamental package for multi-omics analysis with Python, including bulk and single-cell analysis. The original name of omicverse was Pyomic, but we wanted to address a whole universe of transcriptomics, so we changed the name to OmicVerse. It aims to solve all tasks in RNA-seq. |
Palantir | Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process where stem cells differentiate to terminally differentiated cells by a series of steps through a low dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination. Palantir has been designed to work with multidimensional single cell data from diverse technologies such as Mass cytometry and single cell RNA-seq. |
Panpipes | A pipeline for multiomic single-cell and spatial transcriptomic data analysis |
pegasus | Pegasus is a tool for analyzing transcriptomes of millions of single cells. |
popV | p(opular)V(oting) is a consensus tool for transferring labels from an annotated reference dataset to an unannotated query dataset. Consensus calling allows interpretable scores that quantify certainty. |
pyLemur | Python implementation of the LEMUR algorithm for analyzing multi-condition single-cell RNA-seq data. |
pySCENIC | pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data. |
pychromVAR | A python package for chromVAR. |
pytximport | A Python port of the tximport R package for importing transcript-level quantification data from various RNA-seq quantification tools such as salmon and kallisto and summarizing it to the gene level. |
scCellFie | scCellFie infers metabolic activities from single-cell and spatial transcriptomics and offers a variety of downstream analyses. |
scDataLoader | A dataloader for large single cell databases like cellxgene. Does weighted random sampling, downloading and preprocessing. Works with anndata, zarr, and h5ad files. |
scFates | A scalable python package for tree inference and advanced pseudotime analysis from scRNAseq data. |
scGen | scGen is a generative model to predict single-cell perturbation response across cell types, studies and species. |
scLiTr | scLiTr (single-cell Lineage Tracing) is a python package for exploratory analysis of barcoding-based scRNA-Seq lineage tracing experiments |
scPRINT | A single cell foundation model for Gene network inference and more… |
scanpro | Robust cell proportion analysis for single cell data |
schist | schist applies Stochastic Block Models (SBM) to the analysis of single cell data, in particular to identify cell populations |
scib | Evaluating single-cell data integration methods |
scmcp | An MCP server hub for scRNA-Seq analysis software. |
scTriangulate | Python package to mix-and-match conflicting clustering results in single cell analysis and generate reconciled clustering solutions |
scVelo | scVelo is a scalable toolkit for RNA velocity analysis in single cells, based on Bergen et al., Nature Biotech, 2020. |
scyan | Biology-driven deep generative model for cell-type annotation in cytometry. Scyan is an interpretable model that also corrects batch-effect and can be used for debarcoding or population discovery. |
sfaira | sfaira is a model and a data repository in a single python package. |
sift-sc | SiFT is a computational framework which aims to uncover the underlying structure by filtering out previously exposed biological signals. SiFT can be applied to a wide range of tasks, from (i) the removal of unwanted variation as a pre-processing step, through (ii) revealing hidden biological structure by utilizing prior knowledge with respect to existing signal, to (iii) uncovering trajectories of interest using reference data to remove unwanted variation. |
Sobolev Alignment | Sobolev alignment of deep probabilistic models for comparing single cell profiles from pre-clinical models and patients |
sopa | Technology-invariant pipeline for spatial-omics analysis that scales to millions of cells. It includes segmentation, annotation, spatial statistics, and efficient visualization. |
spatial-eggplant | Python package designed to transfer information from multiple spatial-transcriptomics data sets to a single reference representing a Common Coordinate Framework (CCF). |
spatialproteomics | Spatialproteomics is an interoperable toolbox for analyzing highly multiplexed fluorescence image data. This analysis involves a sequence of steps, including segmentation, image processing, marker quantification, cell type classification, and neighborhood analysis. |
Symphonypy | Symphonypy is a pure Python port of Symphony label transfer algorithm for reference-based cell type annotation. |
tangram | Spatial alignment and gene expression mapping of single cell transcriptomic data. |
vitessce | Vitessce is an integrative visualization framework for multimodal and 2D/3D spatially resolved single-cell data. It consists of reusable, interactive linked views including scatterplot embedding views, 2D/3D spatial image views, genome browser tracks, statistical plots, and control views, built on web technologies such as WebGL and WebXR. |
wsidata | wsidata is a data-structure for efficient IO of Whole Slide Images based on spatialdata. |