Core packages
anndata
Anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
mudata
MuData is a format for annotated multimodal datasets where each modality is represented by an AnnData object. MuData's reference implementation is in Python, and the cross-language functionality is achieved via HDF5-based .h5mu files with libraries in R and Julia.
scanpy
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
muon
muon is a Python framework for multimodal omics analysis. While there are many features that muon brings to the table, there are three key areas that its functionality is focused on.
scvi-tools
scvi-tools is a library for developing and deploying machine learning models based on PyTorch and AnnData. With an emphasis on probabilistic models, scvi-tools streamlines the development process via training, data management, and user interface abstractions. scvi-tools also contains easy-to-use implementations of more than 14 state-of-the-art probabilistic models in the field.
scirpy
Scirpy is a scalable toolkit to analyse T-cell receptor or B-cell receptor repertoires from single-cell RNA sequencing data. It seamlessly integrates with scanpy and provides various modules for data import, analysis and visualization.
squidpy
Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.
Ecosystem
Many popular packages rely on scverse functionality. For instance, they take advantage of established data format standards such as AnnData and MuData, or are designed to be integrated into the workflow of analysis frameworks. Here, we list ecosystem packages that follow development best practices (continuous testing, documentation, and availability through standard distribution tools).
This listing is a work in progress. See scverse/ecosystem-packages for inclusion criteria, and to submit more packages.
Package | Description |
---|---|
CellOracle | A computational tool that integrates single-cell transcriptome and epigenome profiles to infer gene regulatory networks (GRNs), critical regulators of cell identity. |
CellRank | CellRank is a toolkit to uncover cellular dynamics based on Markov state modeling of single-cell data. It contains two main modules: kernels compute cell-cell transition probabilities, and estimators generate hypotheses based on these. |
Cell_BLAST | Cell BLAST is a cell querying tool for single-cell transcriptomics data. |
Cirrocumulus | Cirrocumulus is an interactive visualization tool for large-scale single-cell genomics data. |
DoubletDetection | DoubletDetection is a Python 3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices. |
Mowgli | Paired single-cell multi-omics data integration with Optimal Transport-flavored Nonnegative Matrix Factorization |
PathML | An open-source toolkit for computational pathology and machine learning. |
PyDESeq2 | PyDESeq2 is a python package for bulk RNA-seq differential expression analysis. It is a re-implementation from scratch of the main features of the R package DESeq2 (Love et al. 2014). |
SnapATAC2 | SnapATAC2 is the successor of the SnapATAC R package. |
anndata for R | A ‘reticulate’ wrapper for the Python package ‘anndata’. Provides a scalable way of keeping track of data and learned annotations. Used to read from and write to the h5ad file format. |
bento-tools | A Python toolkit for subcellular analysis of spatial transcriptomics data |
biolord | biolord (biological representation disentanglement) is a deep generative framework for disentangling known and unknown attributes in single-cell data. |
cell2location | Cell2location is a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. |
cellxgene | CZ CELLxGENE Annotate (pronounced “cell-by-gene”) is an interactive data explorer for single-cell datasets, such as those coming from the Human Cell Atlas. |
dandelion | dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5’ data. It streamlines the pre-processing, leveraging some tools from immcantation suite, and integrates with scanpy/anndata for single-cell BCR/TCR analysis. It also includes a couple of functions for visualization. |
decoupler | Python package to infer biological activities from omics data using a collection of methods. |
dynamo-release | Inclusive model of expression dynamics with metabolic labeling based scRNA-seq / multiomics, vector field reconstruction, potential landscape mapping, differential geometry analyses, and most probable paths / in silico perturbation predictions. |
epiScanpy | EpiScanpy is a toolkit to analyse single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data. |
hotspot | Hotspot is a tool for identifying informative genes (and gene modules) in a single-cell dataset. |
infercnvpy | Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy. |
pegasus | Pegasus is a tool for analyzing transcriptomes of millions of single cells. |
pySCENIC | pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data. |
pychromVAR | A Python package for chromVAR. |
scFates | A scalable python package for tree inference and advanced pseudotime analysis from scRNAseq data. |
scGen | scGen is a generative model to predict single-cell perturbation response across cell types, studies and species. |
scib | Evaluating single-cell data integration methods |
scTriangulate | Python package to mix-and-match conflicting clustering results in single cell analysis and generate reconciled clustering solutions |
scVelo | scVelo is a scalable toolkit for RNA velocity analysis in single cells, based on Bergen et al., Nature Biotech, 2020. |
sfaira | sfaira is a model and a data repository in a single python package. |
sift-sc | SiFT is a computational framework which aims to uncover the underlying structure by filtering out previously exposed biological signals. SiFT can be applied to a wide range of tasks, from (i) the removal of unwanted variation as a pre-processing step, through (ii) revealing hidden biological structure by utilizing prior knowledge with respect to existing signal, to (iii) uncovering trajectories of interest using reference data to remove unwanted variation. |
Sobolev Alignment | Sobolev alignment of deep probabilistic models for comparing single cell profiles from pre-clinical models and patients |
spatial-eggplant | Python package designed to transfer information from multiple spatial-transcriptomics data sets to a single reference representing a Common Coordinate Framework (CCF). |
Symphonypy | Symphonypy is a pure Python port of Symphony label transfer algorithm for reference-based cell type annotation. |
tangram | Spatial alignment and gene expression mapping of single cell transcriptomic data. |
vitessce | Vitessce consists of reusable interactive views including a scatterplot, spatial+imaging plot, genome browser tracks, statistical plots, and control views, built on web technologies such as WebGL. |