Import scirpy together with scanpy as follows:

import scanpy as sc
import scirpy as ir

For consistency, the scirpy API tries to follow the scanpy API as closely as possible.

Input/Output: io


In scirpy v0.7.0, the way VDJ data is stored in adata.obs changed to be fully compliant with the AIRR Rearrangement schema. Use upgrade_schema() to make AnnData objects created with previous scirpy versions compatible with the current scirpy workflow.


Update older versions of a scirpy anndata object to the latest schema.

The following functions import V(D)J information from various formats.

io.read_h5ad(filename[, backed, as_sparse, ...])

Read .h5ad-formatted hdf5 file.

io.read_10x_vdj(path[, filtered, include_fields])

Read IR data from 10x Genomics cell-ranger output.


Read data from TraCeR ([SLonnbergP+16]).


Read data from BraCeR ([LEM+18]).

io.read_bd_rhapsody(path[, dominant])

Read IR data from the BD Rhapsody Analysis Pipeline.

io.read_airr(path[, use_umi_count_col, ...])

Read data from AIRR rearrangement format.

io.from_dandelion(dandelion[, transfer])

Import data from Dandelion ([SRB+21]).

Scirpy can export data to the following formats:

io.write_airr(adata, filename)

Export IR data to AIRR Rearrangement tsv format.


Export data to Dandelion ([SRB+21]).

To convert your own formats into the scirpy data structure, we recommend first building a list of AirrCell objects and then converting them into an AnnData object using from_airr_cells(). For more details, see the Data loading tutorial.

io.AirrCell(cell_id[, ...])

Data structure for a Cell with immune receptors.

io.from_airr_cells(airr_cells[, include_fields])

Convert a collection of AirrCell objects to AnnData.


Convert an adata object with IR information back to a list of AirrCell objects.

Preprocessing: pp

pp.merge_with_ir(adata, adata_ir[, on])

Merge adaptive immune receptor (IR) data with transcriptomics data into a single AnnData object.

pp.merge_airr_chains(adata, adata2)

Merge two AnnData objects with IR information (e.g.

pp.ir_dist(adata[, reference, metric, ...])

Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.

Tools: tl

Tools add an interpretable annotation to the AnnData object which usually can be visualized by a corresponding plotting function.


tl.group_abundance(adata, groupby[, ...])

Summarizes the number/fraction of cells of a certain category by a certain group.
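The idea of counting cells per group and category can be illustrated with a minimal plain-Python sketch. It is not scirpy's implementation: the function name `count_by_group`, the `normalize` flag, and the toy data are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def count_by_group(cells, normalize=False):
    """Count (or compute fractions of) cells per category within each group.

    `cells` is an iterable of (group, category) pairs, one pair per cell.
    Hypothetical helper for illustration; not part of the scirpy API.
    """
    table = defaultdict(Counter)
    for group, category in cells:
        table[group][category] += 1

    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {
            category: (n / total if normalize else n)
            for category, n in counts.items()
        }
    return result


# Toy data: each cell belongs to a clonotype (group) and a sample source (category).
toy_cells = [
    ("ct1", "tumor"),
    ("ct1", "tumor"),
    ("ct1", "blood"),
    ("ct2", "blood"),
]
abundance_counts = count_by_group(toy_cells)
abundance_fractions = count_by_group(toy_cells, normalize=True)
```

With `normalize=True`, the values within each group sum to 1, which corresponds to the "fraction" reading of the summary.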

Quality control

tl.chain_qc(adata, *[, inplace, key_added])

Perform quality control based on the receptor-chain pairing configuration.

Define and visualize clonotypes

tl.define_clonotypes(adata, *[, key_added, ...])

Define clonotypes based on CDR3 nucleic acid sequence identity.
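Conceptually, defining clonotypes by sequence identity means that cells sharing exactly the same CDR3 nucleotide sequences receive the same clonotype label. The following plain-Python sketch illustrates this under simplifying assumptions (one VJ and one VDJ chain per cell); the function name, data layout, and toy sequences are hypothetical and do not reflect how scirpy stores or processes data.

```python
def assign_clonotypes(cdr3_per_cell):
    """Assign the same integer id to cells with identical CDR3 nt sequences.

    `cdr3_per_cell` maps a cell barcode to a (VJ CDR3, VDJ CDR3) tuple.
    Hypothetical helper for illustration; not part of the scirpy API.
    """
    ids = {}          # sequence pair -> clonotype id
    assignment = {}   # cell barcode -> clonotype id
    for cell, key in cdr3_per_cell.items():
        if key not in ids:
            ids[key] = len(ids)
        assignment[cell] = ids[key]
    return assignment


# Toy data: cell3 differs from the other cells in its VJ CDR3 sequence.
toy_cdr3 = {
    "cell1": ("TGTGCC", "TGTGCCAGC"),
    "cell2": ("TGTGCC", "TGTGCCAGC"),
    "cell3": ("TGTGCT", "TGTGCCAGC"),
    "cell4": ("TGTGCC", "TGTGCCAGC"),
}
toy_clonotypes = assign_clonotypes(toy_cdr3)
```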

tl.define_clonotype_clusters(adata, *[, ...])

Define clonotype clusters.

tl.clonotype_convergence(adata, *, ...[, ...])

Finds evidence for Convergent evolution of clonotypes.

tl.clonotype_network(adata, *[, sequence, ...])

Computes the layout of the clonotype network.

tl.clonotype_network_igraph(adata[, basis])

Get an igraph object representing the clonotype network.

Analyse clonal diversity

tl.clonal_expansion(adata, *[, target_col, ...])

Adds a column to obs recording which clonotypes are expanded.
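As a rough illustration of what "recording which clonotypes are expanded" can mean, the sketch below labels each cell with the size of its clonotype and lumps all sizes at or above a threshold into one category. The function name, the `threshold` parameter, and the label format are assumptions for this example, not scirpy's actual behavior.

```python
from collections import Counter


def label_expansion(clonotype_per_cell, threshold=3):
    """Label each cell by the size of its clonotype, clipped at `threshold`.

    Hypothetical helper for illustration; not part of the scirpy API.
    """
    sizes = Counter(clonotype_per_cell.values())
    labels = {}
    for cell, clonotype in clonotype_per_cell.items():
        n = sizes[clonotype]
        labels[cell] = f">= {threshold}" if n >= threshold else str(n)
    return labels


# Toy data: clonotype "ct0" has three cells, "ct1" has one cell.
toy_assignment = {"cell1": "ct0", "cell2": "ct0", "cell3": "ct1", "cell4": "ct0"}
expansion_labels = label_expansion(toy_assignment)
```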

tl.summarize_clonal_expansion(adata, groupby, *)

Summarizes clonal expansion by a grouping variable.

tl.alpha_diversity(adata, groupby, *[, ...])

Computes the alpha diversity of clonotypes within a group.
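Shannon entropy is one widely used alpha-diversity measure. The sketch below computes it, and a normalized variant scaled to the range 0 to 1, from a list of per-cell clonotype labels. Which metric scirpy applies and how it normalizes are not stated in this listing, so treat the function names and the choice of metric as assumptions.

```python
import math
from collections import Counter


def shannon_entropy(clonotypes):
    """Shannon entropy (in bits) of the clonotype frequency distribution."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalized_shannon_entropy(clonotypes):
    """Shannon entropy divided by its maximum, log2(number of clonotypes)."""
    n_clonotypes = len(set(clonotypes))
    if n_clonotypes <= 1:
        return 0.0  # a single clonotype carries no diversity
    return shannon_entropy(clonotypes) / math.log2(n_clonotypes)


# Toy data: clonotype labels of the cells in two hypothetical groups.
toy_groups = {
    "even": ["a", "b", "c", "d"],
    "skewed": ["a", "a", "a", "b"],
}
diversity = {g: normalized_shannon_entropy(ct) for g, ct in toy_groups.items()}
```

A group in which every clonotype is equally frequent scores 1, while a group dominated by one clonotype scores closer to 0.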

tl.repertoire_overlap(adata, groupby, *[, ...])

Compute distance between cell groups based on clonotype overlap.
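One simple way to turn clonotype overlap into a distance between groups is the Jaccard distance on the sets of clonotypes observed in each group. The sketch below is an illustration under that assumption; the metric scirpy uses is not specified in this listing, and the function and variable names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of clonotype labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty groups are treated as identical
    return 1.0 - len(a & b) / len(union)


# Toy data: clonotypes observed in three hypothetical samples.
toy_samples = {
    "s1": ["c1", "c2", "c3"],
    "s2": ["c2", "c3", "c4"],
    "s3": ["c5"],
}
pairwise_overlap_distance = {
    (x, y): jaccard_distance(toy_samples[x], toy_samples[y])
    for x, y in combinations(sorted(toy_samples), 2)
}
```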

tl.clonotype_modularity(adata[, target_col, ...])

Identifies clonotypes or clonotype clusters consisting of cells that are more transcriptionally related than expected by chance by computing the Clonotype modularity.

tl.clonotype_imbalance(*args, **kwargs)

Query reference databases

tl.ir_query(adata, reference, *[, sequence, ...])

Query a reference database for matching immune cell receptors.

tl.ir_query_annotate(adata, reference, *[, ...])

Annotate cells based on the result of ir_query().

tl.ir_query_annotate_df(adata, reference, *)

Returns the inner join of adata.obs with matching entries from reference.obs based on the result of ir_query().

V(D)J gene usage

tl.spectratype(adata[, groupby, ...])

Summarizes the distribution of CDR3 region lengths.
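A spectratype boils down to a histogram of CDR3 lengths. The plain-Python sketch below shows that computation on toy sequences; the function name and input format are assumptions for illustration and not scirpy's interface.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_sequences):
    """Return {length: number of sequences}, sorted by length."""
    counts = Counter(len(seq) for seq in cdr3_sequences)
    return dict(sorted(counts.items()))


# Toy CDR3 sequences (made up for this example).
toy_cdr3_sequences = ["CASSL", "CASSLG", "CAS", "CASSF"]
length_distribution = cdr3_length_distribution(toy_cdr3_sequences)
```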

Plotting: pl


pl.embedding(adata, basis, *[, color, ...])

A customized wrapper to the scanpy.pl.embedding() function.


Each of these plotting functions has a corresponding tool in the scirpy.tl section. Depending on the computational load, tools are either invoked on the fly when the plotting function is called, or they need to be precomputed and stored in AnnData beforehand.

pl.alpha_diversity(adata, groupby, *[, ...])

Plot the alpha diversity per group.

pl.clonal_expansion(adata, groupby, *[, ...])

Visualize clonal expansion.

pl.group_abundance(adata, groupby[, ...])

Plots the number of cells per group, split up by a categorical variable.

pl.spectratype(adata[, cdr3_col, ...])

Show the distribution of CDR3 region lengths.

pl.vdj_usage(adata, *[, vdj_cols, ...])

Creates a ribbon plot of the most abundant VDJ combinations.

pl.repertoire_overlap(adata, groupby, *[, ...])

Visualizes overlap between a pair of samples on a scatter plot or

pl.clonotype_modularity(adata[, ax, ...])

Plots the Clonotype modularity score against the associated log10 p-value.

pl.clonotype_network(adata, *[, color, ...])

Plot the Clonotype network.

pl.clonotype_imbalance(adata, replicate_col, ...)

Aims to find clonotypes that are the most enriched or depleted in a category.

Base plotting functions: pl.base

pl.base.bar(data, *[, ax, stacked, style, ...])

Basic plotting function built on top of bar plot in Pandas.

pl.base.line(data, *[, ax, style, ...])

Basic plotting function built on top of line plot in Pandas.

pl.base.barh(data, *[, ax, style, ...])

Basic plotting function built on top of bar plot in Pandas.

pl.base.curve(data, *[, ax, curve_layout, ...])

Basic plotting function for drawing KDE-smoothed curves.

Plot styling: pl.styling

pl.styling.apply_style_to_axes(ax, style, ...)

Apply a predefined style to an axis object.

pl.styling.style_axes(ax[, title, ...])

Style an axes object.

Datasets: datasets


Return the dataset from [WMdA+20] as AnnData object.


Return the dataset from [WMdA+20] as AnnData object, downsampled to 3000 TCR-containing cells.


Return the dataset from [MMR+20] as AnnData object.

Reference databases

datasets.vdjdb([cached, cache_path])

Download VDJdb and process it into an AnnData object.

datasets.iedb([cached, cache_path])

Download IEDB v3 and process it into an AnnData object.

A reference database is also just a Scirpy-formatted AnnData object. This means you can follow the instructions in the data loading tutorial to build a custom reference database.

Utility functions: util

util.graph.layout_components(graph[, ...])

Compute a graph layout by laying out each connected component individually.

util.graph.layout_fr_size_aware(graph, *[, ...])

Compute the Fruchterman-Reingold layout respecting node sizes.

util.graph.igraph_from_sparse_matrix(matrix, *)

Get an igraph object from an adjacency or distance matrix.

IR distance utilities: ir_dist

ir_dist.sequence_dist(seqs[, seqs2, metric, ...])

Calculate a sequence x sequence distance matrix.

distance metrics


Abstract base class for a CDR3-sequence distance calculator.


Abstract base class for a DistanceCalculator that computes distances in parallel.


Calculates the Identity-distance between CDR3 sequences.


Calculates the Levenshtein edit-distance between sequences.


Calculates the Hamming distance between sequences of identical length.


Calculates distance between sequences based on pairwise sequence alignment.
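To make the first three metrics concrete, here is a minimal plain-Python sketch of identity, Hamming, and Levenshtein distances between two sequences. These are textbook definitions written for illustration; they are not scirpy's distance calculator classes, and the function names are hypothetical.

```python
def identity_distance(a, b):
    """0 if the sequences are identical, 1 otherwise."""
    return 0 if a == b else 1


def hamming_distance(a, b):
    """Number of mismatching positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + (char_a != char_b)  # substitution or match
                )
            )
        previous = current
    return previous[-1]


# Toy CDR3 amino-acid sequences (made up for this example).
d_identity = identity_distance("CASSL", "CASSF")
d_hamming = hamming_distance("CASSL", "CASSF")
d_levenshtein = levenshtein_distance("CASSL", "CASSLG")
```

Unlike the Hamming distance, the Levenshtein distance accepts sequences of different lengths, at the cost of a quadratic-time dynamic program per pair.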