Usage principles

Import scirpy together with scanpy as follows:

import scanpy as sc
import scirpy as ir

Workflow

Scirpy is an extension to Scanpy and adheres to its workflow principles:

The API is divided into preprocessing (pp), tools (tl), and plotting (pl).

All functions work on AnnData objects.

The AnnData instance is modified in place, unless a function is called with the keyword argument inplace=False.

We decided to handle a few minor points differently to Scanpy:

Plotting functions with inexpensive computations (e.g. scirpy.pl.clonal_expansion()) call the corresponding tool (scirpy.tl.clonal_expansion()) on-the-fly and don’t store the results in the AnnData object.

All plotting functions, by default, return an Axes object or a list of Axes objects.
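
The calling convention described above (modify the object in place by default, return the result when inplace=False) can be illustrated with a toy sketch. It uses only the Python standard library: a plain dict stands in for AnnData.obs, and the function toy_clonal_expansion, the column name clone_id, and the result key clonal_expansion are hypothetical names chosen for illustration. This is not Scirpy's implementation.

```python
from collections import Counter


def toy_clonal_expansion(obs, clone_key="clone_id", inplace=True):
    """Toy stand-in for a tool function: number of cells per clonotype.

    `obs` is a dict mapping column names to per-cell lists. It only
    mimics AnnData.obs and is an assumption made for this sketch.
    """
    counts = Counter(obs[clone_key])
    sizes = [counts[clone] for clone in obs[clone_key]]
    if inplace:
        # Default behaviour: store the result on the object, return nothing.
        obs["clonal_expansion"] = sizes
        return None
    # inplace=False: leave the object untouched and return the result.
    return sizes


obs = {"clone_id": ["c1", "c1", "c2", "c1"]}

returned = toy_clonal_expansion(obs, inplace=False)
untouched = "clonal_expansion" not in obs  # nothing was stored

toy_clonal_expansion(obs)  # now the result is stored as a new column
```

A plotting function that follows the second principle would call the tool with inplace=False and draw the returned values, so nothing is written to the object.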

Data structure

For instructions on how to load data into scirpy, see Loading adaptive Immune Receptor (IR)-sequencing data with Scirpy.

Scirpy uses the AnnData data structure, which combines a gene expression matrix (.X), gene-level annotations (.var), and cell-level annotations (.obs) into a single object. AnnData forms the basis for the Scanpy analysis workflow for single-cell transcriptomics data.

[Figure: _images/anndata.svg. Image by F. Alex Wolf.]

Scirpy adds the following IR-related columns to AnnData.obs:

IR_VJ_1_<attr>/IR_VJ_2_<attr>: columns related to the primary and secondary VJ-chain of a receptor (TRA, TRG, IGK, or IGL)

IR_VDJ_1_<attr>/IR_VDJ_2_<attr>: columns related to the primary and secondary VDJ-chain of a receptor (TRB, TRD, or IGH)

has_ir: True for all cells with an adaptive immune receptor

extra_chains: Contains non-productive chains (if not filtered out) and extra chains that do not fit into the 2 VJ + 2 VDJ chain model, encoded as JSON. Scirpy does not use this information except to write it back to AIRR format with scirpy.io.write_airr().

multi_chain: True for all cells with more than two productive VJ chains or more than two productive VDJ chains.

Here, <attr> can be any field of the AIRR Rearrangement Schema. The following fields are relevant for Scirpy:

locus: The IMGT locus name of the chain (TRA, IGH, etc.)

c_call, v_call, d_call, j_call: The gene symbols of the constant (C), variable (V), diversity (D), and joining (J) genes, respectively

junction_aa and junction: The amino acid and nucleotide sequences of the CDR3 regions
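
The column naming scheme above can be spelled out with a short sketch. It relies only on the pattern IR_<chain type>_<slot>_<attr> given in this section. The helper ir_column and the example cell are hypothetical and made up for illustration; they are not part of Scirpy's API.

```python
from itertools import product

CHAIN_TYPES = ("VJ", "VDJ")  # VJ: TRA/TRG/IGK/IGL, VDJ: TRB/TRD/IGH
CHAIN_SLOTS = (1, 2)         # primary and secondary chain
ATTRS = (
    "locus",
    "c_call", "v_call", "d_call", "j_call",
    "junction_aa", "junction",
)


def ir_column(chain_type, slot, attr):
    """Build an .obs column name following the scheme described above.

    Hypothetical helper for illustration; not a Scirpy function.
    """
    if chain_type not in CHAIN_TYPES:
        raise ValueError(f"chain_type must be one of {CHAIN_TYPES}")
    if slot not in CHAIN_SLOTS:
        raise ValueError(f"slot must be one of {CHAIN_SLOTS}")
    return f"IR_{chain_type}_{slot}_{attr}"


# Every per-chain column for the attributes listed in this section.
all_columns = [
    ir_column(chain_type, slot, attr)
    for chain_type, slot, attr in product(CHAIN_TYPES, CHAIN_SLOTS, ATTRS)
]

# A made-up cell, represented as a plain dict instead of a row of .obs.
cell = {"IR_VJ_1_locus": "TRA", "IR_VDJ_1_locus": "TRB", "has_ir": True}

# Loci of the primary VJ and VDJ chains of that cell.
primary_loci = tuple(cell.get(ir_column(t, 1, "locus")) for t in CHAIN_TYPES)
```

With a real AnnData object, the same names would be used to select columns from .obs, for example the column IR_VJ_1_junction_aa for the CDR3 amino acid sequence of the primary VJ chain.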