Usage principles

Import Scirpy as follows:

import scanpy as sc
import scirpy as ir

Workflow

Scirpy is an extension to Scanpy and adheres to its workflow principles:

  • The API is divided into preprocessing (pp), tools (tl), and plotting (pl).

  • All functions work on AnnData objects.

  • The AnnData instance is modified in place, unless the function is called with the keyword argument inplace=False.
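
The in-place convention from the last point can be illustrated with a toy function. This is a minimal sketch that uses a plain dictionary in place of an AnnData object; `toy_tool` and the `chains` and `n_chains` keys are hypothetical names chosen for illustration and are not part of Scirpy or Scanpy.

```python
def toy_tool(data, inplace=True):
    """Toy stand-in for a tool function (hypothetical, not a Scirpy call).

    `data` is a dict with an "obs" mapping of column name -> list of values,
    loosely mimicking cell-level annotations.
    """
    # Compute one result value per cell.
    result = [len(chains) for chains in data["obs"]["chains"]]
    if inplace:
        # Store the result on the object itself and return nothing.
        data["obs"]["n_chains"] = result
        return None
    # Leave the object untouched and hand the result back to the caller.
    return result


data = {"obs": {"chains": [["TRA", "TRB"], ["TRB"], []]}}

returned = toy_tool(data, inplace=False)  # data is left unchanged
in_place_return = toy_tool(data)          # data gains an "n_chains" column
```

With `inplace=False` the caller receives the computed values and decides where to store them; with the default, the result is attached to the object and the function returns `None`.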

We decided to handle a few minor points differently to Scanpy:

Data structure

For instructions on how to load data into Scirpy, see Loading adaptive Immune Receptor (IR)-sequencing data with Scirpy.

Scirpy leverages the AnnData data structure, which combines a gene expression matrix (.X), gene-level annotations (.var), and cell-level annotations (.obs) into a single object. AnnData forms the basis of the Scanpy analysis workflow for single-cell transcriptomics data.

Figure: the AnnData data structure. Image by F. Alex Wolf.

Scirpy adds the following IR-related columns to AnnData.obs:

  • IR_VJ_1_<attr>/IR_VJ_2_<attr>: columns related to the primary and secondary VJ-chain of a receptor (TRA, TRG, IGK, or IGL)

  • IR_VDJ_1_<attr>/IR_VDJ_2_<attr>: columns related to the primary and secondary VDJ-chain of a receptor (TRB, TRD, or IGH)

  • has_ir: True for all cells with an adaptive immune receptor

  • extra_chains: Contains non-productive chains (if not filtered out) and extra chains that do not fit into the 2 VJ + 2 VDJ chain model, encoded as JSON. Scirpy does not use this information except to write it back to AIRR format using scirpy.io.write_airr().

  • multi_chain: True for all cells with more than two productive VJ chains or more than two productive VDJ chains.

Here, <attr> can be any field of the AIRR Rearrangement Schema. The following fields are relevant for Scirpy:

  • locus: The IMGT locus name of the chain (TRA, IGH, etc.)

  • c_call, v_call, d_call, j_call: The gene symbols of the respective genes

  • junction_aa and junction: The amino acid and nucleotide sequences of the CDR3 regions
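
The naming scheme above can be spelled out by enumerating the column names it produces. The following is a minimal sketch in plain Python that does not call Scirpy; the helper `ir_obs_columns` and the constant names are hypothetical, and the attribute list is limited to the fields named above.

```python
from itertools import product

# Chain types and slots of the 2 VJ + 2 VDJ chain model.
CHAIN_TYPES = ("VJ", "VDJ")
SLOTS = (1, 2)
# AIRR fields named above as relevant for Scirpy.
ATTRS = ("locus", "c_call", "v_call", "d_call", "j_call", "junction_aa", "junction")


def ir_obs_columns(attrs=ATTRS):
    """Return the IR_<chain type>_<slot>_<attr> names for the given attributes."""
    return [
        f"IR_{chain}_{slot}_{attr}"
        for chain, slot, attr in product(CHAIN_TYPES, SLOTS, attrs)
    ]


columns = ir_obs_columns()
print(columns[0])    # IR_VJ_1_locus
print(len(columns))  # 28 (2 chain types x 2 slots x 7 attributes)
```

For example, `ir_obs_columns(["junction_aa"])` yields the four columns that hold the CDR3 amino acid sequences of the primary and secondary VJ and VDJ chains.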