scirpy.io.read_10x_vdj

scirpy.io.read_10x_vdj(path, filtered=True, include_fields=('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count', 'is_cell', 'high_confidence'))

Read immune receptor (IR) data from 10x Genomics Cell Ranger output.

Supports all_contig_annotations.json and {all,filtered}_contig_annotations.csv.

If the JSON file is available, prefer it: it contains additional information about V(D)J junction insertions. Apart from that, the file types should yield the same result.
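The preference for the JSON file can be automated with a small helper. The sketch below is only an illustration: `pick_contig_file` and `load_vdj` are hypothetical names, not part of scirpy, and the order of the two CSV fallbacks is an arbitrary choice. Only `scirpy.io.read_10x_vdj` itself comes from this page.

```python
from pathlib import Path
from typing import Optional

# File names listed above, in order of preference (JSON first).
# The relative order of the two CSV files is an assumption of this sketch.
CANDIDATES = (
    "all_contig_annotations.json",
    "filtered_contig_annotations.csv",
    "all_contig_annotations.csv",
)


def pick_contig_file(outs_dir) -> Optional[Path]:
    """Return the preferred contig annotation file in outs_dir, or None."""
    outs_dir = Path(outs_dir)
    for name in CANDIDATES:
        candidate = outs_dir / name
        if candidate.is_file():
            return candidate
    return None


def load_vdj(outs_dir):
    """Read the preferred file with scirpy (imported lazily)."""
    import scirpy as ir  # requires scirpy to be installed

    path = pick_contig_file(outs_dir)
    if path is None:
        raise FileNotFoundError(f"No contig annotation file found in {outs_dir}")
    return ir.io.read_10x_vdj(path)
```

Calling `load_vdj` on a Cell Ranger output directory would then return the AnnData object described under Returns.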

Note

Reading data into Scirpy has the following constraints:
  • Each cell can have up to four productive chains (Dual IR): two VJ and two VDJ chains.

  • Excess chains (those with the lowest read count/UMI count) are ignored, and the cell is flagged as Multichain-cell.

  • Non-productive chains are ignored.

  • Chain loci must be valid IMGT locus names.

  • Excess chains, non-productive chains, chains without a CDR3 sequence, or chains with invalid loci are serialized to JSON and stored in the extra_chains column. They are not used by scirpy except when exporting the AnnData object to AIRR format.

For more information, see Immune receptor (IR) model.
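The constraints above can be illustrated with a small, self-contained sketch. It is not scirpy's implementation: `split_chains` is a hypothetical helper, the chain dictionaries are toy data, and ranking by UMI count first and read count second is an assumption. The grouping of loci into VJ (TRA, TRG, IGK, IGL) and VDJ (TRB, TRD, IGH) follows the IMGT locus names.

```python
# IMGT locus names, grouped by the type of rearrangement.
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def split_chains(chains):
    """Partition one cell's chains into kept chains and extra chains.

    Each chain is a dict with the keys 'locus', 'productive', 'junction_aa',
    'duplicate_count' (UMIs) and 'consensus_count' (reads).
    Returns (kept, extra, multichain).
    """
    arms = {"VJ": [], "VDJ": []}
    extra = []
    for chain in chains:
        locus = chain.get("locus")
        # Non-productive chains and chains without a CDR3 are not used.
        usable = chain.get("productive") and chain.get("junction_aa")
        if usable and locus in VJ_LOCI:
            arms["VJ"].append(chain)
        elif usable and locus in VDJ_LOCI:
            arms["VDJ"].append(chain)
        else:
            extra.append(chain)  # invalid locus, non-productive, or no CDR3

    def abundance(chain):
        # Assumption: rank by UMI count, break ties by read count.
        return (chain.get("duplicate_count") or 0, chain.get("consensus_count") or 0)

    kept = {}
    multichain = False
    for arm, candidates in arms.items():
        ranked = sorted(candidates, key=abundance, reverse=True)
        kept[arm] = ranked[:2]  # at most two chains per arm
        if len(ranked) > 2:
            multichain = True  # the cell had excess chains
            extra.extend(ranked[2:])
    return kept, extra, multichain
```

In this sketch, a cell with three productive TRA chains keeps the two most abundant ones, moves the third to the extra chains, and is marked as multichain.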

Parameters
path : Union[str, Path]

Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.

filtered : bool (default: True)

Only keep filtered contig annotations (i.e. those where both is_cell and high_confidence are true). If the input is already filtered_contig_annotations.csv, this option has no effect.
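As a rough illustration of what this filter means, the sketch below applies the same condition to a toy CSV with the standard library. The helper names and the example rows are invented for illustration, and the case-insensitive comparison is an assumption made so that both "True" and "true" spellings are accepted.

```python
import csv
import io


def is_true(value) -> bool:
    """Interpret a CSV flag such as 'True' or 'true' as a boolean."""
    return str(value).strip().lower() == "true"


def keep_filtered(rows):
    """Keep only contigs that are both is_cell and high_confidence."""
    return [r for r in rows if is_true(r["is_cell"]) and is_true(r["high_confidence"])]


# Toy data with only the columns needed for this example.
EXAMPLE_CSV = """barcode,is_cell,high_confidence,chain
AAAC-1,True,True,TRA
AAAC-1,True,False,TRB
TTTG-1,False,True,TRA
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
kept = keep_filtered(rows)
print([r["barcode"] + ":" + r["chain"] for r in kept])  # ['AAAC-1:TRA']
```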

include_fields : Optional[Collection[str]] (default: ('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count', 'is_cell', 'high_confidence'))

The fields to include in adata. The AIRR rearrangement schema can contain many columns, most of which are irrelevant for most analyses. By default, only a subset of columns relevant for a typical scirpy analysis is included (see the default value above), which keeps adata.obs cleaner. Set this to None to include all columns.
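The semantics of include_fields can be sketched as a simple key filter. This is only an illustration of the behaviour described above: `select_fields` is a hypothetical helper, not a scirpy function, and the chain dictionary is toy data.

```python
# Default subset, copied from the function signature.
DEFAULT_FIELDS = (
    "productive", "locus", "v_call", "d_call", "j_call", "c_call",
    "junction", "junction_aa", "consensus_count", "duplicate_count",
    "is_cell", "high_confidence",
)


def select_fields(chain, include_fields=DEFAULT_FIELDS):
    """Return only the requested fields of a chain; None keeps everything."""
    if include_fields is None:
        return dict(chain)
    wanted = set(include_fields)
    return {key: value for key, value in chain.items() if key in wanted}


chain = {"locus": "TRA", "junction_aa": "CAVSDF", "sequence": "ATGC", "np1_length": 3}
print(sorted(select_fields(chain)))        # ['junction_aa', 'locus']
print(sorted(select_fields(chain, None)))  # all four keys
```

With scirpy itself, the corresponding calls would be `read_10x_vdj(path, include_fields=None)` to keep every column, or `include_fields=("locus", "junction_aa")` to keep a narrower subset.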

Return type

AnnData

Returns

AnnData object with IR data in obs for each cell. For more details see Data structure.