- scirpy.io.read_10x_vdj(path, filtered=True, include_fields=('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count', 'is_cell', 'high_confidence'))
Read IR data from 10x Genomics cell-ranger output.
If the json file is available, it is preferable because it contains additional information about V(D)J-junction insertions. Otherwise, the results should not differ.
- Reading data into Scirpy has the following constraints:
Each cell can have up to four productive chains (Dual IR): two VJ and two VDJ chains.
Excess chains (those with the lowest read count/UMI count) are ignored, and the cell is flagged as Multichain-cell.
Non-productive chains are ignored.
Chain loci must be valid IMGT locus names.
Excess chains, non-productive chains, chains without a CDR3 sequence, or chains with invalid loci are serialized to JSON and stored in the extra_chains column. They are not used by scirpy except when exporting the AnnData object to AIRR format.
For more information, see Immune receptor (IR) model.
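The constraints above can be illustrated with a small, self-contained sketch. This is not scirpy's implementation: the helper `select_chains`, the dictionary layout, and the use of `duplicate_count` as the ranking key are assumptions made for illustration only.

```python
from collections import defaultdict

# IMGT loci grouped by chain type: VJ chains have no D segment, VDJ chains do.
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def select_chains(chains, max_per_type=2):
    """Split the chains of one cell into kept and extra chains (hypothetical helper).

    `chains` is a list of dicts with the AIRR-style keys `locus`, `productive`,
    `junction_aa` and `duplicate_count` (assumed here to hold the UMI count).
    Returns a tuple (kept, extra, is_multichain).
    """
    candidates = defaultdict(list)
    extra = []
    for chain in chains:
        locus = chain.get("locus")
        if locus in VJ_LOCI:
            chain_type = "VJ"
        elif locus in VDJ_LOCI:
            chain_type = "VDJ"
        else:
            chain_type = None  # not a valid IMGT locus name
        # Invalid locus, non-productive chain, or missing CDR3: set aside.
        if chain_type is None or not chain.get("productive") or not chain.get("junction_aa"):
            extra.append(chain)
        else:
            candidates[chain_type].append(chain)

    kept = []
    is_multichain = False
    for chain_type in ("VJ", "VDJ"):
        # Rank by count, highest first, and keep at most `max_per_type` chains.
        ranked = sorted(
            candidates[chain_type],
            key=lambda c: c["duplicate_count"],
            reverse=True,
        )
        kept.extend(ranked[:max_per_type])
        if len(ranked) > max_per_type:
            is_multichain = True
            extra.extend(ranked[max_per_type:])
    return kept, extra, is_multichain
```

In this sketch, a cell with three productive TRA chains keeps the two with the highest counts, moves the third to the extra list, and is flagged as multichain.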
- path :
- filtered :
Only keep filtered contig annotations (i.e. high_confidence). If you are already using filtered_contig_annotations.csv, this option has no effect.
- include_fields :
The fields to include in adata. The AIRR rearrangement schema can contain a lot of columns, most of which are irrelevant for most analyses. By default, this includes a subset of columns relevant for a typical scirpy analysis, to keep adata.obs a bit cleaner. Defaults to ('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count', 'is_cell', 'high_confidence'). Set this to None to include all columns.
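To make the meaning of filtered and include_fields concrete, here is a minimal sketch on plain dictionaries. The helpers `keep_contig` and `select_fields` are hypothetical and only mirror the behavior described above; scirpy's internal logic may differ.

```python
# Default subset of fields, copied from the function signature.
DEFAULT_FIELDS = (
    "productive", "locus", "v_call", "d_call", "j_call", "c_call",
    "junction", "junction_aa", "consensus_count", "duplicate_count",
    "is_cell", "high_confidence",
)


def keep_contig(contig, filtered=True):
    """Return True if the contig passes the (assumed) `filtered` rule."""
    # With filtered=False every contig is kept; otherwise only
    # contigs marked as high confidence are kept.
    return (not filtered) or bool(contig.get("high_confidence"))


def select_fields(contig, include_fields=DEFAULT_FIELDS):
    """Return a copy of `contig` restricted to `include_fields`.

    Passing include_fields=None keeps all columns.
    """
    if include_fields is None:
        return dict(contig)
    return {key: value for key, value in contig.items() if key in include_fields}
```

For example, a contig record carrying an extra column such as sequence_id would lose that column under the default selection and keep it when include_fields is None.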
- Return type
AnnData object with IR data in obs for each cell. For more details see Data structure.
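A typical call might look like the sketch below. The wrapper `load_vdj` and its `all_columns` flag are hypothetical conveniences, not part of scirpy; only `scirpy.io.read_10x_vdj` and its filtered and include_fields arguments come from the signature documented here.

```python
def load_vdj(path, all_columns=False):
    """Read 10x V(D)J contig annotations and return the resulting AnnData object.

    `path` points to a cell-ranger contig annotation file; the caller supplies it.
    """
    # Imported lazily so this sketch can be defined without scirpy installed.
    import scirpy as ir

    kwargs = {"filtered": True}
    if all_columns:
        # None requests all columns instead of the default subset.
        kwargs["include_fields"] = None
    return ir.io.read_10x_vdj(path, **kwargs)
```

Calling `load_vdj("filtered_contig_annotations.csv")` (a placeholder path) would return an AnnData object whose obs table holds the IR columns, which can then be inspected with `adata.obs.head()`.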