scirpy.io.read_10x_vdj
- scirpy.io.read_10x_vdj(path, filtered=True, include_fields=('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count', 'is_cell', 'high_confidence'))
Read IR data from 10x Genomics cell-ranger output.
Supports all_contig_annotations.json and {all,filtered}_contig_annotations.csv. If the json file is available, it is preferable as it contains additional information about V(D)J-junction insertions. Other than that there should be no difference.
Note
- Reading data into Scirpy has the following constraints:
Each cell can have up to four productive chains (Dual IR): two VJ and two VDJ chains.
Excess chains (those with the lowest read count/UMI count) are ignored, and the cell is flagged as Multichain-cell.
Non-productive chains are ignored.
Chain loci must be valid IMGT locus names.
Excess chains, non-productive chains, chains without a CDR3 sequence, or chains with invalid loci are serialized to JSON and stored in the extra_chains column. They are not used by scirpy except when exporting the AnnData object to AIRR format.
For more information, see Immune receptor (IR) model.
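The constraints above can be illustrated with a small standalone sketch. It is not scirpy's implementation: `split_chains` and the two locus sets are hypothetical names, the ranking by `duplicate_count` (with `consensus_count` as tie-breaker) is an assumption about how "lowest read count/UMI count" is applied, and a missing `junction_aa` is used here as a stand-in for "no CDR3 sequence".

```python
import json

# IMGT locus names, grouped by receptor arm (assumed grouping for this sketch).
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def split_chains(chains):
    """Hypothetical helper: split one cell's chains into kept and extra chains.

    Returns (kept, extra_json, multichain), where extra_json is the JSON
    serialization of all chains that were set aside.
    """
    extra = []
    by_arm = {"VJ": [], "VDJ": []}
    for chain in chains:
        locus = chain.get("locus")
        if not chain.get("productive") or not chain.get("junction_aa"):
            extra.append(chain)  # non-productive or no CDR3
        elif locus in VJ_LOCI:
            by_arm["VJ"].append(chain)
        elif locus in VDJ_LOCI:
            by_arm["VDJ"].append(chain)
        else:
            extra.append(chain)  # invalid locus

    kept = []
    multichain = False
    for arm_chains in by_arm.values():
        # Assumption: rank by UMI count, then read count, highest first.
        arm_chains.sort(
            key=lambda c: (c.get("duplicate_count", 0), c.get("consensus_count", 0)),
            reverse=True,
        )
        kept.extend(arm_chains[:2])  # at most two chains per arm
        if len(arm_chains) > 2:
            multichain = True
            extra.extend(arm_chains[2:])
    return kept, json.dumps(extra), multichain


# Made-up example cell with three productive TRA chains (one too many).
cell_chains = [
    {"locus": "TRA", "productive": True, "junction_aa": "CAVSDF", "duplicate_count": 5},
    {"locus": "TRA", "productive": True, "junction_aa": "CAASGF", "duplicate_count": 1},
    {"locus": "TRA", "productive": True, "junction_aa": "CALSEF", "duplicate_count": 9},
    {"locus": "TRB", "productive": True, "junction_aa": "CASSLF", "duplicate_count": 7},
    {"locus": "TRB", "productive": False, "junction_aa": "CASS*F", "duplicate_count": 3},
]
kept, extra_json, multichain = split_chains(cell_chains)
```

In this made-up cell, the TRA chain with the lowest UMI count and the non-productive TRB chain end up in the serialized extras, and the cell is flagged as multichain.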
- Parameters
- path : str | Path
  Path to filtered_contig_annotations.csv, all_contig_annotations.csv or all_contig_annotations.json.
- filtered : bool (default: True)
  Only keep filtered contig annotations (i.e. is_cell and high_confidence). If using filtered_contig_annotations.csv already, this option has no effect.
- include_fields : Collection[str] | None (default: ('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count', 'is_cell', 'high_confidence'))
  The fields to include in adata. The AIRR rearrangement schema can contain a lot of columns, most of which are irrelevant for most analyses. By default, this includes a subset of columns relevant for a typical scirpy analysis, to keep adata.obs a bit cleaner. Set this to None to include all columns.
- Return type
  AnnData
- Returns
  AnnData object with IR data in obs for each cell. For more details see Data structure.
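
A minimal usage sketch, assuming the Cell Ranger output files sit together in one directory. `pick_contig_file` and `load_vdj` are hypothetical helpers, not part of scirpy; only `scirpy.io.read_10x_vdj` and its `filtered` and `include_fields` parameters come from the signature documented above. The helper prefers the JSON file, as recommended in the description; the order of the two CSV fallbacks is an arbitrary choice.

```python
from pathlib import Path

# File names from the description above, in order of preference.
CANDIDATES = (
    "all_contig_annotations.json",
    "all_contig_annotations.csv",
    "filtered_contig_annotations.csv",
)


def pick_contig_file(directory):
    """Hypothetical helper: return the first candidate file that exists."""
    directory = Path(directory)
    for name in CANDIDATES:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no contig annotation file found in {directory}")


def load_vdj(directory, all_columns=False):
    """Hypothetical wrapper around scirpy.io.read_10x_vdj."""
    # Imported lazily so pick_contig_file can be used without scirpy installed.
    import scirpy as ir

    kwargs = {"filtered": True}
    if all_columns:
        # None requests every available column instead of the default subset.
        kwargs["include_fields"] = None
    return ir.io.read_10x_vdj(pick_contig_file(directory), **kwargs)
```

Calling `load_vdj(directory)` would then return the AnnData object described above, with the default subset of fields unless `all_columns=True` is passed.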