- scirpy.io.read_airr(path, use_umi_count_col='auto', infer_locus=True, cell_attributes=('is_cell', 'high_confidence', 'multi_chain'), include_fields=('productive', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_aa', 'consensus_count', 'duplicate_count'))
Read data from AIRR rearrangement format.
- The following columns are required by scirpy:
at least one of
at least one of
Data should still import if one of these fields is missing, but they are required by most of scirpy’s processing functions. All chains for which the field
junction_aa is missing or empty will be considered non-productive and will be moved to the
- Reading data into Scirpy has the following constraints:
Each cell can have up to four productive chains (Dual IR): two VJ and two VDJ chains.
Excess chains (those with the lowest read count/UMI count) are ignored, and the cell is flagged as a Multichain-cell.
Non-productive chains are ignored.
Chain loci must be valid IMGT locus names.
Excess chains, non-productive chains, chains without a CDR3 sequence, or chains with invalid loci are serialized to JSON and stored in the
extra_chains column. They are not used by scirpy except when exporting the
AnnData object to AIRR format.
For more information, see Immune receptor (IR) model.
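The constraints above can be sketched in plain Python. This is only an illustration under stated assumptions, not scirpy's actual implementation: the helper `split_chains`, the dictionary layout of a chain, and the use of `duplicate_count` as the ranking key are all hypothetical choices made for the example.

```python
import json

# IMGT locus names, grouped by chain type (VJ vs. VDJ).
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def split_chains(chains, max_per_type=2):
    """Partition one cell's chains into kept and extra chains (hypothetical helper)."""
    kept = {"VJ": [], "VDJ": []}
    extra = []
    for chain in chains:
        locus = chain.get("locus")
        if locus in VJ_LOCI:
            chain_type = "VJ"
        elif locus in VDJ_LOCI:
            chain_type = "VDJ"
        else:
            extra.append(chain)  # invalid locus
            continue
        if not chain.get("productive") or not chain.get("junction_aa"):
            extra.append(chain)  # non-productive or no CDR3 sequence
            continue
        kept[chain_type].append(chain)

    multichain = False
    for candidates in kept.values():
        # Highest counts first; anything beyond the limit is an excess chain.
        candidates.sort(key=lambda c: c.get("duplicate_count", 0), reverse=True)
        if len(candidates) > max_per_type:
            multichain = True
            extra.extend(candidates[max_per_type:])
            del candidates[max_per_type:]
    # Extra chains are serialized to JSON, mirroring the extra_chains column.
    return kept, json.dumps(extra), multichain


cell = [
    {"locus": "TRA", "productive": True, "junction_aa": "CAVSDF", "duplicate_count": 5},
    {"locus": "TRA", "productive": True, "junction_aa": "CAVRDF", "duplicate_count": 3},
    {"locus": "TRA", "productive": True, "junction_aa": "CAGSDF", "duplicate_count": 1},
    {"locus": "TRB", "productive": True, "junction_aa": "CASSLF", "duplicate_count": 7},
    {"locus": "TRB", "productive": False, "junction_aa": "", "duplicate_count": 2},
    {"locus": "XYZ", "productive": True, "junction_aa": "CASSQF", "duplicate_count": 4},
]
kept, extra_json, multichain = split_chains(cell)
```

In this toy cell, the third TRA chain is excess, so the cell would be flagged as multichain, while the non-productive TRB chain and the chain with the invalid locus end up in the JSON string.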
- path :
Path to the AIRR rearrangement TSV file. If different chains are split across multiple files, these can be specified as a list, e.g.
["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"]. Alternatively, this can be a pandas data frame.
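As a rough sketch of the multi-file case, the snippet below writes two tiny tab-separated files and wraps the `read_airr` call in a helper. The helper names `write_airr_tsv` and `read_split_files`, the toy rows, and the chosen columns (including `cell_id`) are assumptions for illustration; a real AIRR file will usually carry more columns, and the toy files are not guaranteed to satisfy every requirement of the reader.

```python
import csv
import tempfile
from pathlib import Path


def write_airr_tsv(path, rows):
    """Write a list of dicts as a tab-separated file (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_split_files(paths):
    """Pass several rearrangement files to scirpy as one list of paths."""
    import scirpy as ir  # imported lazily; requires scirpy to be installed

    return ir.io.read_airr([str(p) for p in paths])


tmp_dir = Path(tempfile.mkdtemp())
alpha_path = tmp_dir / "tcr_alpha.tsv"
beta_path = tmp_dir / "tcr_beta.tsv"

# Toy records; all values are made up for the example.
write_airr_tsv(alpha_path, [{
    "cell_id": "cell1", "locus": "TRA", "productive": "T",
    "v_call": "TRAV12-1", "j_call": "TRAJ23",
    "junction_aa": "CAVSDF", "duplicate_count": "5",
}])
write_airr_tsv(beta_path, [{
    "cell_id": "cell1", "locus": "TRB", "productive": "T",
    "v_call": "TRBV7-2", "j_call": "TRBJ2-1",
    "junction_aa": "CASSLF", "duplicate_count": "7",
}])

# With scirpy installed, the two files could then be read together:
# adata = read_split_files([alpha_path, beta_path])
```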
- use_umi_count_col :
Whether to add UMI counts from the non-standard (but common)
umi_count column. When this column is used, the UMI counts are moved over to the standard
duplicate_count column. Default: Use
umi_count if there is no
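The move from `umi_count` to `duplicate_count` can be pictured with a small sketch. The helper `move_umi_counts` is hypothetical and only handles an explicit `True`/`False` flag; it does not model the `'auto'` default and is not scirpy's implementation.

```python
def move_umi_counts(record, use_umi_count_col):
    """Return a copy of `record` with umi_count moved to duplicate_count.

    Hypothetical helper: only an explicit boolean flag is modelled here.
    """
    record = dict(record)  # do not mutate the caller's record
    if use_umi_count_col and "umi_count" in record:
        record["duplicate_count"] = record.pop("umi_count")
    return record


moved = move_umi_counts({"locus": "TRA", "umi_count": 4}, use_umi_count_col=True)
untouched = move_umi_counts({"locus": "TRA", "umi_count": 4}, use_umi_count_col=False)
```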
- infer_locus :
Try to infer the
locus column from gene names, in case it is not specified.
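One simple way to infer a locus from gene names is to look at the three-letter prefix of the gene calls. The sketch below assumes that approach; `infer_locus` is a hypothetical helper, and the heuristic scirpy actually uses may differ.

```python
# Valid IMGT locus names.
VALID_LOCI = {"TRA", "TRB", "TRG", "TRD", "IGH", "IGK", "IGL"}


def infer_locus(record):
    """Return the record's locus, or guess it from the gene-name prefix."""
    if record.get("locus"):
        return record["locus"]  # an explicit locus always wins
    for field in ("v_call", "d_call", "j_call", "c_call"):
        gene = record.get(field) or ""
        prefix = gene[:3].upper()
        if prefix in VALID_LOCI:
            return prefix
    return None  # nothing to infer from


from_v_call = infer_locus({"v_call": "TRBV7-2*01"})
explicit = infer_locus({"locus": "IGH", "v_call": "TRAV12-1"})
unknown = infer_locus({"v_call": ""})
```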
- cell_attributes :
Fields in the rearrangement schema that are specific to a cell rather than a chain. The values must be identical over all records belonging to a cell. This defaults to
('is_cell', 'high_confidence', 'multi_chain').
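The requirement that cell-level values agree across all records of a cell can be checked with a short sketch. The helper `collect_cell_attributes` and the use of `cell_id` as the grouping key are assumptions for illustration, not scirpy's implementation.

```python
from collections import defaultdict


def collect_cell_attributes(
    records, cell_attributes=("is_cell", "high_confidence", "multi_chain")
):
    """Gather cell-level attributes and reject conflicting values."""
    per_cell = defaultdict(dict)
    for record in records:
        attrs = per_cell[record["cell_id"]]
        for name in cell_attributes:
            if name not in record:
                continue
            if name in attrs and attrs[name] != record[name]:
                raise ValueError(
                    f"Conflicting values for {name!r} in cell {record['cell_id']!r}"
                )
            attrs[name] = record[name]
    return dict(per_cell)


consistent = collect_cell_attributes([
    {"cell_id": "cell1", "locus": "TRA", "is_cell": True},
    {"cell_id": "cell1", "locus": "TRB", "is_cell": True},
])
```

Two records of the same cell that disagree on, say, `is_cell` would raise a `ValueError` in this sketch.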
- include_fields :
The fields to include in
adata. The AIRR rearrangement schema can contain many columns, most of which are irrelevant for most analyses. By default, this includes a subset of columns relevant for a typical scirpy analysis, to keep
adata.obs a bit cleaner. Defaults to
("productive","locus","v_call","d_call","j_call","c_call","junction","junction_aa","consensus_count","duplicate_count"). Set this to
None to include all columns.
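The effect of the field selection can be illustrated on a single record. `select_fields` is a hypothetical helper that mimics the described behaviour on a plain dictionary; it is not how scirpy filters columns internally.

```python
def select_fields(record, include_fields):
    """Keep only the listed fields; None keeps every field."""
    if include_fields is None:
        return dict(record)
    return {key: value for key, value in record.items() if key in include_fields}


record = {"locus": "TRA", "junction_aa": "CAVSDF", "sequence_id": "seq1"}
subset = select_fields(record, ("locus", "junction_aa"))
everything = select_fields(record, None)
```

In the real function, the corresponding call would pass `include_fields=None` to `scirpy.io.read_airr` to keep all columns.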
- Return type
AnnData object with IR data in
obs for each cell. For more details see Data structure.