scirpy.tl.ir_query_annotate
- scirpy.tl.ir_query_annotate(adata, reference, *, sequence='aa', metric='identity', strategy='unique-only', include_ref_cols=None, query_key=None, suffix='', inplace=True, airr_mod='airr', airr_mod_ref='airr')
Annotate cells based on the result of ir_query().

Warning
This is an experimental function that may change in the future.
Multiple entries from the reference can match a single cell in the query dataset. To add a single value to adata.obs for plotting and other downstream analyses, the matching entries must be reduced to one value. Choose one of these strategies for handling duplicates:
- unique-only: only annotate cells that have a unique result. Cells with multiple inconsistent matches receive the value “ambiguous”.
- most-frequent: if there are multiple matches, assign the most frequent one. Ties receive the value “ambiguous”.
- json: store all matching values and their counts as a JSON string.

NA values are ignored by all strategies. For example, if an entry matches "foo" and nan, "foo" is considered unique.

Alternatively, you can use scirpy.tl.ir_query_annotate_df() to obtain a data frame that maps all cells to their matching entries from reference.obs.
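The three strategies can be sketched in plain Python for the list of reference values matched by a single query cell. This illustrative re-implementation follows the description above and is not scirpy's internal code. The helper names `is_na` and `resolve_matches` are hypothetical. Returning None for a cell with no non-NA match is also an assumption.

```python
import json
import math
from collections import Counter


def is_na(value) -> bool:
    """Treat None and float NaN as missing (NA) values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def resolve_matches(values, strategy="unique-only"):
    """Reduce the reference values matched by one query cell to a single value.

    Hypothetical helper mirroring the documented strategies; not scirpy's code.
    """
    counts = Counter(v for v in values if not is_na(v))  # NA values are ignored
    if not counts:
        return None  # assumption: no non-NA match leaves the cell unannotated
    if strategy == "unique-only":
        # exactly one distinct value is required; anything else is ambiguous
        return next(iter(counts)) if len(counts) == 1 else "ambiguous"
    if strategy == "most-frequent":
        ranked = counts.most_common()
        # a tie for the highest count is ambiguous
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return "ambiguous"
        return ranked[0][0]
    if strategy == "json":
        # keep every distinct value together with its count
        return json.dumps(dict(counts))
    raise ValueError(f"unknown strategy: {strategy!r}")


print(resolve_matches(["foo", float("nan")]))                   # foo
print(resolve_matches(["foo", "bar"]))                          # ambiguous
print(resolve_matches(["foo", "foo", "bar"], "most-frequent"))  # foo
print(resolve_matches(["foo", "foo", "bar"], "json"))           # {"foo": 2, "bar": 1}
```

The NA filter runs before any strategy is applied. That is why `["foo", nan]` counts as a unique match under every strategy.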
Parameters
- adata : AnnData | MuData | DataHandler
  Query dataset.
- reference : AnnData | MuData | DataHandler
  Reference dataset in AnnData format. Must be the same one used to run query_reference.
- sequence : {‘aa’, ‘nt’} (default: 'aa')
  The sequence parameter used when running scirpy.pp.ir_dist().
- metric : {‘alignment’, ‘identity’, ‘levenshtein’, ‘hamming’} | DistanceCalculator (default: 'identity')
  The metric parameter used when running scirpy.pp.ir_dist().
- strategy : {‘json’, ‘unique-only’, ‘most-frequent’} (default: 'unique-only')
  Strategy for dealing with non-unique values (see above).
- include_ref_cols : Sequence[str] | None (default: None)
  Subset the reference database to these columns. Default: include all.
- query_key : str | None (default: None)
  Use the distance matrix stored under this key in adata.uns. If set to None, the key is inferred automatically from reference, sequence, and metric. Additional arguments are passed to the last join.
- suffix : str (default: '')
  Removed in v0.13. Has no effect.
- inplace : bool (default: True)
  If True, a column with the result is stored in obs. Otherwise, the result is returned.
- airr_mod : str (default: 'airr')
  Name of the modality in which AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.
- airr_mod_ref : str (default: 'airr')
  Like airr_mod, but for reference.
Return type
  DataFrame | None
Returns
  If inplace is True, modifies adata.obs in place. Otherwise, returns a data frame with one column for each column in reference.obs, aligned to adata.obs_names.
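With inplace=False, the result is a data frame aligned to adata.obs_names rather than new columns in obs. The pandas sketch below mimics that shape under stated assumptions; it is not scirpy's implementation. It starts from a hypothetical long-format table with one row per matching reference entry, similar in spirit to what ir_query_annotate_df() is described as returning. It reduces that table per cell with the most-frequent rule, then reindexes the result to a made-up set of obs_names so that cells without any match end up as NaN. Finally, it joins the result into a stand-in for adata.obs, which is roughly what inplace=True amounts to. All cell IDs, the "antigen" column, and the helper `most_frequent` are illustrative.

```python
import pandas as pd

# Hypothetical long-format matches: one row per (query cell, matching reference entry).
# Cell IDs and antigen labels are made up for illustration.
matches = pd.DataFrame(
    {"antigen": ["CMV", "CMV", "EBV", "CMV", None]},
    index=pd.Index(["cell1", "cell1", "cell2", "cell2", "cell3"], name="cell_id"),
)
# Stand-in for adata.obs_names: cell4 has no match at all.
obs_names = pd.Index(["cell1", "cell2", "cell3", "cell4"])


def most_frequent(values: pd.Series):
    """Hypothetical most-frequent rule: ignore NA, label ties as ambiguous."""
    counts = values.dropna().value_counts()
    if counts.empty:
        return None  # only NA matches for this cell
    if len(counts) > 1 and counts.iloc[0] == counts.iloc[1]:
        return "ambiguous"
    return counts.index[0]


# Reduce to one value per cell, then align to the query cells (unmatched cells -> NaN).
annotation = (
    matches.groupby(level="cell_id")["antigen"]
    .agg(most_frequent)
    .reindex(obs_names)
)

# Roughly what storing the result in place amounts to: a new column in obs.
obs = pd.DataFrame(index=obs_names).join(annotation)
print(obs)
```

Reindexing before the join keeps the row order and length identical to the query cells. This corresponds to "aligned to adata.obs_names" in the Returns description.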