scirpy.tl.ir_query_annotate

scirpy.tl.ir_query_annotate(adata, reference, *, sequence='aa', metric='identity', strategy='unique-only', include_ref_cols=None, query_key=None, suffix='', inplace=True, airr_mod='airr', airr_mod_ref='airr')

Annotate cells based on the result of ir_query().

Warning

This is an experimental function that may change in the future.

Multiple entries from the reference can match a single cell in the query dataset. In order to reduce the matching entries to a single value that can be added to adata.obs and used for plotting and other downstream analyses, you’ll need to choose a strategy to deal with duplicates:

unique-only: Only annotate cells that have a unique result. Cells with multiple inconsistent matches receive the value "ambiguous".

most-frequent: If there are multiple matches, assign the most frequent one. If there is a tie, the cell receives the value "ambiguous".

json: Store all matching values and their counts as a JSON string.

NA values are ignored in all strategies (e.g. if an entry matches "foo" and nan, "foo" is considered unique).
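The three strategies can be illustrated on the list of reference values matched by a single cell. The following is a minimal sketch in plain Python, not scirpy's implementation: the helper `reduce_matches` is hypothetical, and both the exact JSON layout and the value returned when only NA values match (`None` here) are assumptions.

```python
import json
import math
from collections import Counter


def _is_na(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def reduce_matches(values, strategy="unique-only"):
    """Collapse the reference values matched by one cell into a single value.

    Hypothetical helper for illustration only; it is not part of scirpy.
    """
    # NA values are dropped before any strategy is applied.
    counts = Counter(v for v in values if not _is_na(v))
    if not counts:
        return None  # assumption: no usable match yields a missing value
    if strategy == "json":
        # Assumed layout: a JSON object mapping each value to its count.
        return json.dumps(dict(counts))
    if strategy == "unique-only":
        return next(iter(counts)) if len(counts) == 1 else "ambiguous"
    if strategy == "most-frequent":
        ranked = counts.most_common(2)
        # A tie for first place cannot be resolved.
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return "ambiguous"
        return ranked[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")


matches = ["foo", "foo", "bar", float("nan")]
print(reduce_matches(matches, "unique-only"))    # ambiguous
print(reduce_matches(matches, "most-frequent"))  # foo
print(reduce_matches(matches, "json"))           # {"foo": 2, "bar": 1}
```

In the real function, this kind of reduction would apply per cell and per reference column.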

Alternatively, you can use scirpy.tl.ir_query_annotate_df() to obtain a data frame mapping all cells to their matching entries from reference.obs.

Parameters

adata : AnnData | MuData | DataHandler: query dataset
reference : AnnData | MuData | DataHandler: reference dataset in anndata format. Must be the same reference that was used to run ir_query().
sequence : {'aa', 'nt'} (default: 'aa'): The sequence parameter used when running scirpy.pp.ir_dist()
metric : {'alignment', 'identity', 'levenshtein', 'hamming'} | DistanceCalculator (default: 'identity'): The metric parameter used when running scirpy.pp.ir_dist()
strategy : {'json', 'unique-only', 'most-frequent'} (default: 'unique-only'): Strategy to deal with non-unique values (see above).
include_ref_cols : Sequence[str] | None (default: None): Subset the reference database to these columns. If None, all columns are included.
query_key : str | None (default: None): Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.
suffix : str (default: ''): Removed in v0.13. Has no effect.
inplace : bool (default: True): If True, the result columns are stored in adata.obs. Otherwise, the result is returned.
airr_mod : str (default: 'airr'): Name of the modality in the MuData object in which the AIRR information is stored. If an AnnData object is passed to the function, this parameter is ignored.
airr_mod_ref : str (default: 'airr'): Like airr_mod, but for reference.

Return type

DataFrame | None

Returns

If inplace is True, modifies adata.obs in place and returns None. Otherwise, returns a data frame with one column for each column in reference.obs, aligned to adata.obs_names.
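A typical call sequence might look like the sketch below. It assumes the distances and the query have not been computed yet, and that `sequence` and `metric` are kept identical across `scirpy.pp.ir_dist()`, `scirpy.tl.ir_query()`, and this function so that the query key can be inferred. The wrapper `annotate_with_reference` and its `ref_cols` argument are hypothetical names chosen for illustration.

```python
def annotate_with_reference(adata, reference, ref_cols=None,
                            sequence="aa", metric="identity"):
    """Run the distance, query, and annotation steps with consistent settings.

    Hypothetical convenience wrapper; not part of scirpy.
    """
    # Imported lazily so the sketch can be read without scirpy installed.
    import scirpy as ir

    # 1. Sequence distances between the query and the reference dataset.
    ir.pp.ir_dist(adata, reference, metric=metric, sequence=sequence)
    # 2. Match query cells to reference entries based on those distances.
    ir.tl.ir_query(adata, reference, metric=metric, sequence=sequence)
    # 3. Reduce the matches to one value per cell and return them as a
    #    data frame instead of writing to adata.obs.
    return ir.tl.ir_query_annotate(
        adata,
        reference,
        metric=metric,
        sequence=sequence,
        strategy="most-frequent",
        include_ref_cols=ref_cols,
        inplace=False,
    )
```

Returning the data frame with `inplace=False` makes it possible to inspect the annotation before merging it into `adata.obs`.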