, reference, *, sequence='aa', metric='identity', strategy='unique-only', include_ref_cols=None, query_key=None, suffix='', inplace=True)

Annotate cells based on the result of ir_query().


This is an experimental function that may change in the future.

Multiple entries from the reference can match a single cell in the query dataset. In order to reduce the matching entries to a single value that can be added to adata.obs and used for plotting and other downstream analyses, you’ll need to choose a strategy to deal with duplicates:

  • unique-only: Only annotate cells that have a unique result. Cells with multiple inconsistent matches are labeled “ambiguous”.

  • most-frequent: If there are multiple matches, assign the most frequent one. If there is a tie, the cell is labeled “ambiguous”.

  • json: Store all matching values and their counts as a JSON string.

NA values are ignored in all strategies (e.g. if an entry matches "foo" and nan, "foo" is considered unique)
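The three strategies can be illustrated with a small standalone sketch. This is not the library's implementation: `reduce_matches` is a hypothetical helper written only for illustration, it works on a plain list of matched values for a single cell, and returning None when no usable match remains is an assumption of this sketch.

```python
import json
import math
from collections import Counter


def reduce_matches(values, strategy="unique-only"):
    """Collapse the reference values matched by one cell into a single annotation.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    # Drop missing values first: None and float NaN are ignored in every strategy.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    counts = Counter(present)
    if not counts:
        return None  # assumption: no usable match means no annotation

    if strategy == "json":
        # Keep every distinct value together with how often it was matched.
        return json.dumps(dict(counts), sort_keys=True)
    if strategy == "unique-only":
        # Annotate only if all matches agree on one value.
        return present[0] if len(counts) == 1 else "ambiguous"
    if strategy == "most-frequent":
        # Take the top value unless it is tied with the runner-up.
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return "ambiguous"
        return ranked[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")


print(reduce_matches(["foo", float("nan")]))                # foo
print(reduce_matches(["foo", "bar"]))                       # ambiguous
print(reduce_matches(["foo", "foo", "bar"], "most-frequent"))  # foo
print(reduce_matches(["foo", "foo", "bar"], "json"))        # {"bar": 1, "foo": 2}
```

The first call mirrors the NA rule above: the NaN match is discarded, so "foo" counts as a unique result.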

Alternatively, you can use to obtain a data frame mapping all cells to their matching entries from reference.obs.

adata : AnnData

query dataset

reference : AnnData

Reference dataset in anndata format. Must be the same dataset that was used to run query_reference.

sequence : Literal['aa', 'nt'] (default: 'aa')

The sequence parameter used when running scirpy.pp.ir_dist()

metric : Literal['alignment', 'identity', 'levenshtein', 'hamming'] | DistanceCalculator (default: 'identity')

The metric parameter used when running scirpy.pp.ir_dist()

strategy : Literal['json', 'unique-only', 'most-frequent'] (default: 'unique-only')

Strategy to deal with non-unique values (see above).

include_ref_cols : Sequence[str] | None (default: None)

Subset the reference database to these columns. Default: include all.

query_key : str | None (default: None)

Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.

suffix : str (default: '')

Suffix appended to columns from reference.obs in case their names are conflicting with those in adata.obs.

Return type

DataFrame | None


If inplace is True, modifies adata.obs in place and returns None. Otherwise, returns a data frame with one column for each column in reference.obs, aligned to adata.obs_names.