scirpy.tl.ir_query_annotate
- scirpy.tl.ir_query_annotate(adata, reference, *, sequence='aa', metric='identity', strategy='unique-only', include_ref_cols=None, query_key=None, suffix='', inplace=True)
Annotate cells based on the result of ir_query().
Warning
This is an experimental function that may change in the future.
Multiple entries from the reference can match a single cell in the query dataset. To reduce the matching entries to a single value that can be added to adata.obs and used for plotting and other downstream analyses, you need to choose a strategy to deal with duplicates:
- unique-only: Only annotate those cells that have a unique result. Cells with multiple inconsistent matches receive the predicate "ambiguous".
- most-frequent: If there are multiple matches, assign the match that is most frequent. If there is a tie, the cell receives the predicate "ambiguous".
- json: Store multiple values and their counts as a JSON string.
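The three strategies can be sketched in plain Python. This is an illustration of the semantics described above, not scirpy's implementation: `reduce_matches` is a hypothetical helper, the exact JSON layout is an assumption, and returning `None` for cells without any match is also an assumption. NA values are dropped before counting, since they are ignored in all strategies.
```python
import json
from collections import Counter


def _is_na(value):
    # None and NaN both count as NA (NaN is the only value not equal to itself).
    return value is None or value != value


def reduce_matches(values, strategy="unique-only"):
    """Collapse the reference values matched to one cell into a single value.

    Hypothetical helper for illustration; not part of scirpy.
    """
    counts = Counter(v for v in values if not _is_na(v))
    if not counts:
        return None  # assumption: a cell without non-NA matches stays unannotated
    if strategy == "json":
        # Keep every distinct value together with its count (layout is an assumption).
        return json.dumps(dict(counts))
    ranked = counts.most_common()
    if strategy == "unique-only":
        return ranked[0][0] if len(ranked) == 1 else "ambiguous"
    if strategy == "most-frequent":
        # A tie for first place cannot be resolved.
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        return "ambiguous" if tied else ranked[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")


matches = ["CMV", "CMV", "EBV", float("nan")]
print(reduce_matches(matches, "unique-only"))    # ambiguous
print(reduce_matches(matches, "most-frequent"))  # CMV
print(reduce_matches(matches, "json"))           # {"CMV": 2, "EBV": 1}
print(reduce_matches(["foo", float("nan")]))     # foo
```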
NA values are ignored in all strategies (e.g. if an entry matches "foo" and nan, "foo" is considered unique).
Alternatively, you can use scirpy.tl.ir_query_annotate_df() to obtain a data frame mapping all cells to their matching entries from reference.obs.
- Parameters
- adata : AnnData
query dataset
- reference : AnnData
reference dataset in anndata format. Must be the same used to run query_reference.
- sequence : {'aa', 'nt'} (default: 'aa')
The sequence parameter used when running scirpy.pp.ir_dist()
- metric : {'alignment', 'identity', 'levenshtein', 'hamming'} | DistanceCalculator (default: 'identity')
The metric parameter used when running scirpy.pp.ir_dist()
- strategy : {'json', 'unique-only', 'most-frequent'} (default: 'unique-only')
Strategy to deal with non-unique values (see above).
- include_ref_cols : Sequence[str] | None (default: None)
Subset the reference database to these columns. Default: include all.
- query_key : str | None (default: None)
Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.
- suffix : str (default: '')
Suffix appended to columns from reference.obs in case their names conflict with those in adata.obs.
- Return type
- Returns
If inplace is True, modifies adata.obs in place. Otherwise returns a data frame with one column for each column in reference.obs, aligned to adata.obs_names.
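A typical call sequence might look like the sketch below. `annotate_query` is a hypothetical wrapper introduced only for illustration; it assumes `adata` and `reference` are AnnData objects with immune receptor data, and that `ref_cols` names columns that exist in reference.obs. The same `metric` and `sequence` values are passed to scirpy.pp.ir_dist(), ir_query(), and this function so that the stored distance matrix and query result can be found.
```python
def annotate_query(adata, reference, ref_cols):
    """Annotate `adata` with selected columns from `reference.obs`.

    Hypothetical convenience wrapper; not part of scirpy.
    """
    # Imported lazily so that defining the sketch does not require scirpy.
    import scirpy as ir

    # Use identical settings in all three steps.
    params = dict(metric="identity", sequence="aa")

    # 1. Compute sequence distances between query and reference.
    ir.pp.ir_dist(adata, reference, **params)
    # 2. Match query cells against reference entries.
    ir.tl.ir_query(adata, reference, **params)
    # 3. Reduce the matches to one value per cell. With inplace=False,
    #    adata.obs is left untouched and a data frame is returned instead.
    return ir.tl.ir_query_annotate(
        adata,
        reference,
        strategy="most-frequent",
        include_ref_cols=ref_cols,
        inplace=False,
        **params,
    )
```
Returning the data frame first makes it easy to inspect how many cells end up "ambiguous" before committing the annotation to adata.obs with inplace=True.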