scirpy.pp.ir_dist

scirpy.pp.ir_dist(adata, reference=None, *, metric='identity', cutoff=None, sequence='nt', key_added=None, inplace=True, n_jobs=None)

Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.

This is a required preprocessing step for clonotype definition, for clonotype networks, and for querying reference databases.

Calculates the full pairwise distance matrix.

Important

  • Distances are offset by 1 to allow efficient use of sparse matrices (\(d' = d+1\)).

  • That is, a distance > cutoff is represented as 0, a distance == 0 is represented as 1, a distance == 1 is represented as 2, and so on.

  • Only returns distances <= cutoff. Larger distances are eliminated from the sparse matrix.

  • Distances are non-negative.
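
The offset encoding described in the list above can be illustrated with a minimal pure-Python sketch. It does not call scirpy: `hamming`, `encode_distance`, and `decode_distance` are hypothetical helper names, the sequences are made up, and a plain dict of (i, j) entries stands in for the sparse matrix.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def encode_distance(d: int, cutoff: int) -> int:
    """Apply the d' = d + 1 offset; distances above the cutoff become 0."""
    return d + 1 if d <= cutoff else 0


def decode_distance(stored: int):
    """Invert the offset; 0 means 'above cutoff' and is returned as None."""
    return stored - 1 if stored > 0 else None


# Illustrative sequences only (not taken from any dataset).
seqs = ["CASSL", "CASSL", "CASSF", "CAWTF"]
cutoff = 1

# Keep only non-zero entries, as a sparse matrix would.
sparse = {}
for i, a in enumerate(seqs):
    for j, b in enumerate(seqs):
        stored = encode_distance(hamming(a, b), cutoff)
        if stored != 0:
            sparse[(i, j)] = stored
```

Identical sequences are stored as 1 rather than 0, so they remain distinguishable from pairs that exceed the cutoff, which are simply absent.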

Parameters
adata : AnnData

annotated data matrix

reference : AnnData | None (default: None)

Another AnnData object, which can be either a second dataset with IR information or an epitope database. If specified, distances are computed between the sequences in adata and the sequences in reference. Otherwise, pairwise distances are computed among the sequences in adata.

metric : {‘alignment’, ‘identity’, ‘levenshtein’, ‘hamming’} | DistanceCalculator (default: 'identity')

You can choose one of the following metrics: identity, levenshtein, hamming, or alignment. Alternatively, an instance of scirpy.ir_dist.metrics.DistanceCalculator can be passed.

cutoff : int | None (default: None)

All distances > cutoff will be replaced by 0 and eliminated from the sparse matrix. A sensible cutoff depends on the distance metric; see the documentation of the corresponding metric for details. If set to None, the cutoff will be 10 for the alignment metric, and 2 for levenshtein and hamming. For the identity metric, the cutoff is ignored and always set to 0.
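
The default-cutoff rules stated above can be summarized in a short sketch. `DEFAULT_CUTOFFS` and `resolve_cutoff` are hypothetical names introduced for illustration; this mirrors the documented behavior and is not scirpy's actual implementation.

```python
from typing import Optional

# Defaults as documented above; this mapping is illustrative only.
DEFAULT_CUTOFFS = {"alignment": 10, "levenshtein": 2, "hamming": 2}


def resolve_cutoff(metric: str, cutoff: Optional[int] = None) -> int:
    """Return the effective cutoff for a named metric."""
    if metric == "identity":
        # For identity, any user-supplied cutoff is ignored.
        return 0
    if cutoff is not None:
        return cutoff
    if metric not in DEFAULT_CUTOFFS:
        raise ValueError(f"no documented default cutoff for metric {metric!r}")
    return DEFAULT_CUTOFFS[metric]
```

For example, `resolve_cutoff("identity", 5)` yields 0 because the identity metric ignores the cutoff, while `resolve_cutoff("alignment")` falls back to 10.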

sequence : {‘aa’, ‘nt’} (default: 'nt')

Compute distances based on amino acid (aa) or nucleotide (nt) sequences.

key_added : str | None (default: None)

Dictionary key under which the results will be stored in adata.uns if inplace=True. Defaults to ir_dist_{sequence}_{metric} or ir_dist_{name}_{sequence}_{metric} if reference is specified. If metric is an instance of scirpy.ir_dist.metrics.DistanceCalculator, {metric} defaults to custom. {name} is taken from reference.uns["DB"]["name"]. If reference does not have a "DB" entry, key_added needs to be specified manually.
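
The naming rule for the default key can be sketched as follows. `default_key` is a hypothetical helper, `reference_uns` stands in for reference.uns, and the database name "vdjdb" in the usage note is an assumed example value; none of this is scirpy's actual code.

```python
from typing import Any, Mapping, Optional


def default_key(
    sequence: str,
    metric: Any,
    reference_uns: Optional[Mapping[str, Any]] = None,
) -> str:
    """Build the default storage key following the documented pattern."""
    # A named metric is used verbatim; anything else (e.g. a custom
    # distance calculator object) is labelled "custom".
    metric_name = metric if isinstance(metric, str) else "custom"
    if reference_uns is None:
        return f"ir_dist_{sequence}_{metric_name}"
    try:
        name = reference_uns["DB"]["name"]
    except KeyError:
        raise ValueError(
            'reference has no "DB" name; key_added must be given explicitly'
        ) from None
    return f"ir_dist_{name}_{sequence}_{metric_name}"
```

With these assumptions, `default_key("nt", "identity")` gives `ir_dist_nt_identity`, and `default_key("aa", "alignment", {"DB": {"name": "vdjdb"}})` gives `ir_dist_vdjdb_aa_alignment`.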

inplace : bool (default: True)

If True, store the result in adata.uns. Otherwise, return a dictionary with the results.

n_jobs : int | None (default: None)

Number of cores to use for distance calculation. Passed on to scirpy.ir_dist.metrics.DistanceCalculator.

Return type

dict | None

Returns

Depending on the value of inplace, returns either nothing or a dictionary with sparse pairwise distance matrices for all VJ and VDJ sequences.