scirpy.pp.ir_dist

scirpy.pp.ir_dist(adata, *, metric='identity', cutoff=None, sequence='nt', key_added=None, inplace=True, n_jobs=None)

Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.

This is a required preprocessing step for clonotype definition and clonotype networks.

Calculates the full pairwise distance matrix.

Important

  • Distances are offset by 1 to allow efficient use of sparse matrices (\(d' = d+1\)).

  • That is, a distance > cutoff is represented as 0, a distance of 0 is represented as 1, a distance of 1 is represented as 2, and so on.

  • Only returns distances <= cutoff. Larger distances are eliminated from the sparse matrix.

  • Distances are non-negative.
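The offset encoding described above can be illustrated with a minimal pure-Python sketch. The helper names `encode_distances` and `decode_distance` are hypothetical and not part of scirpy, and a plain dictionary stands in for the sparse matrix; the real function is assumed to use a proper sparse format.

```python
def encode_distances(dist, cutoff):
    """Return {(i, j): d + 1} for every pair whose distance d is <= cutoff.

    Pairs above the cutoff are simply not stored, which corresponds to an
    implicit 0 in a sparse matrix.
    """
    encoded = {}
    for i, row in enumerate(dist):
        for j, d in enumerate(row):
            if d <= cutoff:
                encoded[(i, j)] = d + 1  # offset by 1 so that d == 0 is not lost
    return encoded


def decode_distance(encoded, i, j):
    """Return the true distance, or None if it exceeded the cutoff."""
    stored = encoded.get((i, j), 0)  # missing entries read as 0
    return None if stored == 0 else stored - 1


# Toy symmetric distance matrix for three sequences (illustrative values only).
dist = [
    [0, 1, 5],
    [1, 0, 3],
    [5, 3, 0],
]
encoded = encode_distances(dist, cutoff=2)
```

Here the diagonal entries (true distance 0) are stored as 1, the pair at distance 1 is stored as 2, and the pairs at distance 3 and 5 are absent because they exceed the cutoff.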

Parameters
adata : AnnData

annotated data matrix

metric : {‘alignment’, ‘identity’, ‘levenshtein’, ‘hamming’} | DistanceCalculator (default: 'identity')

The distance metric to use: one of the metric names listed above, or an instance of scirpy.ir_dist.metrics.DistanceCalculator.

cutoff : int | None (default: None)

All distances > cutoff are replaced by 0 and eliminated from the sparse matrix. A sensible cutoff depends on the distance metric; see the documentation of the respective metric for guidance. If set to None, the cutoff is 10 for the alignment metric and 2 for levenshtein and hamming. For the identity metric, the cutoff is ignored and always set to 0.
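The default-cutoff rules for cutoff can be summarized in a short sketch. `resolve_cutoff` and `DEFAULT_CUTOFFS` are hypothetical names used for illustration only; they restate the rules given here and are not scirpy's actual implementation.

```python
# Defaults as stated in the parameter description.
DEFAULT_CUTOFFS = {"alignment": 10, "levenshtein": 2, "hamming": 2}


def resolve_cutoff(metric, cutoff=None):
    """Return the effective cutoff for a named metric (illustrative helper)."""
    if metric == "identity":
        return 0  # the cutoff is ignored for the identity metric
    if cutoff is None:
        return DEFAULT_CUTOFFS[metric]  # fall back to the per-metric default
    return cutoff
```

For example, `resolve_cutoff("alignment")` yields 10, while `resolve_cutoff("identity", cutoff=5)` still yields 0.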

sequence : {‘aa’, ‘nt’} (default: 'nt')

Compute distances based on amino acid (aa) or nucleotide (nt) sequences.

key_added : str | None (default: None)

Dictionary key under which the results will be stored in adata.uns if inplace=True. Defaults to ir_dist_{sequence}_{metric}. If metric is an instance of scirpy.ir_dist.metrics.DistanceCalculator, {metric} defaults to custom.
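The default key naming can be sketched as follows. `default_key` is a hypothetical helper that mirrors the rule described for key_added; treating any non-string metric as a custom DistanceCalculator is an assumption made to keep the sketch self-contained.

```python
def default_key(sequence, metric, key_added=None):
    """Return the adata.uns key under which results would be stored."""
    if key_added is not None:
        return key_added  # an explicit key always takes precedence
    # Assumption: anything that is not a metric name is a custom calculator.
    metric_name = metric if isinstance(metric, str) else "custom"
    return f"ir_dist_{sequence}_{metric_name}"
```

With the function defaults (sequence='nt', metric='identity'), this gives `ir_dist_nt_identity`.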

inplace : bool (default: True)

If True, store the result in adata.uns. Otherwise, return a dictionary with the results.

n_jobs : int | None (default: None)

Number of cores to use for distance calculation. Passed on to scirpy.ir_dist.metrics.DistanceCalculator.

Return type

dict | None

Returns

Depending on the value of inplace, returns either nothing or a dictionary with symmetric, sparse, pairwise distance matrices for all VJ and VDJ sequences.
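A call might look like the sketch below, which uses only the parameters from the signature above. The wrapper name `compute_ir_dist` is hypothetical, `adata` is assumed to be an AnnData object that already holds immune receptor data, and scirpy is imported inside the function so that the sketch can be loaded without scirpy installed.

```python
def compute_ir_dist(adata, *, inplace=False):
    """Compute Levenshtein distances on amino acid CDR3 sequences (sketch)."""
    import scirpy as ir  # lazy import; requires scirpy to be installed

    # With inplace=False the result dictionary is returned; with inplace=True
    # it is stored in adata.uns (under ir_dist_aa_levenshtein by default)
    # and None is returned.
    return ir.pp.ir_dist(
        adata,
        metric="levenshtein",
        sequence="aa",
        cutoff=2,
        inplace=inplace,
    )
```

Setting inplace=False is convenient for inspecting the matrices directly, whereas the default inplace=True keeps the result attached to adata for the downstream clonotype steps.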