scirpy.pp.ir_dist
scirpy.pp.ir_dist(adata, *, metric='identity', cutoff=None, sequence='nt', key_added=None, inplace=True, n_jobs=None)
Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.
This is a required preprocessing step for clonotype definition and clonotype networks.
Calculates the full pairwise distance matrix.
Important
Distances are offset by 1 to allow efficient use of sparse matrices (\(d' = d+1\)). That means a distance > cutoff is represented as 0, a distance == 0 is represented as 1, a distance == 1 is represented as 2, and so on.
Only returns distances <= cutoff. Larger distances are eliminated from the sparse matrix.
Distances are non-negative.
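The offset convention described above can be illustrated with a small, library-independent sketch. The helper names `encode_distance` and `decode_distance` are hypothetical and are not part of scirpy; they only mirror the rule \(d' = d+1\) with distances above the cutoff stored as 0.

```python
from typing import Optional


def encode_distance(d: int, cutoff: int) -> int:
    """Map a raw distance to its stored value: d + 1 within the cutoff, else 0."""
    if d < 0:
        raise ValueError("distances are non-negative")
    return d + 1 if d <= cutoff else 0


def decode_distance(stored: int) -> Optional[int]:
    """Invert the offset; a stored 0 means 'no entry' (distance > cutoff)."""
    return None if stored == 0 else stored - 1


raw = [0, 1, 2, 3, 7]
stored = [encode_distance(d, cutoff=2) for d in raw]
print(stored)                                 # [1, 2, 3, 0, 0]
print([decode_distance(s) for s in stored])   # [0, 1, 2, None, None]
```

Because a true distance of 0 is stored as 1, an implicit zero in a sparse matrix can unambiguously mean "further apart than the cutoff".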
- Parameters
- adata : AnnData
  Annotated data matrix.
- metric : {'alignment', 'identity', 'levenshtein', 'hamming'} | DistanceCalculator
  Union[Literal['alignment', 'identity', 'levenshtein', 'hamming'], DistanceCalculator] (default: 'identity')
  You can choose one of the following metrics:
  - identity – 1 for identical sequences, 0 otherwise. See IdentityDistanceCalculator. This metric implies a cutoff of 0.
  - levenshtein – Levenshtein edit distance. See LevenshteinDistanceCalculator.
  - hamming – Hamming distance for CDR3 sequences of equal length. See HammingDistanceCalculator.
  - alignment – Distance based on pairwise sequence alignments using the BLOSUM62 matrix. This option is incompatible with nucleotide sequences. See AlignmentDistanceCalculator.
  - any instance of DistanceCalculator.
- cutoff : int | None
  Optional[int] (default: None)
  All distances > cutoff will be replaced by 0 and eliminated from the sparse matrix. A sensible cutoff depends on the distance metric; you can find information in the corresponding docs. If set to None, the cutoff will be 10 for the alignment metric, and 2 for levenshtein and hamming. For the identity metric, the cutoff is ignored and always set to 0.
- sequence : {'aa', 'nt'}
  Literal['aa', 'nt'] (default: 'nt')
  Compute distances based on amino acid (aa) or nucleotide (nt) sequences.
- key_added : str | None
  Optional[str] (default: None)
  Dictionary key under which the results will be stored in adata.uns if inplace=True. Defaults to ir_dist_{sequence}_{metric}. If metric is an instance of scirpy.ir_dist.metrics.DistanceCalculator, {metric} defaults to custom.
- inplace : bool
  bool (default: True)
  If true, store the result in adata.uns. Otherwise return a dictionary with the results.
- n_jobs : int | None
  Optional[int] (default: None)
  Number of cores to use for distance calculation. Passed on to scirpy.ir_dist.metrics.DistanceCalculator.
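The default rules for cutoff and key_added listed above can be summarised in a short sketch. `resolve_defaults` is a hypothetical helper, not a scirpy function; it only restates the documented defaults. Two behaviours are assumptions of this sketch rather than documented facts: raising an error for the alignment metric with nucleotide sequences, and requiring an explicit cutoff for a custom calculator.

```python
from typing import Optional, Tuple

# Default cutoffs as stated in the parameter description above.
DEFAULT_CUTOFFS = {"alignment": 10, "levenshtein": 2, "hamming": 2}


def resolve_defaults(
    metric: object = "identity",
    cutoff: Optional[int] = None,
    sequence: str = "nt",
    key_added: Optional[str] = None,
) -> Tuple[int, str]:
    """Hypothetical helper: return (effective cutoff, key used in adata.uns)."""
    # Anything that is not a string stands in for a custom DistanceCalculator.
    metric_name = metric if isinstance(metric, str) else "custom"

    if metric_name == "alignment" and sequence == "nt":
        # Assumption: this sketch rejects the incompatible combination outright.
        raise ValueError("the alignment metric is incompatible with nucleotide sequences")

    if metric_name == "identity":
        effective_cutoff = 0  # the cutoff argument is ignored for identity
    elif cutoff is not None:
        effective_cutoff = cutoff
    elif metric_name in DEFAULT_CUTOFFS:
        effective_cutoff = DEFAULT_CUTOFFS[metric_name]
    else:
        # Assumption: no default is documented for custom calculators.
        raise ValueError("no default cutoff known for this metric")

    key = key_added if key_added is not None else f"ir_dist_{sequence}_{metric_name}"
    return effective_cutoff, key


print(resolve_defaults())                              # (0, 'ir_dist_nt_identity')
print(resolve_defaults("levenshtein", sequence="aa"))  # (2, 'ir_dist_aa_levenshtein')
```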
- Return type
- Returns
  Depending on the value of inplace, either returns nothing or a dictionary with symmetrical, sparse, pairwise distance matrices for all VJ and VDJ sequences.
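To make the shape of such a result concrete, here is a minimal pure-Python sketch that builds a symmetric, sparse pairwise matrix for one set of sequences using the Hamming distance. It is not scirpy's implementation: `pairwise_sparse` is a hypothetical name, the matrix is a plain dictionary keyed by (row, column), and treating sequences of unequal length as "no entry" is an assumption of this sketch. The same procedure would be applied separately to the VJ and the VDJ sequences.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> Optional[int]:
    """Hamming distance; None for unequal lengths (assumption of this sketch)."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def pairwise_sparse(
    seqs: List[str], cutoff: int
) -> Tuple[List[str], Dict[Tuple[int, int], int]]:
    """Return the unique sequences and a dict-of-keys matrix storing d + 1."""
    unique = sorted(set(seqs))
    # Each sequence has distance 0 to itself, which is stored as 1.
    entries = {(i, i): 1 for i in range(len(unique))}
    for (i, a), (j, b) in combinations(enumerate(unique), 2):
        d = hamming(a, b)
        if d is not None and d <= cutoff:
            # Store both triangles so the matrix is symmetric.
            entries[(i, j)] = d + 1
            entries[(j, i)] = d + 1
    return unique, entries


unique, entries = pairwise_sparse(["CAVR", "CAVK", "CAAK", "CAVRDS", "CAVR"], cutoff=1)
print(unique)           # ['CAAK', 'CAVK', 'CAVR', 'CAVRDS']
print(entries[(0, 1)])  # 2, i.e. a true distance of 1
print((0, 2) in entries)  # False: distance 2 exceeds the cutoff
```

Pairs that are missing from `entries` correspond to the implicit zeros of a sparse matrix, that is, to distances above the cutoff.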