scirpy.pp.ir_dist
- scirpy.pp.ir_dist(adata, reference=None, *, metric='identity', cutoff=None, sequence='nt', key_added=None, inplace=True, n_jobs=None)
Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.
This is a required preprocessing step for clonotype definition, clonotype networks, and querying reference databases.
Calculates the full pairwise distance matrix.
Important
- Distances are offset by 1 to allow efficient use of sparse matrices (\(d' = d + 1\)). That is, a distance > cutoff is represented as 0, a distance == 0 is represented as 1, a distance == 1 is represented as 2, and so on.
- Only distances <= cutoff are returned. Larger distances are eliminated from the sparse matrix.
- Distances are non-negative.
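The offset convention can be illustrated with a minimal pure-Python sketch. It is not scirpy's implementation: the helpers `encode`, `decode`, and `to_sparse` are hypothetical names introduced here, and a plain dictionary keyed by `(row, column)` stands in for a real sparse matrix.

```python
def encode(distance, cutoff):
    """Return the stored value d' = d + 1, or 0 if the distance exceeds the cutoff."""
    if distance < 0:
        raise ValueError("distances are non-negative")
    return distance + 1 if distance <= cutoff else 0


def decode(stored):
    """Recover the original distance; None means the entry was above the cutoff."""
    return None if stored == 0 else stored - 1


def to_sparse(dense, cutoff):
    """Keep only non-zero stored values, as a sparse matrix would."""
    sparse = {}
    for i, row in enumerate(dense):
        for j, d in enumerate(row):
            stored = encode(d, cutoff)
            if stored:  # zeros are implicit and therefore not stored
                sparse[(i, j)] = stored
    return sparse


# Toy distance matrix for three sequences (illustrative values only).
dense = [
    [0, 1, 5],
    [1, 0, 3],
    [5, 3, 0],
]
sparse = to_sparse(dense, cutoff=2)
```

With `cutoff=2`, the entries with distances 3 and 5 are dropped, while identical sequences (distance 0) remain explicitly stored as 1. This is why the offset is needed: without it, a distance of 0 would be indistinguishable from a missing entry.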
- Parameters
- adata : AnnData
Annotated data matrix.
- reference : AnnData | None (default: None)
Another AnnData object; can be either a second dataset with IR information or an epitope database. If specified, distances are computed between the sequences in adata and the sequences in reference. Otherwise, pairwise distances between the sequences in adata are computed.
- metric : {'alignment', 'identity', 'levenshtein', 'hamming'} | DistanceCalculator (default: 'identity')
You can choose one of the following metrics:
  - identity – 1 for identical sequences, 0 otherwise. See IdentityDistanceCalculator. This metric implies a cutoff of 0.
  - levenshtein – Levenshtein edit distance. See LevenshteinDistanceCalculator.
  - hamming – Hamming distance for CDR3 sequences of equal length. See HammingDistanceCalculator.
  - alignment – Distance based on pairwise sequence alignments using the BLOSUM62 matrix. This option is incompatible with nucleotide sequences. See AlignmentDistanceCalculator.
  - any instance of DistanceCalculator.
- cutoff : int | None (default: None)
All distances > cutoff are replaced by 0 and eliminated from the sparse matrix. A sensible cutoff depends on the distance metric; see the documentation of the corresponding metric for guidance. If set to None, the cutoff is 10 for the alignment metric and 2 for levenshtein and hamming. For the identity metric, the cutoff is ignored and always set to 0.
- sequence : {'aa', 'nt'} (default: 'nt')
Compute distances based on amino acid (aa) or nucleotide (nt) sequences.
- key_added : str | None (default: None)
Dictionary key under which the results are stored in adata.uns if inplace=True. Defaults to ir_dist_{sequence}_{metric}, or ir_dist_{name}_{sequence}_{metric} if reference is specified. If metric is an instance of scirpy.ir_dist.metrics.DistanceCalculator, {metric} defaults to custom. {name} is taken from reference.uns["DB"]["name"]. If reference does not have a "DB" entry, key_added needs to be specified manually.
- inplace : bool (default: True)
If True, store the result in adata.uns. Otherwise, return a dictionary with the results.
- n_jobs : int | None (default: None)
Number of cores to use for distance calculation. Passed on to scirpy.ir_dist.metrics.DistanceCalculator.
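To make the metric and cutoff options above concrete, here is a minimal pure-Python sketch of the Hamming and Levenshtein distances and of the documented default-cutoff rule. It is not scirpy's implementation. The names `hamming`, `levenshtein`, `resolve_cutoff`, and `DEFAULT_CUTOFFS` are hypothetical helpers introduced for illustration, and raising an error for sequences of unequal length in `hamming` is an assumption of this sketch.

```python
def hamming(a, b):
    """Count mismatching positions; only defined here for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


# Default cutoffs as stated in the parameter description above.
DEFAULT_CUTOFFS = {"alignment": 10, "levenshtein": 2, "hamming": 2}


def resolve_cutoff(metric, cutoff=None):
    """Apply the documented rule: identity always uses 0, None falls back to a default."""
    if metric == "identity":
        return 0
    return DEFAULT_CUTOFFS[metric] if cutoff is None else cutoff
```

For example, `levenshtein("CASSLG", "CASSG")` is 1 (one deletion), whereas `hamming` is not defined for that pair because the lengths differ.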
- Return type
dict | None
- Returns
Depending on the value of inplace, either returns nothing or a dictionary with sparse, pairwise distance matrices for all VJ and VDJ sequences.
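A typical call might look like the sketch below. It assumes a scirpy version whose signature matches the one documented on this page and an AnnData object `adata` that already contains IR information. `default_key` and `compute_cdr3_distances` are hypothetical helpers, not part of scirpy; `default_key` only mirrors the naming rule described under key_added.

```python
def default_key(sequence, metric, name=None):
    """Build the default key described under key_added (hypothetical helper)."""
    # Any non-string metric is treated as a custom DistanceCalculator instance.
    metric_name = metric if isinstance(metric, str) else "custom"
    parts = ["ir_dist"]
    if name is not None:
        parts.append(name)  # name of the reference database, if one is used
    parts += [sequence, metric_name]
    return "_".join(parts)


def compute_cdr3_distances(adata, metric="levenshtein", sequence="aa", cutoff=None):
    """Run ir_dist in place and return the entry stored in adata.uns."""
    import scirpy as ir  # imported lazily so this sketch can be defined without scirpy

    # inplace=True is the default, so the result is written to adata.uns.
    ir.pp.ir_dist(adata, metric=metric, sequence=sequence, cutoff=cutoff)
    return adata.uns[default_key(sequence, metric)]
```

Passing `key_added` explicitly avoids relying on the default naming rule, which is mandatory when a reference without a "DB" entry is used.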