scirpy.ir_dist.metrics.HammingDistanceCalculator
- class scirpy.ir_dist.metrics.HammingDistanceCalculator(cutoff=None, **kwargs)
Calculates the Hamming distance between sequences of identical length.
The edit distance is the total number of substitution events. Sequences with different lengths will be treated as though they exceeded the distance-cutoff, i.e. they receive a distance of 0 in the sparse distance matrix and will not be connected by an edge in the graph.
This class relies on Python-levenshtein to calculate the distances.
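As a rough illustration of the behaviour described above, the following pure-Python sketch counts substitutions between equal-length sequences and keeps only the pairs within a cutoff. It is not the class's implementation (which delegates to Python-levenshtein); the helper names `hamming_distance` and `sparse_hamming_pairs` and the example sequences are hypothetical, and pairs that are simply left out of the result correspond to the entries that receive 0 in the sparse matrix.

```python
def hamming_distance(a, b):
    """Number of substitutions between a and b, or None if the lengths differ."""
    if len(a) != len(b):
        # Different lengths are treated like "exceeds the cutoff".
        return None
    return sum(x != y for x, y in zip(a, b))


def sparse_hamming_pairs(seqs, cutoff=2):
    """Map index pairs (i, j), i < j, to their distance if it is within the cutoff."""
    pairs = {}
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            d = hamming_distance(seqs[i], seqs[j])
            if d is not None and d <= cutoff:
                pairs[(i, j)] = d
    return pairs


# Hypothetical example sequences; the last one has a different length.
seqs = ["CASSLG", "CASSLA", "CAWWWW", "CASS"]
pairs = sparse_hamming_pairs(seqs, cutoff=2)
print(pairs)
```

Only the first two sequences end up connected: they differ by a single substitution, while every other pair either exceeds the cutoff or differs in length.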
- Choosing a cutoff:
Each modification stands for a substitution event. While lacking empirical data, it seems unlikely that CDR3 sequences with more than two modifications still recognize the same antigen.
- Parameters
- cutoff : Optional[int] (default: None)
Will eliminate distances > cutoff to make efficient use of sparse matrices. The default cutoff is 2.
- n_jobs
Number of jobs to use for the pairwise distance calculation. If None, use all jobs (only for ParallelDistanceCalculators).
- block_size
The width of a block of the matrix that will be delegated to a worker process. The block contains block_size ** 2 elements.
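To make the block_size arithmetic concrete, here is a small sketch of how a square matrix could be cut into blocks of a given width. The function `block_ranges` is hypothetical and only illustrates the idea; how the library actually partitions the matrix (for example, whether it covers the full matrix or only one triangle) is not stated here and is an assumption of this sketch.

```python
def block_ranges(n, block_size):
    """Yield (row_start, row_end, col_start, col_end) for each block of an n x n matrix."""
    for r in range(0, n, block_size):
        for c in range(0, n, block_size):
            # Blocks at the right and bottom edges may be smaller than block_size.
            yield r, min(r + block_size, n), c, min(c + block_size, n)


blocks = list(block_ranges(5, 2))
# Number of matrix elements covered by each block.
sizes = [(r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in blocks]
print(len(blocks), sizes)
```

A full block covers block_size ** 2 elements; only the blocks on the edge of the matrix are smaller.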
Attributes
The sparse matrix dtype.
Methods
calc_dist_mat(seqs[, seqs2])
Calculate the distance matrix.
squarify(triangular_matrix)
Mirror a triangular matrix at the diagonal to make it a square matrix.
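The mirroring step that squarify describes can be sketched with plain nested lists. This is an illustration only, not the library's method: the function name `mirror_upper_triangle` is hypothetical, and the sketch assumes the input is upper triangular and dense.

```python
def mirror_upper_triangle(triangular):
    """Copy each entry above the diagonal to its mirrored position below it."""
    n = len(triangular)
    square = [row[:] for row in triangular]  # leave the input unchanged
    for i in range(n):
        for j in range(i + 1, n):
            square[j][i] = square[i][j]
    return square


upper = [
    [1, 2, 0],
    [0, 1, 3],
    [0, 0, 1],
]
square = mirror_upper_triangle(upper)
print(square)
```

Because a distance matrix is symmetric, only one triangle needs to be computed; the other half follows from mirroring.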