scirpy.ir_dist.metrics.AlignmentDistanceCalculator
- class scirpy.ir_dist.metrics.AlignmentDistanceCalculator(cutoff=None, *, n_jobs=None, block_size=50, subst_mat='blosum62', gap_open=11, gap_extend=11)
Calculates distance between sequences based on pairwise sequence alignment.
The distance between two sequences is defined as \(S_{1,2}^{max} - S_{1,2}\), where \(S_{1,2}\) is the alignment score of sequences 1 and 2 and \(S_{1,2}^{max}\) is the max. achievable alignment score of sequences 1 and 2. \(S_{1,2}^{max}\) is defined as \(\min(S_{1,1}, S_{2,2})\).
The use of alignment-based distances is heavily inspired by [DFGH+17].
High-performance sequence alignments are calculated leveraging the parasail library ([Dai16]).
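The definition above can be sketched in plain Python. This is a minimal illustration, not scirpy's implementation (which delegates the alignment to parasail): it assumes a global alignment with a linear gap penalty, and the names `nw_score`, `alignment_distance`, and `toy_subst`, as well as the toy scoring scheme (+4 for a match, -2 for a mismatch), are hypothetical.

```python
def nw_score(s1, s2, subst, gap=11):
    """Global alignment score with a linear gap penalty (illustrative only)."""
    # prev[j] is the best score for aligning s1[:i-1] with s2[:j].
    prev = [-gap * j for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, start=1):
        curr = [-gap * i]
        for j, b in enumerate(s2, start=1):
            curr.append(max(
                prev[j - 1] + subst(a, b),  # align a with b
                prev[j] - gap,              # a is aligned with a gap
                curr[j - 1] - gap,          # b is aligned with a gap
            ))
        prev = curr
    return prev[-1]


def alignment_distance(s1, s2, subst, gap=11):
    """S_max - S_12, with S_max = min(S_11, S_22)."""
    s_max = min(nw_score(s1, s1, subst, gap), nw_score(s2, s2, subst, gap))
    return s_max - nw_score(s1, s2, subst, gap)


def toy_subst(a, b):
    # Hypothetical scores, NOT BLOSUM62: +4 for a match, -2 for a mismatch.
    return 4 if a == b else -2


d_same = alignment_distance("CASSL", "CASSL", toy_subst)   # identical sequences
d_subst = alignment_distance("CASSL", "CASSI", toy_subst)  # one substitution
d_gap = alignment_distance("CASSL", "CASL", toy_subst)     # one deletion
```

With this toy scheme, identical sequences have distance 0, a single substitution costs 4 - (-2) = 6, and a single deleted residue costs the gap penalty of 11.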
- Choosing a cutoff:
Alignment distances need to be viewed in the light of the substitution matrix. The alignment distance is the difference between the actual alignment score and the max. achievable alignment score. For instance, a mutation from Leucine (L) to Isoleucine (I) results in a BLOSUM62 score of 2. An L aligned with L achieves a score of 4. The distance is, therefore, 2.
On the other hand, a single Tryptophan (W) mutating into, e.g., Proline (P) scores -4, against 11 for W aligned with W and 7 for P aligned with P. That is a drop of 15 from the W self-score, or a distance of 11 when the max. achievable score is taken as \(\min(S_{1,1}, S_{2,2})\).
We still lack empirical data on the distance up to which a CDR3 sequence is likely to recognize the same antigen, but reasonable cutoffs are <15.
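The arithmetic in this section can be checked with a handful of BLOSUM62 entries. The sketch below hard-codes only the six values it needs (L/L = 4, I/I = 4, L/I = 2, W/W = 11, P/P = 7, W/P = -4) and restricts itself to single residues, so no alignment is required. `SCORES`, `score`, and `distance` are illustrative names, not part of scirpy.

```python
# Six BLOSUM62 entries, keyed by an unordered residue pair.
SCORES = {
    frozenset("L"): 4,
    frozenset("I"): 4,
    frozenset("LI"): 2,
    frozenset("W"): 11,
    frozenset("P"): 7,
    frozenset("WP"): -4,
}


def score(a, b):
    """Substitution score of residue a aligned with residue b."""
    return SCORES[frozenset((a, b))]


def distance(a, b):
    """S_max - S_12 for single residues, with S_max = min(S_11, S_22)."""
    s_max = min(score(a, a), score(b, b))
    return s_max - score(a, b)


d_leu_ile = distance("L", "I")                     # 4 - 2
d_trp_pro = distance("W", "P")                     # min(11, 7) - (-4)
drop_from_trp = score("W", "W") - score("W", "P")  # 11 - (-4)
```

Under these assumptions `d_leu_ile` is 2. For Tryptophan to Proline, the score drops by 15 relative to the W self-score, while the distance under the \(\min(S_{1,1}, S_{2,2})\) definition is 11, because the P self-score of 7 is the smaller of the two.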
- Parameters
- cutoff : int | None (default: None)
Will eliminate distances > cutoff to make efficient use of sparse matrices. The default cutoff is 10.
- n_jobs : int | None (default: None)
Number of jobs to use for the pairwise distance calculation. If None, use all jobs (only for ParallelDistanceCalculators).
- block_size : int (default: 50)
The width of a block of the matrix that will be delegated to a worker process. The block contains block_size ** 2 elements.
- subst_mat : str (default: 'blosum62')
Name of parasail substitution matrix
- gap_open : int (default: 11)
Gap open penalty
- gap_extend : int (default: 11)
Gap extend penalty
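How block_size and cutoff interact can be sketched without scirpy. The example below is an illustration under stated assumptions, not the library's code: `iter_blocks`, `sparse_distances`, and the toy `length_diff` distance are hypothetical, the blocks are processed serially rather than by worker processes, and the sparse result is represented as a plain dict.

```python
def iter_blocks(n_rows, n_cols, block_size=50):
    """Yield the (row_start, col_start) origin of each block of the matrix."""
    for r in range(0, n_rows, block_size):
        for c in range(0, n_cols, block_size):
            yield r, c


def sparse_distances(seqs, dist, cutoff, block_size=50):
    """Return {(i, j): d}, keeping only the entries with d <= cutoff."""
    n = len(seqs)
    kept = {}
    for r0, c0 in iter_blocks(n, n, block_size):
        # Each block covers at most block_size ** 2 sequence pairs.
        for i in range(r0, min(r0 + block_size, n)):
            for j in range(c0, min(c0 + block_size, n)):
                d = dist(seqs[i], seqs[j])
                if d <= cutoff:
                    kept[(i, j)] = d
    return kept


def length_diff(a, b):
    # Hypothetical stand-in for an alignment distance.
    return 5 * abs(len(a) - len(b))


seqs = ["CASS", "CASSL", "CASSLGQ", "CA"]
kept = sparse_distances(seqs, length_diff, cutoff=10, block_size=2)
n_blocks = len(list(iter_blocks(len(seqs), len(seqs), block_size=2)))
```

Here the 4 x 4 matrix is split into four 2 x 2 blocks, and only 10 of the 16 entries survive the cutoff, which is what makes a sparse representation worthwhile for large inputs.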
Attributes
The sparse matrix dtype.
Methods
calc_dist_mat(seqs[, seqs2]) Calculate the distance matrix.
squarify(triangular_matrix) Mirror a triangular matrix at the diagonal to make it a square matrix.
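What mirroring at the diagonal means can be shown on a small dense matrix. This is a plain-Python sketch, not scirpy's squarify: the function name `mirror_upper` and the list-of-lists representation are assumptions made for illustration.

```python
def mirror_upper(triangular):
    """Copy the upper triangle of a square list-of-lists onto the lower triangle."""
    n = len(triangular)
    square = [row[:] for row in triangular]  # leave the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            square[j][i] = triangular[i][j]
    return square


upper = [
    [0, 2, 11],
    [0, 0, 6],
    [0, 0, 0],
]
full = mirror_upper(upper)
```

The result is symmetric, so each pairwise distance only has to be computed once and can then be read from either side of the diagonal.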