snapatac2.tl.spectral
- snapatac2.tl.spectral(adata, n_comps=30, features='selected', random_state=0, sample_size=None, sample_method='random', chunk_size=5000, distance_metric='cosine', weighted_by_sd=True, feature_weights=None, inplace=True, num_threads=32)
Compute a spectral embedding with Laplacian Eigenmaps.
Use this function to convert a cell-by-feature matrix into a lower-dimensional representation before neighbor graph construction, clustering, and visualization. With distance_metric="cosine", the matrix-free implementation scales linearly with the number of cells. Other distance metrics materialize pairwise similarity matrices and scale quadratically with cell count.
Anti-Patterns
- Do NOT use distance_metric="jaccard" on very large datasets without sample_size; the full pairwise computation can require quadratic memory.
- Do NOT assume exactly n_comps components are returned when weighted_by_sd=True; small negative eigenvalues may be removed.
- Do NOT leave features="selected" unless adata.var["selected"] exists.
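The anti-patterns above can be caught before calling the function. The following is a minimal sketch, not part of snapatac2: preflight_spectral is a hypothetical helper, and the large_n_obs threshold is an arbitrary illustrative value, not a limit taken from the library. It only needs the names of the columns in adata.var and the number of cells.

```python
def preflight_spectral(var_columns, n_obs, features="selected",
                       distance_metric="cosine", sample_size=None,
                       large_n_obs=100_000):
    """Hypothetical pre-flight check for the anti-patterns listed above.

    Returns a list of human-readable problems; an empty list means
    none of the checks fired. `large_n_obs` is an illustrative threshold.
    """
    problems = []

    # A string selector is looked up as a column of adata.var, so it must exist.
    if isinstance(features, str) and features not in var_columns:
        problems.append(
            f'adata.var["{features}"] is missing; select features first '
            "or pass features=None"
        )

    # Non-cosine metrics build pairwise similarities (quadratic in cell count),
    # so a large dataset without sample_size is flagged.
    if distance_metric != "cosine" and sample_size is None and n_obs > large_n_obs:
        problems.append(
            f'distance_metric="{distance_metric}" on {n_obs} cells without '
            "sample_size may need quadratic memory"
        )

    return problems


# Toy usage with plain Python values standing in for an AnnData object.
print(preflight_spectral(["selected"], n_obs=5_000))  # []
print(len(preflight_spectral([], n_obs=5_000)))       # 1
```

The third anti-pattern (fewer than n_comps components returned) cannot be checked in advance; inspect the width of adata.obsm["X_spectral"] after the call instead.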
Note
Choosing the number of components matters for downstream analysis: uninformative or irrelevant components can degrade clustering quality. By default, this function weights every eigenvector by the square root of its eigenvalue instead of applying a strict cutoff. This usually gives satisfactory results without manual selection of the number of components. For datasets where the default performs poorly, disable the weighting by setting weighted_by_sd=False, then determine and select the number of components manually.
This function may not always return the exact number of eigenvectors requested. It computes the embedding from the eigendecomposition of the normalized graph Laplacian matrix, whose eigenvalues should all be non-negative. However, the eigensolver, scipy.sparse.linalg.eigsh, can be inaccurate for small eigenvalues and occasionally returns negative eigenvalues at the lower end of the spectrum. When weighted_by_sd=True (the default), a post-processing step removes these erroneous eigenvalues. This step typically has minimal impact, because the affected eigenvalues are very small.
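The weighting and clean-up described in this note can be illustrated with a small NumPy sketch. This is not the library's code: weight_by_sd is a hypothetical helper, and the eigenvalues and eigenvectors are toy values rather than output from scipy.sparse.linalg.eigsh. It shows why the embedding can end up with fewer columns than requested.

```python
import numpy as np


def weight_by_sd(eigenvalues, eigenvectors):
    """Hypothetical helper mirroring the post-processing described above.

    Drops components with negative eigenvalues, then scales each remaining
    eigenvector (one column per component) by the square root of its eigenvalue.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    eigenvectors = np.asarray(eigenvectors, dtype=float)
    keep = eigenvalues >= 0            # negative values are treated as solver error
    kept = eigenvalues[keep]
    return kept, eigenvectors[:, keep] * np.sqrt(kept)


# Toy input: 4 cells, 3 requested components, one slightly negative eigenvalue.
evals = np.array([0.81, 0.25, -1e-12])
evecs = np.ones((4, 3))

kept, embedding = weight_by_sd(evals, evecs)
print(embedding.shape)  # (4, 2): one component was removed
```

With weighted_by_sd=False no such scaling is applied, which is why the number of components then has to be chosen by hand.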
- param adata:
Annotated data object with a cell-by-feature count matrix in .X.
- type adata:
AnnData | AnnDataSet
- param n_comps:
Maximum number of spectral dimensions to compute.
- type n_comps:
- param features:
Feature selector. If a string, use adata.var[features] as a Boolean mask. If an array, use it directly. If None, use all features.
- type features:
- param random_state:
Seed for random sampling and eigensolver initialization.
- type random_state:
- param sample_size:
Number or fraction of cells to sample for the Nystrom approximation. If None, use all cells.
- type sample_size:
- param sample_method:
Sampling method for the matrix-free Nystrom approximation.
- type sample_method:
Literal['random', 'degree']
- param chunk_size:
Number of cells per chunk in Nystrom extension. The effective work batch is approximately chunk_size * num_threads.
- type chunk_size:
- param distance_metric:
Similarity metric used to construct the cell graph.
- type distance_metric:
Literal['jaccard', 'cosine']
- param weighted_by_sd:
If True, multiply eigenvectors by the square root of their eigenvalues.
- type weighted_by_sd:
- param feature_weights:
Per-feature weights for similarity computation. If None, use inverse document frequency weights where required.
- type feature_weights:
- param inplace:
If True, write results to adata; if False, return them.
- type inplace:
- param num_threads:
Number of threads used by the Nystrom implementation.
- type num_threads:
- returns:
If inplace=True, stores eigenvectors in adata.obsm["X_spectral"] and eigenvalues in adata.uns["spectral_eigenvalue"], then returns None. If inplace=False, returns (eigenvalues, eigenvectors).
- rtype:
Examples
>>> import snapatac2 as snap
>>> adata = snap.datasets.pbmc5k(type="annotated_h5ad")
>>> snap.tl.spectral(adata, features=None, n_comps=30)
>>> adata.obsm["X_spectral"].shape[0] == adata.n_obs
True
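The sample_size and chunk_size parameters described above involve some simple arithmetic that is easy to get wrong when sizing a run. The sketch below is an assumption about how a "number or fraction" argument might be interpreted, not the library's actual logic; resolve_sample_size and effective_batch are hypothetical helpers written for illustration only.

```python
def resolve_sample_size(sample_size, n_cells):
    """Hypothetical: turn a 'number or fraction' argument into a cell count.

    Assumes None means all cells, a float is a fraction in (0, 1], and an
    int is an absolute count capped at the number of cells.
    """
    if sample_size is None:
        return n_cells
    if isinstance(sample_size, float):
        if not 0.0 < sample_size <= 1.0:
            raise ValueError("fractional sample_size must be in (0, 1]")
        return max(1, int(sample_size * n_cells))
    return min(int(sample_size), n_cells)


def effective_batch(chunk_size=5000, num_threads=32):
    """Approximate number of cells in flight, per the chunk_size description."""
    return chunk_size * num_threads


print(resolve_sample_size(0.25, 1000))  # 250
print(effective_batch())                # 160000
```

With the default chunk_size and num_threads, roughly 160,000 cells are processed per work batch, so lowering either value is the first thing to try when memory is tight.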