snapatac2.tl.spectral

snapatac2.tl.spectral(adata, n_comps=30, features='selected', random_state=0, sample_size=None, sample_method='random', chunk_size=5000, distance_metric='cosine', weighted_by_sd=True, feature_weights=None, inplace=True, num_threads=32)

Compute a spectral embedding with Laplacian Eigenmaps.

Use this function to reduce a cell-by-feature matrix to a lower-dimensional representation before neighbor-graph construction, clustering, and visualization. With distance_metric="cosine" (the default), a matrix-free implementation is used whose cost scales linearly with the number of cells. With distance_metric="jaccard", the pairwise similarity matrix is materialized, so cost scales quadratically with the number of cells.
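To illustrate why a cosine graph can avoid the cells-by-cells matrix, here is a minimal NumPy/SciPy sketch. It assumes rows are L2-normalized so that W = Xn Xn^T holds cosine similarities, and that the embedding comes from the top eigenvectors of the normalized matrix D^-1/2 W D^-1/2. The function name is hypothetical and this shows the general matrix-free idea, not SnapATAC2's actual implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import LinearOperator, eigsh


def matrix_free_cosine_spectral(X, n_comps=5, seed=0):
    """Hypothetical sketch: top eigenpairs of D^-1/2 W D^-1/2 with W = Xn @ Xn.T.

    W (cells x cells) is never formed; each matrix-vector product costs O(nnz(X)).
    Assumes every cell (row) has at least one nonzero feature.
    """
    X = csr_matrix(X, dtype=np.float64)
    # L2-normalize rows so that Xn @ Xn.T holds pairwise cosine similarities.
    row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    Xn = csr_matrix(diags(1.0 / row_norms) @ X)
    n_cells = Xn.shape[0]
    # Degrees (row sums of W) computed as Xn @ (Xn.T @ 1), again without W.
    degree = Xn @ (Xn.T @ np.ones(n_cells))
    d_inv_sqrt = 1.0 / np.sqrt(degree)

    def matvec(v):
        v = np.asarray(v, dtype=np.float64).ravel()
        # Evaluate D^-1/2 Xn Xn^T D^-1/2 v from right to left.
        return d_inv_sqrt * (Xn @ (Xn.T @ (d_inv_sqrt * v)))

    op = LinearOperator((n_cells, n_cells), matvec=matvec, dtype=np.float64)
    v0 = np.random.default_rng(seed).random(n_cells)  # seeded solver start vector
    evals, evecs = eigsh(op, k=n_comps, which="LA", v0=v0)
    order = np.argsort(evals)[::-1]  # largest eigenvalue first
    return evals[order], evecs[:, order]
```

The dense equivalent would build the full normalized matrix, whose memory grows with the square of the cell count; the operator above only ever touches the sparse input.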

Anti-Patterns

  • Do NOT use distance_metric="jaccard" on very large datasets without sample_size; the full pairwise computation can require quadratic memory.

  • Do NOT assume exactly n_comps components are returned when weighted_by_sd=True; small negative eigenvalues may be removed.

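  • Do NOT keep the default features="selected" unless adata.var["selected"] exists (for example, after running snap.pp.select_features); otherwise pass features=None or an explicit Boolean mask.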
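For a sense of scale behind the quadratic-memory warning, the back-of-the-envelope sketch below assumes the full pairwise matrix is stored densely as 64-bit floats. That storage format is an assumption; the actual representation used for distance_metric="jaccard" may differ.

```python
def dense_similarity_bytes(n_cells: int, bytes_per_value: int = 8) -> int:
    """Bytes for a dense n_cells x n_cells matrix (float64 by default)."""
    return n_cells * n_cells * bytes_per_value


for n in (10_000, 100_000, 1_000_000):
    gb = dense_similarity_bytes(n) / 1e9
    print(f"{n:>9,} cells -> {gb:,.1f} GB")
# prints:
#    10,000 cells -> 0.8 GB
#   100,000 cells -> 80.0 GB
# 1,000,000 cells -> 8,000.0 GB
```

In a Nystrom scheme, setting sample_size limits the part that must be solved densely to the sampled cells.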

Note

  • Choosing the number of components matters for downstream analyses: uninformative or irrelevant components can degrade clustering. Rather than applying a hard cutoff, this function by default weights every eigenvector by the square root of its eigenvalue. This usually works well and removes the need to choose the number of components by hand. For datasets where it does not, set weighted_by_sd=False and select the number of components to use yourself.

  • This function may return fewer eigenvectors than requested. It computes the embedding from an eigendecomposition of the normalized graph Laplacian, whose eigenvalues should all be non-negative. However, the eigensolver used, scipy.sparse.linalg.eigsh, can be inaccurate for small eigenvalues and occasionally returns small negative eigenvalues at the low end of the spectrum. When weighted_by_sd=True (the default), a post-processing step removes these spurious eigenvalues and their eigenvectors. The impact is usually minimal because the affected eigenvalues are close to zero.
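The two notes describe a post-processing step: drop negative eigenvalues, then scale each eigenvector by the square root of its eigenvalue. The sketch below reproduces that idea on plain NumPy arrays. The helper name is hypothetical, and keeping only strictly positive eigenvalues is an assumption about the exact filtering rule.

```python
import numpy as np


def weight_by_sd(evals, evecs):
    """Hypothetical sketch of weighted_by_sd=True post-processing.

    Assumption: only strictly positive eigenvalues are kept; the library's
    exact filtering rule may differ.
    """
    evals = np.asarray(evals, dtype=np.float64)
    keep = evals > 0                      # drop spurious negative eigenvalues
    evals, evecs = evals[keep], evecs[:, keep]
    return evals, evecs * np.sqrt(evals)  # scale column j by sqrt(evals[j])


evals = np.array([1.0, 0.5, 0.2, -1e-6])  # last value mimics solver noise
evecs = np.eye(4)
kept, weighted = weight_by_sd(evals, evecs)
print(weighted.shape)  # (4, 3): one component was removed

# With weighted_by_sd=False, you would instead choose the first k columns yourself.
k = 2
manual = evecs[:, :k]
```

Weighting shrinks low-eigenvalue components smoothly instead of cutting them off, which is why no explicit component count is needed by default.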

param adata:

Annotated data object with a cell-by-feature count matrix in .X.

type adata:

AnnData | AnnDataSet

param n_comps:

Maximum number of spectral dimensions to compute.

type n_comps:

int

param features:

Feature selector. If a string, use adata.var[features] as a Boolean mask. If an array, use it directly. If None, use all features.

type features:

str | ndarray | None
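The three accepted forms of features can be mirrored by a small helper. This is a hypothetical sketch of the documented rules, not the library's code. It assumes adata.var is a pandas DataFrame, and it fails loudly when the requested column is missing, which is the situation the anti-pattern about features="selected" warns against.

```python
import numpy as np
import pandas as pd


def resolve_feature_mask(var: pd.DataFrame, features):
    """Hypothetical helper mirroring the documented `features` rules."""
    if features is None:
        return np.ones(len(var), dtype=bool)        # use all features
    if isinstance(features, str):
        if features not in var.columns:
            # The column does not exist, e.g. feature selection was never run.
            raise KeyError(f"var has no column {features!r}; "
                           "run feature selection or pass features=None")
        return var[features].to_numpy(dtype=bool)   # Boolean mask from a column
    return np.asarray(features)                      # array used directly


var = pd.DataFrame({"selected": [True, False, True]}, index=["f1", "f2", "f3"])
print(resolve_feature_mask(var, "selected"))  # [ True False  True]
```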

param random_state:

Seed for random sampling and eigensolver initialization.

type random_state:

int

param sample_size:

Number (int) or fraction (float between 0 and 1) of cells to sample for the Nystrom approximation. If None, use all cells.

type sample_size:

int | float | None
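The int-versus-float convention can be sketched as a small resolver. The helper is hypothetical, and the rounding and capping rules are assumptions made for illustration.

```python
def resolve_sample_size(sample_size, n_cells: int) -> int:
    """Hypothetical helper: map `sample_size` to a number of sampled cells.

    Assumes a float is a fraction in (0, 1] and an int is an absolute count.
    """
    if sample_size is None:
        return n_cells                                   # no sampling
    if isinstance(sample_size, float):
        if not 0.0 < sample_size <= 1.0:
            raise ValueError("a fractional sample_size must be in (0, 1]")
        return max(1, round(sample_size * n_cells))      # fraction -> count
    return min(int(sample_size), n_cells)                # cap at n_cells


print(resolve_sample_size(0.1, 1000))  # 100
print(resolve_sample_size(250, 1000))  # 250
```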

param sample_method:

Sampling method for the matrix-free Nystrom approximation.

type sample_method:

Literal['random', 'degree']

param chunk_size:

Number of cells per chunk in Nystrom extension. The effective work batch is approximately chunk_size * num_threads.

type chunk_size:

int
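With the defaults, the effective batch is about 5000 * 32 = 160,000 cells. The sketch below shows the general shape of a Nystrom extension with random landmark sampling and chunked extension. It uses the textbook formulas (eigenvalues scaled by n/m, extended vectors u = sqrt(m/n) * K[:, landmarks] @ u_m / lambda_m) on a plain cosine kernel. The function is hypothetical and is not SnapATAC2's exact procedure; for example, it ignores graph normalization and multithreading.

```python
import numpy as np


def nystrom_embedding(X, n_landmarks, n_comps, chunk_size=5000, seed=0):
    """Hypothetical sketch of a textbook Nystrom extension for a cosine kernel.

    Eigenpairs are computed on a random subset of cells ("landmarks") and
    extended to all cells one chunk of rows at a time.
    """
    X = np.asarray(X, dtype=np.float64)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # kernel K = Xn @ Xn.T
    n_cells = Xn.shape[0]
    rng = np.random.default_rng(seed)
    landmarks = np.sort(rng.choice(n_cells, size=n_landmarks, replace=False))
    L = Xn[landmarks]

    # Only this m x m landmark block is eigendecomposed densely.
    evals, evecs = np.linalg.eigh(L @ L.T)
    top = np.argsort(evals)[::-1][:n_comps]
    evals, evecs = evals[top], evecs[:, top]

    # Extend to every cell chunk by chunk, so the n x m block is never held at once.
    scale = np.sqrt(n_landmarks / n_cells)
    out = np.empty((n_cells, n_comps))
    for start in range(0, n_cells, chunk_size):
        chunk = Xn[start:start + chunk_size]
        out[start:start + chunk_size] = scale * (chunk @ L.T) @ evecs / evals
    return (n_cells / n_landmarks) * evals, out
```

When every cell is a landmark, the extension reproduces the exact eigenvectors; smaller samples trade accuracy for a much smaller dense problem.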

param distance_metric:

Similarity metric used to construct the cell graph.

type distance_metric:

Literal['jaccard', 'cosine']
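For intuition about the two metrics, the sketch below computes their textbook definitions for a single pair of cells: Jaccard on binarized profiles and cosine on the raw vectors. These helpers are illustrative only. The library may additionally apply feature weights (see feature_weights), and its exact formulas are not given here.

```python
import numpy as np


def jaccard_similarity(a, b):
    """Size of intersection / size of union of the nonzero features."""
    a, b = np.asarray(a) > 0, np.asarray(b) > 0
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


cell_a = [1, 1, 0, 1]
cell_b = [1, 0, 0, 1]
print(jaccard_similarity(cell_a, cell_b))  # 2 shared / 3 in union = 0.666...
print(cosine_similarity(cell_a, cell_b))   # 2 / (sqrt(3) * sqrt(2)) = 0.816...
```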

param weighted_by_sd:

If True, multiply eigenvectors by the square root of their eigenvalues.

type weighted_by_sd:

bool

param feature_weights:

Per-feature weights for similarity computation. If None, use inverse document frequency weights where required.

type feature_weights:

list[float] | None
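As a reference point for inverse document frequency weighting, the sketch below uses one common definition, log(n_cells / df), where df counts the cells in which a feature is nonzero. This is an assumption for illustration; SnapATAC2's default weighting may use a different formula. Rare features receive larger weights, and a feature present in every cell receives weight 0.

```python
import numpy as np
from scipy.sparse import csr_matrix


def idf_weights(X):
    """One common IDF definition: log(n_cells / df) per feature (an assumption)."""
    X = csr_matrix(X)
    n_cells = X.shape[0]
    df = np.asarray((X > 0).sum(axis=0)).ravel()   # cells containing each feature
    return np.log(n_cells / np.maximum(df, 1))     # guard features seen in no cell


X = [[1, 0, 2],
     [1, 1, 0],
     [1, 0, 0],
     [1, 0, 0]]
weights = idf_weights(X)   # approximately [0.0, 1.386, 1.386]
custom = weights.tolist()  # the parameter is typed list[float]
```

Custom weights are assumed here to contain one value per feature used in the similarity computation.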

param inplace:

If True, write results to adata; if False, return them.

type inplace:

bool

param num_threads:

Number of threads used by the Nystrom implementation.

type num_threads:

int

returns:

If inplace=True, stores eigenvectors in adata.obsm["X_spectral"] and eigenvalues in adata.uns["spectral_eigenvalue"], then returns None. If inplace=False, returns (eigenvalues, eigenvectors).

rtype:

tuple[ndarray, ndarray] | None

See also

multi_spectral

Examples

>>> import snapatac2 as snap
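>>> adata = snap.read(snap.datasets.pbmc5k(type="annotated_h5ad"), backed=None)
>>> snap.tl.spectral(adata, features=None, n_comps=30)
>>> adata.obsm["X_spectral"].shape[0] == adata.n_obs
True
>>> evals, evecs = snap.tl.spectral(adata, features=None, n_comps=30, inplace=False)
>>> evecs.shape[1] == evals.shape[0]
True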