snapatac2.tl.hdbscan

snapatac2.tl.hdbscan(adata, min_cluster_size=5, min_samples=None, cluster_selection_epsilon=0.0, alpha=1.0, cluster_selection_method='eom', random_state=0, use_rep='X_spectral', key_added='hdbscan', **kwargs)

Cluster cells with HDBSCAN.

Use this function to detect variable-density clusters and label noise cells from an embedding stored in adata.obsm[use_rep].

Anti-Patterns

  • Do NOT expect every cell to receive a cluster label; HDBSCAN labels noise cells as -1.

  • Do NOT point use_rep at raw fragment counts; for typical workflows, use a low-dimensional embedding such as the default X_spectral.
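Because noise cells carry the label -1, downstream code usually needs to count or exclude them before interpreting cluster sizes. The following is a minimal sketch, not part of snapatac2: summarize_labels and example_labels are hypothetical names, and labels are compared as strings so the check works whether the stored categories are integers or text.

```python
from collections import Counter

NOISE_LABEL = "-1"  # HDBSCAN's noise marker, compared as text (assumption: labels may be int or str)


def summarize_labels(labels):
    """Return (cluster_sizes, noise_fraction) for an iterable of cluster labels."""
    as_text = [str(label) for label in labels]
    counts = Counter(as_text)
    # Remove the noise bucket so the remaining entries are real clusters only.
    n_noise = counts.pop(NOISE_LABEL, 0)
    noise_fraction = n_noise / len(as_text) if as_text else 0.0
    return dict(counts), noise_fraction


# Hypothetical labels standing in for adata.obs["hdbscan"].
example_labels = ["0", "0", "1", "-1", "1", "0", "-1", "2"]
sizes, noise_fraction = summarize_labels(example_labels)
print(sizes)
print(noise_fraction)
```

In a real session the same helper could be given the values of adata.obs[key_added]; a large noise fraction is often a hint to lower min_cluster_size or min_samples.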

param adata:

Annotated data object containing adata.obsm[use_rep].

type adata:

AnnData

param min_cluster_size:

Minimum cluster size; single linkage splits that contain fewer points than this will be considered points “falling out” of a cluster rather than a cluster splitting into two new clusters.

type min_cluster_size:

int

param min_samples:

Number of samples in a neighborhood for a point to be considered a core point. If None, HDBSCAN uses the value of min_cluster_size.

type min_samples:

int | None

param cluster_selection_epsilon:

A distance threshold. Clusters below this value will be merged.

type cluster_selection_epsilon:

float

param alpha:

A distance scaling parameter as used in robust single linkage.

type alpha:

float
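To make the roles of min_samples and alpha more concrete, here is a minimal pure-Python sketch of the mutual reachability distance that HDBSCAN-style algorithms build their hierarchy on. It is an illustration under stated assumptions, not the library's implementation: core_distance and mutual_reachability are hypothetical helpers, the point itself is excluded from its own neighborhood (implementations differ on this), and alpha is assumed to divide the raw pairwise distance.

```python
import math


def core_distance(points, index, min_samples):
    """Distance from points[index] to its min_samples-th nearest neighbor (self excluded)."""
    dists = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return dists[min_samples - 1]


def mutual_reachability(points, a, b, min_samples, alpha=1.0):
    """Return max(core(a), core(b), d(a, b) / alpha) for points a and b."""
    scaled = math.dist(points[a], points[b]) / alpha  # assumption: alpha rescales raw distances
    return max(
        core_distance(points, a, min_samples),
        core_distance(points, b, min_samples),
        scaled,
    )


# Three nearby points and one distant outlier (hypothetical toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
dense_pair = mutual_reachability(points, 0, 1, min_samples=2)
outlier_pair = mutual_reachability(points, 0, 3, min_samples=2)
outlier_pair_scaled = mutual_reachability(points, 0, 3, min_samples=2, alpha=2.0)
```

Larger min_samples values raise core distances, which pushes sparse points further from everything else and tends to label more of them as noise. In this sketch, a larger alpha shrinks the raw distance term, so the core distances dominate more often.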

param cluster_selection_method:

Method used to select clusters from the condensed tree: "eom" (Excess of Mass, the default) or "leaf".

type cluster_selection_method:

str

param random_state:

API parameter reserved for consistency with other clustering functions.

type random_state:

int

param use_rep:

Key in adata.obsm containing the input embedding.

type use_rep:

str

param key_added:

Key in adata.obs used to store cluster labels.

type key_added:

str

param **kwargs:

Additional keyword arguments passed to hdbscan.HDBSCAN.

returns:

Nothing is returned; cluster labels are stored as a categorical column in adata.obs[key_added].

rtype:

None

Examples

>>> import snapatac2 as snap
>>> # pbmc5k() returns a file path, so load it into memory first
>>> adata = snap.read(snap.datasets.pbmc5k(type="annotated_h5ad"), backed=None)
>>> snap.tl.hdbscan(adata, min_cluster_size=20)
>>> "hdbscan" in adata.obs
True