snapatac2.tl.dbscan

snapatac2.tl.dbscan(adata, eps=0.5, min_samples=5, leaf_size=30, n_jobs=None, use_rep='X_spectral', key_added='dbscan')

Cluster cells with DBSCAN.

Use this function to identify density-connected groups and noise cells from an embedding stored in adata.obsm[use_rep].

Anti-Patterns

  • Do NOT expect DBSCAN to assign every cell to a cluster; noise cells are labeled as -1.

  • Do NOT reuse eps across embeddings with different scales; tune it for the representation passed through use_rep.
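Because noise is reported as a label rather than dropped, downstream summaries should count it separately from real clusters. The following is a minimal sketch, assuming the labels have already been copied out of adata.obs[key_added] into a plain list. The helper name summarize_labels is hypothetical, and labels are compared as strings on the assumption that a categorical column may hold "-1" as text rather than as an integer.

```python
from collections import Counter


def summarize_labels(labels):
    """Split DBSCAN labels into real clusters and noise (label -1)."""
    # Normalize to strings so that both -1 and "-1" are treated as noise.
    counts = Counter(str(label) for label in labels)
    n_noise = counts.pop("-1", 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "cluster_sizes": dict(counts),
    }


# Toy labels standing in for adata.obs[key_added] (illustrative only).
labels = [0, 0, 1, -1, 1, 1, -1, 0]
summary = summarize_labels(labels)
```

A large noise count relative to the cluster sizes is usually a hint that eps is too small for the embedding in use.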

param adata:

Annotated data object containing adata.obsm[use_rep].

type adata:

AnnData

param eps:

The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
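One common heuristic for picking eps, which is not part of snapatac2 itself, is to sort every point's distance to its k-th nearest neighbor and look for the knee of that curve. The sketch below uses only the standard library and a hypothetical helper k_distances on toy 2-D points; for real data the points would come from adata.obsm[use_rep], and a tree-based neighbor search would be preferable to this quadratic loop.

```python
import math


def k_distances(points, k):
    """Return each point's distance to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


# Four tightly packed points plus one far-away outlier (illustrative only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
curve = k_distances(points, k=2)

# Rescaling the embedding rescales the whole curve, so eps must be rescaled too.
scaled = [(10 * x, 10 * y) for x, y in points]
scaled_curve = k_distances(scaled, k=2)
```

In this toy case the curve is flat for the dense points and jumps sharply at the outlier, so a value of eps just above the flat part would keep the dense group together and leave the outlier as noise. The scaled copy shows why a value tuned on one embedding does not transfer to another with a different scale.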

type eps:

float

param min_samples:

The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
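The core-point rule described above can be sketched in a few lines. This is an illustration of the definition on unweighted toy points, not the library's implementation; the helper name is_core_point is hypothetical, and treating a distance exactly equal to eps as inside the neighborhood is an assumption.

```python
import math


def is_core_point(points, index, eps, min_samples):
    """True if at least min_samples points, the point itself included, lie within eps."""
    center = points[index]
    # The point at `index` is at distance 0 from itself, so it is always counted.
    n_neighbors = sum(1 for q in points if math.dist(center, q) <= eps)
    return n_neighbors >= min_samples


# Three nearby points and one isolated point (illustrative only).
points = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (3.0, 3.0)]

dense_is_core = is_core_point(points, 0, eps=0.5, min_samples=3)
dense_is_core_strict = is_core_point(points, 0, eps=0.5, min_samples=4)
isolated_is_core = is_core_point(points, 3, eps=0.5, min_samples=2)
```

Because the point counts toward its own neighborhood, min_samples=3 is satisfied here by the point plus two neighbors, while min_samples=4 is not.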

type min_samples:

int

param leaf_size:

Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

type leaf_size:

int

param n_jobs:

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

type n_jobs:

int | None

param use_rep:

Key in adata.obsm containing the input embedding.

type use_rep:

str

param key_added:

Key in adata.obs used to store cluster labels.

type key_added:

str

returns:

Nothing is returned. Categorical cluster labels are stored in place in adata.obs[key_added].

rtype:

None

Examples

>>> import snapatac2 as snap
>>> adata = snap.read(snap.datasets.pbmc5k(type="annotated_h5ad"), backed=None)
>>> snap.tl.dbscan(adata, eps=0.5, min_samples=5)
>>> "dbscan" in adata.obs
True