snapatac2.tl.dbscan
snapatac2.tl.dbscan(adata, eps=0.5, min_samples=5, leaf_size=30, n_jobs=None, use_rep='X_spectral', key_added='dbscan')
Cluster cells with DBSCAN.
Use this function to identify density-connected groups of cells, and the noise cells that belong to none of them, from an embedding stored in adata.obsm[use_rep].
Anti-Patterns
Do NOT expect DBSCAN to assign every cell to a cluster; cells in low-density regions are labeled as noise (-1).
Do NOT reuse eps across embeddings with different scales; eps is an absolute distance in the units of the embedding, so tune it for the representation passed through use_rep.
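Both anti-patterns can be reproduced with a small sketch that calls scikit-learn's DBSCAN directly on a toy embedding, using the same defaults as the signature (eps=0.5, min_samples=5). It assumes this function behaves like scikit-learn's DBSCAN, whose parameter descriptions it mirrors; the hand-placed points are illustrative, not real cells. At the original scale the two five-point blobs become clusters 0 and 1 and the isolated point is labeled -1. After multiplying the same coordinates by 10, no point has five neighbors within 0.5, so every point becomes noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight, hand-placed blobs of five points each, plus one isolated point.
blob = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
embedding = np.vstack([blob, blob + 5.0, [[10.0, -10.0]]])


def n_clusters(labels):
    """Count clusters, ignoring the noise label -1."""
    return len(set(labels.tolist()) - {-1})


# Original scale: every blob point has 5 neighbors (itself included) within
# eps, so each blob is one cluster; the isolated point is noise.
labels = DBSCAN(eps=0.5, min_samples=5).fit(embedding).labels_
print(labels.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]

# Same eps on a 10x rescaled copy: all pairwise distances now exceed 0.5,
# so no point is a core point and every cell is labeled -1.
scaled_labels = DBSCAN(eps=0.5, min_samples=5).fit(embedding * 10).labels_
print(n_clusters(scaled_labels), int((scaled_labels == -1).sum()))  # 0 11
```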
- param adata:
Annotated data object containing adata.obsm[use_rep].
- type adata:
AnnData
- param eps:
The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
- type eps:
float
- param min_samples:
The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
- type min_samples:
int
- param leaf_size:
Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
- type leaf_size:
int
- param n_jobs:
The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- type n_jobs:
int | None
- param use_rep:
Key in adata.obsm containing the input embedding.
- type use_rep:
str
- param key_added:
Key in adata.obs used to store the cluster labels.
- type key_added:
str
- returns:
Nothing; stores categorical cluster labels in adata.obs[key_added].
- rtype:
None
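Because eps is the most sensitive parameter and min_samples counts the point itself, one common heuristic (the k-distance plot, which is not part of snapatac2) is to compute each cell's distance to its min_samples-th nearest neighbor, counting itself, and pick eps near the elbow of the sorted values. The sketch below uses scikit-learn on a synthetic two-blob embedding that stands in for adata.obsm[use_rep]; replacing the visual elbow with a rounded 90th percentile is an assumption made to keep it non-interactive. It also checks the property that makes the heuristic work: a point is a core point exactly when its k-distance is at most eps.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for adata.obsm[use_rep]: two Gaussian blobs of 100 points.
rng = np.random.default_rng(0)
embedding = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(100, 2)),
])

min_samples = 5
# Querying with the fitted data returns each point as its own first neighbor
# (distance 0), so the last column is the distance to the min_samples-th
# neighbor counting the point itself, which is how DBSCAN counts.
nn = NearestNeighbors(n_neighbors=min_samples).fit(embedding)
distances, _ = nn.kneighbors(embedding)
k_distance = distances[:, -1]

# In practice, plot np.sort(k_distance) and read eps off the elbow; a rounded
# 90th percentile stands in for that visual step here.
eps = round(float(np.quantile(k_distance, 0.9)), 2)

model = DBSCAN(eps=eps, min_samples=min_samples).fit(embedding)
# Core points are exactly those whose k-distance does not exceed eps.
predicted_core = np.flatnonzero(k_distance <= eps)
print(np.array_equal(predicted_core, model.core_sample_indices_))  # True
```

Raising min_samples moves the k-distance curve upward, so eps and min_samples are usually tuned together on the embedding that will actually be passed through use_rep.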
Examples
>>> import snapatac2 as snap
>>> adata = snap.read(snap.datasets.pbmc5k(type="annotated_h5ad"), backed=None)
>>> snap.tl.dbscan(adata, eps=0.5, min_samples=5)
>>> "dbscan" in adata.obs
True
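To inspect the result beyond checking that the key exists, the hypothetical dbscan_sketch helper below mirrors the documented behavior: it reads obsm[use_rep], runs DBSCAN, and writes categorical labels to obs[key_added]. It assumes the wrapper forwards eps, min_samples, leaf_size and n_jobs to scikit-learn's DBSCAN, and it uses a plain dict and a pandas DataFrame in place of adata.obsm and adata.obs; storing integer categories is also an assumption. Counting the -1 entries gives the number of noise cells.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN


def dbscan_sketch(obsm, obs, eps=0.5, min_samples=5, leaf_size=30,
                  n_jobs=None, use_rep="X_spectral", key_added="dbscan"):
    """Hypothetical stand-in: cluster obsm[use_rep], store labels in obs[key_added]."""
    model = DBSCAN(eps=eps, min_samples=min_samples,
                   leaf_size=leaf_size, n_jobs=n_jobs)
    labels = model.fit(obsm[use_rep]).labels_
    obs[key_added] = pd.Categorical(labels)  # noise cells keep the label -1


# Toy data: two five-point blobs plus one isolated cell.
blob = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
obsm = {"X_spectral": np.vstack([blob, blob + 5.0, [[10.0, -10.0]]])}
obs = pd.DataFrame(index=[f"cell{i}" for i in range(11)])

dbscan_sketch(obsm, obs)
n_noise = int((obs["dbscan"] == -1).sum())
n_clusters = len([c for c in obs["dbscan"].cat.categories if c != -1])
print(n_noise, n_clusters)  # 1 2
```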