snapatac2.pp.scrublet

snapatac2.pp.scrublet(adata, features='selected', n_comps=15, sim_doublet_ratio=2.0, expected_doublet_rate=0.1, n_neighbors=None, use_approx_neighbors=False, random_state=0, inplace=True, n_jobs=8, verbose=True)

Score ATAC-seq cells for doublet likelihood with Scrublet.

Use this function after constructing a cell-by-feature count matrix to add doublet probabilities and doublet scores to cells. The algorithm simulates doublets by summing randomly paired observed cells, embeds observed and simulated profiles together with spectral embedding, computes neighbor-based doublet scores, and converts scores to probabilities with a Gaussian mixture model.
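The doublet-simulation step described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `simulate_doublets`, the toy count matrix, and the choice to sample pairs with replacement are all assumptions made for the example.

```python
import random


def simulate_doublets(counts, sim_doublet_ratio=2.0, seed=0):
    """Sum randomly paired rows of `counts` to create synthetic doublets.

    Hypothetical helper: pairs are drawn with replacement, which is an
    assumption of this sketch rather than documented library behavior.
    """
    rng = random.Random(seed)
    n_cells = len(counts)
    # The number of simulated doublets scales with the number of observed cells.
    n_sim = int(n_cells * sim_doublet_ratio)
    doublets = []
    for _ in range(n_sim):
        i = rng.randrange(n_cells)
        j = rng.randrange(n_cells)
        # A simulated doublet is the element-wise sum of two observed profiles.
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


# Toy cell-by-feature count matrix (3 cells, 4 features).
counts = [
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [2, 2, 0, 0],
]
doublets = simulate_doublets(counts, sim_doublet_ratio=2.0, seed=0)
print(len(doublets))  # 6 simulated doublets for 3 observed cells
```

In the real function, the observed and simulated profiles would then be embedded together and scored by their neighborhoods; that part is omitted here.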

Anti-Patterns

  • Do NOT run this function before selecting features when features="selected"; call snap.pp.select_features first or pass features=None.

  • Do NOT interpret doublet_score as a calibrated probability; use doublet_probability for probability-threshold filtering.
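To make the second point concrete, the sketch below filters on probability rather than on the raw score. The values and the 0.5 cutoff are hypothetical and chosen only for illustration; in practice the two lists would come from the doublet_probability and doublet_score columns of adata.obs.

```python
# Hypothetical per-cell values, standing in for columns of adata.obs.
doublet_probability = [0.02, 0.91, 0.10, 0.65, 0.30]
doublet_score = [0.05, 0.48, 0.12, 0.33, 0.21]

# Illustrative cutoff only; the source does not recommend a specific value.
probability_threshold = 0.5

# Threshold on the probability, not on the uncalibrated score.
keep = [p < probability_threshold for p in doublet_probability]
singlet_indices = [i for i, kept in enumerate(keep) if kept]
print(singlet_indices)  # [0, 2, 4]
```

Note that cell 1 has a score of 0.48, which would pass a naive 0.5 cutoff applied to doublet_score even though its probability is 0.91.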

type adata:

AnnData | list[AnnData]

param adata:

AnnData-like object with a count matrix in .X, or a list of AnnData-like objects. When a list is provided, the function processes the objects in parallel (see n_jobs).

type features:

str | ndarray | None

param features:

Features used for scoring. If a string, read a boolean mask from adata.var[features]. If an array, True keeps a feature and False removes it. If None, use all features.
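The three accepted forms of features can be illustrated with a small stand-in. The helper `resolve_feature_mask` and the dictionary used in place of adata.var are hypothetical; this is a sketch of the documented behavior, not the library's code.

```python
def resolve_feature_mask(features, var_columns, n_features):
    """Hypothetical helper mirroring the three documented cases."""
    if features is None:
        # None: use every feature.
        return [True] * n_features
    if isinstance(features, str):
        # String: look up a boolean column (a stand-in for adata.var[features]).
        mask = var_columns[features]
    else:
        # Array-like: True keeps a feature, False removes it.
        mask = features
    mask = [bool(m) for m in mask]
    if len(mask) != n_features:
        raise ValueError("mask length must match the number of features")
    return mask


# A plain dict standing in for adata.var after feature selection.
var_columns = {"selected": [True, False, True, True]}
row = [5, 1, 0, 2]  # counts for one cell across four features

mask = resolve_feature_mask("selected", var_columns, n_features=4)
kept = [value for value, keep in zip(row, mask) if keep]
print(kept)  # [5, 0, 2]
```

With features="selected" and no "selected" column present, the lookup would fail, which is why feature selection has to run first.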

type n_comps:

int

param n_comps:

Number of spectral components used to embed observed and simulated cells.

type sim_doublet_ratio:

float

param sim_doublet_ratio:

Number of simulated doublets as a multiple of the number of observed cells. For example, 2.0 simulates twice as many doublets as there are observed cells.

type expected_doublet_rate:

float

param expected_doublet_rate:

Expected fraction of doublets among observed cells (for example, 0.1 means 10%), used as a prior in the score calculation.

type n_neighbors:

int | None

param n_neighbors:

Number of neighbors used to construct the KNN graph of observed cells and simulated doublets. If None, use round(0.5 * sqrt(n_cells)).
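The default can be worked out by hand from the formula above. The helper name `default_n_neighbors` is hypothetical, and the sketch uses Python's built-in round, which rounds exact ties to the nearest even integer; the library's own rounding may differ in those edge cases.

```python
import math


def default_n_neighbors(n_cells):
    """Hypothetical helper evaluating round(0.5 * sqrt(n_cells))."""
    return round(0.5 * math.sqrt(n_cells))


for n_cells in (400, 5000, 10000):
    print(n_cells, default_n_neighbors(n_cells))
# 400 -> 10, 5000 -> 35, 10000 -> 50
```

The square-root scaling keeps neighborhoods small relative to the dataset: a 25-fold increase in cells (400 to 10000) raises the default only 5-fold.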

type use_approx_neighbors:

bool

param use_approx_neighbors:

Whether to use approximate nearest-neighbor search.

type random_state:

int

param random_state:

Random seed for doublet simulation and probability modeling.

type inplace:

bool

param inplace:

If True, write doublet_probability and doublet_score to adata.obs and return None. If False, return them as arrays instead.

type n_jobs:

int

param n_jobs:

Number of jobs to run in parallel when adata is a list.

type verbose:

bool

param verbose:

Whether to print progress messages.

returns:

If inplace=False, returns (doublet_probability, doublet_score) for a single object or a list of such tuples for a list of objects. If inplace=True, returns None and writes doublet_probability and doublet_score to .obs, plus simulated scores to .uns["scrublet_sim_doublet_score"].

rtype:

tuple[ndarray, ndarray] | list[tuple[ndarray, ndarray]] | None

Examples

>>> import snapatac2 as snap
>>> adata = snap.read(snap.datasets.pbmc5k(type="h5ad"), backed=None)
>>> snap.pp.select_features(adata)
>>> snap.pp.scrublet(adata, n_comps=5, sim_doublet_ratio=0.5, verbose=False)
>>> {"doublet_probability", "doublet_score"}.issubset(adata.obs.columns)
True