snapatac2.pp.scrublet
- snapatac2.pp.scrublet(adata, features='selected', n_comps=15, sim_doublet_ratio=2.0, expected_doublet_rate=0.1, n_neighbors=None, use_approx_neighbors=False, random_state=0, inplace=True, n_jobs=8, verbose=True)
Score ATAC-seq cells for doublet likelihood with Scrublet.
Use this function after constructing a cell-by-feature count matrix to add doublet probabilities and doublet scores to cells. The algorithm simulates doublets by summing randomly paired observed cells, embeds observed and simulated profiles together with spectral embedding, computes neighbor-based doublet scores, and converts scores to probabilities with a Gaussian mixture model.
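The doublet simulation step described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper name `simulate_doublets`, the toy count matrix, and the choice to sample pairs with replacement are assumptions made for illustration only.

```python
import random


def simulate_doublets(counts, sim_doublet_ratio=2.0, seed=0):
    """Build synthetic doublets by summing randomly paired rows of `counts`.

    Hypothetical helper: `counts` is a list of per-cell count vectors.
    Pairs are drawn with replacement, so a cell may be paired with itself.
    """
    rng = random.Random(seed)
    n_cells = len(counts)
    # Number of simulated doublets relative to the number of observed cells.
    n_sim = int(round(n_cells * sim_doublet_ratio))
    doublets = []
    for _ in range(n_sim):
        i = rng.randrange(n_cells)
        j = rng.randrange(n_cells)
        # A simulated doublet is the element-wise sum of two observed profiles.
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


# Toy cell-by-feature matrix: 3 cells, 4 features (made-up values).
counts = [
    [1, 0, 0, 2],
    [0, 1, 1, 0],
    [0, 0, 3, 1],
]
doublets = simulate_doublets(counts, sim_doublet_ratio=2.0)
print(len(doublets))  # 3 cells * ratio 2.0 = 6 simulated doublets
```

In the real function the simulated profiles are then embedded together with the observed cells; the sketch stops at the simulation step.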
Anti-Patterns
- Do NOT run this function before selecting features when features="selected"; call snap.pp.select_features first or pass features=None.
- Do NOT interpret doublet_score as a calibrated probability; use doublet_probability for probability-threshold filtering.
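To make the second point concrete, here is a small sketch of probability-threshold filtering. The per-cell values and the 0.5 cutoff are made up for illustration; in practice the values would come from the doublet_probability and doublet_score columns written to .obs.

```python
# Hypothetical per-cell outputs (illustrative values, not real results).
doublet_probability = [0.02, 0.91, 0.40, 0.73]
doublet_score = [0.05, 0.38, 0.12, 0.21]

threshold = 0.5  # illustrative probability cutoff, not a recommended default

# Keep cells whose doublet probability is below the cutoff.
keep = [p < threshold for p in doublet_probability]
n_kept = sum(keep)

# Applying the same cutoff to the raw scores gives a different answer,
# because the scores are not on a calibrated probability scale.
keep_by_score = [s < threshold for s in doublet_score]

print(keep)
print(keep_by_score)
```

With these toy numbers the probability cutoff removes two cells, while the same cutoff applied to the raw scores removes none, which is why the two columns should not be used interchangeably.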
- type adata:
AnnData | list[AnnData]
- param adata:
AnnData-like object with a count matrix in .X, or a list of AnnData-like objects. When a list is provided, the function processes objects in parallel.
- type features:
- param features:
Features used for scoring. If a string, read a boolean mask from adata.var[features]. If an array, True keeps a feature and False removes it. If None, use all features.
- type n_comps:
- param n_comps:
Number of spectral components used to embed observed and simulated cells.
- type sim_doublet_ratio:
- param sim_doublet_ratio:
Number of simulated doublets relative to the number of observed cells.
- type expected_doublet_rate:
- param expected_doublet_rate:
Prior expected doublet rate used in score calculation.
- type n_neighbors:
- param n_neighbors:
Number of neighbors used to construct the KNN graph of observed cells and simulated doublets. If None, use round(0.5 * sqrt(n_cells)).
- type use_approx_neighbors:
- param use_approx_neighbors:
Whether to use approximate nearest-neighbor search.
- type random_state:
- param random_state:
Random seed for doublet simulation and probability modeling.
- type inplace:
- param inplace:
If True, store scores in adata. If False, return arrays.
- type n_jobs:
- param n_jobs:
Number of jobs to run in parallel when adata is a list.
- type verbose:
- param verbose:
Whether to print progress messages.
- returns:
If inplace=False, returns (doublet_probability, doublet_score) for a single object or a list of such tuples for a list of objects. If inplace=True, returns None and writes doublet_probability and doublet_score to .obs, plus simulated scores to .uns["scrublet_sim_doublet_score"].
- rtype:
Examples
>>> import snapatac2 as snap
>>> adata = snap.read(snap.datasets.pbmc5k(type="h5ad"), backed=None)
>>> snap.pp.select_features(adata)
>>> snap.pp.scrublet(adata, n_comps=5, sim_doublet_ratio=0.5, verbose=False)
>>> {"doublet_probability", "doublet_score"}.issubset(adata.obs.columns)
True
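As a rough aid for choosing parameters, the sketch below evaluates the documented n_neighbors default, round(0.5 * sqrt(n_cells)), and the number of simulated doublets implied by sim_doublet_ratio. The helper names are hypothetical, and the exact rounding the library applies internally is an assumption here.

```python
import math


def default_n_neighbors(n_cells):
    """Documented default when n_neighbors=None: round(0.5 * sqrt(n_cells))."""
    return round(0.5 * math.sqrt(n_cells))


def n_simulated_doublets(n_cells, sim_doublet_ratio=2.0):
    """Simulated doublet count relative to observed cells (rounding assumed)."""
    return int(round(n_cells * sim_doublet_ratio))


for n_cells in (1_000, 5_000, 10_000):
    print(
        n_cells,
        default_n_neighbors(n_cells),
        n_simulated_doublets(n_cells),
        n_simulated_doublets(n_cells, sim_doublet_ratio=0.5),
    )
```

Lowering sim_doublet_ratio, as the doctest above does with 0.5, reduces the number of simulated profiles that must be embedded, at the cost of a sparser sample of the doublet distribution.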