snapatac2.pp.filter_doublets

snapatac2.pp.filter_doublets(adata, probability_threshold=0.5, score_threshold=None, inplace=True, n_jobs=8, verbose=True)

Remove cells classified as doublets.

Run this function after snap.pp.scrublet. It either subsets an AnnData object in place or returns a boolean mask of cells to keep. Cells are filtered by one criterion: the calibrated doublet probability or the raw doublet score.
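The filtering rule amounts to a per-cell comparison against a single threshold. The sketch below is a plain-Python approximation, assuming that a cell is kept when its value is less than or equal to the threshold (the complement of "greater than this value"). The helper name `keep_mask` is hypothetical and is not part of SnapATAC2.

```python
def keep_mask(values: list[float], threshold: float) -> list[bool]:
    """Hypothetical helper: True keeps a cell, False marks it as a doublet.

    A cell is removed when its value (doublet probability or raw doublet
    score) is strictly greater than `threshold`.
    """
    return [v <= threshold for v in values]


# Made-up doublet probabilities for four cells.
probs = [0.05, 0.62, 0.30, 0.91]
print(keep_mask(probs, 0.5))  # [True, False, True, False]
```

The same comparison applies whether the values are probabilities or raw scores; only the column and the threshold change.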

Anti-Patterns

  • Do NOT call this function before snap.pp.scrublet; it requires doublet_probability or doublet_score in .obs.

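  • Do NOT set both probability_threshold and score_threshold; choose one filtering criterion. Because probability_threshold defaults to 0.5, pass probability_threshold=None explicitly when filtering by score_threshold.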
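The two anti-patterns can be expressed as preconditions. The following is a hypothetical validator (`check_filter_args` is not a SnapATAC2 function), and the library's own error messages and checks may differ. As an extra assumption not stated above, the sketch also rejects calls where neither threshold is set.

```python
from typing import Iterable, Optional


def check_filter_args(
    obs_columns: Iterable[str],
    probability_threshold: Optional[float],
    score_threshold: Optional[float],
) -> str:
    """Hypothetical validator; returns the .obs column that would drive filtering."""
    columns = set(obs_columns)
    if probability_threshold is not None and score_threshold is not None:
        raise ValueError("set only one of probability_threshold and score_threshold")
    # Assumption: at least one criterion must be given.
    if probability_threshold is None and score_threshold is None:
        raise ValueError("set one of probability_threshold or score_threshold")
    column = "doublet_probability" if probability_threshold is not None else "doublet_score"
    if column not in columns:
        raise ValueError(f"{column!r} not found in .obs; run snap.pp.scrublet first")
    return column


obs_after_scrublet = ["doublet_probability", "doublet_score"]
print(check_filter_args(obs_after_scrublet, None, 0.3))  # doublet_score
```

Calling it with the default probability_threshold=0.5 together with a score_threshold raises an error, which is why probability_threshold must be set to None when filtering by score.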

type adata:

AnnData | list[AnnData]

param adata:

AnnData-like object with Scrublet scores in .obs, or a list of such objects. When a list is provided, the function processes objects in parallel.

type probability_threshold:

float | None

param probability_threshold:

Remove cells with doublet_probability greater than this value. Lower values remove more cells. Set to None when using score_threshold.
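To see why lower thresholds remove more cells, the sketch below counts removed cells at several thresholds using made-up probabilities. It assumes only the documented "greater than" rule; `n_removed` is a hypothetical helper.

```python
def n_removed(probabilities: list[float], threshold: float) -> int:
    """Count cells whose doublet probability is strictly greater than `threshold`."""
    return sum(p > threshold for p in probabilities)


# Illustrative (made-up) doublet probabilities for six cells.
probs = [0.02, 0.15, 0.35, 0.55, 0.80, 0.97]
for t in (0.9, 0.5, 0.2):
    print(t, n_removed(probs, t))
# 0.9 1
# 0.5 3
# 0.2 4
```

Because each cell is compared against the same cutoff, lowering the threshold can only add cells to the removed set.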

type score_threshold:

float | None

param score_threshold:

Remove cells with doublet_score greater than this value. Leave as None (the default) when using probability_threshold.

type inplace:

bool

param inplace:

If True, subset adata in place and return None. If False, return a boolean mask of cells to keep.
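The difference between the two modes can be illustrated with a toy container. `ToyData` and `toy_filter` are hypothetical stand-ins, not AnnData or SnapATAC2 objects; they only mirror the documented contract that inplace=True subsets and returns None, while inplace=False returns a mask.

```python
from typing import Optional


class ToyData:
    """Hypothetical stand-in for an AnnData object: barcodes plus doublet probabilities."""

    def __init__(self, barcodes: list[str], doublet_probability: list[float]) -> None:
        self.barcodes = barcodes
        self.doublet_probability = doublet_probability


def toy_filter(data: ToyData, threshold: float = 0.5, inplace: bool = True) -> Optional[list[bool]]:
    keep = [p <= threshold for p in data.doublet_probability]
    if not inplace:
        # In this toy, the mask is returned and `data` is not modified.
        return keep
    # Subset the toy object in place and return None.
    data.barcodes = [b for b, k in zip(data.barcodes, keep) if k]
    data.doublet_probability = [p for p, k in zip(data.doublet_probability, keep) if k]
    return None


cells = ToyData(["AAAC", "AAAG", "AAAT"], [0.1, 0.7, 0.4])
print(toy_filter(cells, inplace=False))  # [True, False, True]
print(len(cells.barcodes))  # 3
print(toy_filter(cells))  # None
print(cells.barcodes)  # ['AAAC', 'AAAT']
```

Returning a mask lets the caller inspect or combine it with other QC masks before subsetting.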

type n_jobs:

int

param n_jobs:

Number of jobs to run in parallel when adata is a list.
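When adata is a list, each object can be filtered independently, so the work parallelizes naturally. The sketch below illustrates the idea with the standard library's ThreadPoolExecutor capped at n_jobs workers. This is an assumption for illustration only and does not describe how SnapATAC2 actually schedules jobs; plain lists of probabilities stand in for AnnData objects, and `keep_masks_parallel` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def _mask_one(probabilities: list[float], threshold: float) -> list[bool]:
    # Keep cells whose probability does not exceed the threshold.
    return [p <= threshold for p in probabilities]


def keep_masks_parallel(
    datasets: list[list[float]], threshold: float = 0.5, n_jobs: int = 8
) -> list[list[bool]]:
    # executor.map preserves input order, so masks line up with `datasets`.
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(lambda probs: _mask_one(probs, threshold), datasets))


samples = [[0.1, 0.9], [0.6, 0.2, 0.3]]
print(keep_masks_parallel(samples, n_jobs=2))  # [[True, False], [False, True, True]]
```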

type verbose:

bool

param verbose:

Whether to print progress messages.

returns:

If inplace=False, returns a boolean mask where True keeps a cell and False removes a doublet. If inplace=True, returns None and subsets the object in place.

rtype:

ndarray | None

See also

scrublet

Examples

>>> import snapatac2 as snap
>>> adata = snap.read(snap.datasets.pbmc5k(type="h5ad"), backed=None)
>>> snap.pp.select_features(adata)
>>> snap.pp.scrublet(adata, n_comps=5, sim_doublet_ratio=0.5, verbose=False)
>>> keep = snap.pp.filter_doublets(adata, probability_threshold=0.5, inplace=False, verbose=False)
>>> keep.dtype == bool
True