snapatac2.pp.scrublet

snapatac2.pp.scrublet(adata, features='selected', n_comps=15, sim_doublet_ratio=2.0, expected_doublet_rate=0.1, n_neighbors=None, use_approx_neighbors=True, random_state=0, inplace=True, n_jobs=8, verbose=True)

Compute the probability that each cell is a doublet using the Scrublet algorithm.

Parameters:

adata (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects. In this case, the function will be applied to each AnnData object in parallel.
features (str | ndarray | None) – Features to use. Either a Boolean index mask, where True means that the feature is kept and False means that it is removed, or the name of a column in adata.var that holds such a mask.
n_comps (int) – Number of principal components to compute.
sim_doublet_ratio (float) – Number of doublets to simulate relative to the number of observed cells.
expected_doublet_rate (float) – Expected fraction of observed cells that are doublets.
n_neighbors (Optional[int]) – Number of neighbors used to construct the KNN graph of observed cells and simulated doublets. If None, this is set to round(0.5 * sqrt(n_cells)).
use_approx_neighbors (bool) – Whether to use approximate nearest-neighbor search.
random_state (int) – Seed for the random number generator.
inplace (bool) – Whether to update the AnnData object in place.
n_jobs (int) – Number of jobs to run in parallel.
verbose (bool) – Whether to print progress messages.
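
The two size-dependent quantities described above can be worked out by hand. The following is a minimal sketch, not the library's code: the helper names are hypothetical, and truncating the simulated-doublet count to an integer is an assumption about how a fractional count would be handled.

```python
import math


def default_n_neighbors(n_cells: int) -> int:
    """Default documented for n_neighbors=None: round(0.5 * sqrt(n_cells))."""
    return round(0.5 * math.sqrt(n_cells))


def n_simulated_doublets(n_cells: int, sim_doublet_ratio: float = 2.0) -> int:
    """Simulated doublets relative to observed cells.

    Assumption: a fractional count is truncated to an integer.
    """
    return int(n_cells * sim_doublet_ratio)


if __name__ == "__main__":
    for n in (2_500, 10_000):
        print(n, default_n_neighbors(n), n_simulated_doublets(n))
```

With the default sim_doublet_ratio of 2.0, a dataset of 10,000 cells would give 20,000 simulated doublets and a default neighborhood size of 50.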

Returns:

If inplace=True, nothing is returned and adata is updated with the following fields:

adata.obs["doublet_probability"]: probability of being a doublet
adata.obs["doublet_score"]: doublet score

If inplace=False, the results are returned as a tuple of two arrays instead.

Return type:

tuple[np.ndarray, np.ndarray] | None
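
A typical call might look like the sketch below. It uses only the signature and output fields documented on this page. The helper names and the 0.5 probability cutoff are illustrative assumptions, not library defaults, and the example assumes that a feature mask is already stored under the 'selected' key.

```python
import numpy as np


def flag_doublets(doublet_probability, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical helper: True where the probability exceeds the cutoff.

    The 0.5 cutoff is an assumption chosen for illustration.
    """
    probs = np.asarray(doublet_probability, dtype=float)
    return probs > threshold


def run_scrublet_and_flag(adata, threshold: float = 0.5) -> np.ndarray:
    """Run scrublet in place, then flag likely doublets.

    Assumes snapatac2 is installed and features were selected beforehand.
    """
    import snapatac2 as snap  # imported lazily so the helper above works alone

    snap.pp.scrublet(adata, features="selected", inplace=True)
    return flag_doublets(adata.obs["doublet_probability"], threshold)


if __name__ == "__main__":
    # Stand-alone demonstration of the thresholding step on made-up values.
    print(flag_doublets([0.05, 0.92, 0.40]))
```

Keeping the thresholding step separate from the scrublet call makes it easy to try several cutoffs without recomputing the scores.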