snapatac2.pp.mnc_correct

snapatac2.pp.mnc_correct(adata, *, batch, n_neighbors=5, n_clusters=40, n_iter=1, use_rep='X_spectral', use_dims=None, groupby=None, key_added=None, inplace=True, n_jobs=8)

Correct batch effects with centroid-based mutual nearest neighbors.

Use this function after dimensionality reduction and before neighbor-graph construction to align cells across batches. The method clusters each batch, identifies mutual nearest cluster centroids, and projects cells along the resulting correction vectors.
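The three steps named above (cluster each batch, match mutual nearest centroids, shift cells along correction vectors) can be illustrated with a small NumPy-only sketch. This is not the library's implementation: the helpers `kmeans_centroids`, `mutual_nearest`, and `toy_mnc_correct` are hypothetical names, only two batches are handled, a single nearest neighbor is used, and each cell simply borrows the vector of its closest matched centroid.

```python
import numpy as np

def kmeans_centroids(X, k, n_iter=10):
    """Tiny Lloyd's k-means with a deterministic start; returns the centroids."""
    k = min(k, len(X))
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def mutual_nearest(A, B):
    """Index pairs (i, j) where A[i] and B[j] are each other's nearest centroid."""
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    nearest_b = dist.argmin(axis=1)  # for each row of A, the closest row of B
    nearest_a = dist.argmin(axis=0)  # for each row of B, the closest row of A
    return [(i, int(j)) for i, j in enumerate(nearest_b) if nearest_a[j] == i]

def toy_mnc_correct(X, batch, n_clusters=2):
    """Shift the second batch toward the first one (illustration only)."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    ref, query = np.unique(batch)[:2]
    cen_ref = kmeans_centroids(X[batch == ref], n_clusters)
    cen_query = kmeans_centroids(X[batch == query], n_clusters)
    pairs = mutual_nearest(cen_ref, cen_query)
    # One correction vector per matched pair, pointing from query to reference.
    anchors = np.array([cen_query[j] for _, j in pairs])
    vectors = np.array([cen_ref[i] - cen_query[j] for i, j in pairs])
    corrected = X.copy()
    for idx in np.flatnonzero(batch == query):
        # Each query cell borrows the vector of its closest matched centroid.
        nearest = np.linalg.norm(anchors - X[idx], axis=1).argmin()
        corrected[idx] = X[idx] + vectors[nearest]
    return corrected

# Two batches with the same two clusters; batch "b" is offset by +1 on the second axis.
X = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 0.0], [4.2, 0.0],
              [0.0, 1.0], [0.2, 1.0], [4.0, 1.0], [4.2, 1.0]])
batch = ["a"] * 4 + ["b"] * 4
corrected = toy_mnc_correct(X, batch)
print(corrected[4:])
```

In this toy data the batch offset is small relative to the distance between clusters, so the centroids pair up correctly. When the offset is large compared with the cluster spacing, centroid matching of this kind can pair unrelated clusters, which is one reason to inspect the corrected embedding rather than trust it blindly.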

Anti-Patterns

  • Do NOT run this function on raw count matrices unless distances between raw counts are the intended analysis; use a reduced representation such as X_spectral.

  • Do NOT pass batch as a column name when adata is a NumPy array, because a bare array has no .obs table in which to look up the name; provide one label per observation instead.

type adata:

AnnData | AnnDataSet | np.ndarray

param adata:

AnnData-like object with use_rep in .obsm, AnnDataSet-like object, or a NumPy array of shape n_obs x n_components.

type batch:

str | list[str]

param batch:

Column name in .obs that identifies batches, or a list of labels with one entry per observation.

type n_neighbors:

int

param n_neighbors:

Number of nearest centroids to inspect when finding mutual nearest neighbors.
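To make the role of n_neighbors concrete, the sketch below uses one common reading of "mutual nearest neighbors with k candidates": a pair of centroids counts as mutual when each one appears in the other's k-nearest list. The function `mutual_nearest_pairs` is a hypothetical helper written for this illustration and is not part of snapatac2; the library's exact matching rule may differ.

```python
import numpy as np

def mutual_nearest_pairs(A, B, k):
    """Pairs (i, j) where B[j] is among the k nearest of A[i] and vice versa."""
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    k_a = min(k, B.shape[0])  # candidates available to each row of A
    k_b = min(k, A.shape[0])  # candidates available to each row of B
    near_b = np.argsort(dist, axis=1)[:, :k_a]    # row i: indices into B
    near_a = np.argsort(dist, axis=0)[:k_b, :].T  # row j: indices into A
    return [(i, int(j))
            for i in range(A.shape[0])
            for j in near_b[i]
            if i in near_a[j]]

# Centroids of two batches along one axis.
A = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
B = np.array([[0.1, 0.0], [4.8, 0.0], [9.0, 0.0]])

strict = mutual_nearest_pairs(A, B, k=1)
loose = mutual_nearest_pairs(A, B, k=2)
print(strict)      # [(0, 0), (2, 1)]
print(len(loose))  # 5
```

A larger k admits more candidate pairs, which gives more anchors for the correction but also lets in weaker matches.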

type n_clusters:

int

param n_clusters:

Maximum number of clusters to form in each batch.

type n_iter:

int

param n_iter:

Number of correction iterations.

type use_rep:

str

param use_rep:

Key in .obsm containing the input embedding.

type use_dims:

int | list[int] | None

param use_dims:

Dimensions of use_rep or the input array to use. If an integer, use the first use_dims columns. If a list, use those column indices.
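The integer and list forms of use_dims described above map directly onto NumPy column slicing. The helper `select_dims` below is a hypothetical stand-in written to show that mapping; it is not a snapatac2 function.

```python
import numpy as np

def select_dims(embedding, use_dims=None):
    """Return the columns of `embedding` selected by `use_dims`."""
    embedding = np.asarray(embedding)
    if use_dims is None:
        return embedding                    # keep every column
    if isinstance(use_dims, (int, np.integer)):
        return embedding[:, :use_dims]      # first `use_dims` columns
    return embedding[:, list(use_dims)]     # explicit column indices

emb = np.arange(12).reshape(3, 4)
first_two = select_dims(emb, 2)
picked = select_dims(emb, [0, 3])
print(first_two.shape)  # (3, 2)
print(picked.tolist())  # [[0, 3], [4, 7], [8, 11]]
```

The returned embedding has as many columns as were selected, which matches the n_selected_components in the documented return shape.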

type groupby:

str | list[str] | None

param groupby:

Column name or labels used to split cells and run correction independently within each group.
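Running the correction independently within each group amounts to a split-apply-combine pattern over the rows. The sketch below illustrates that pattern only: `correct_within_groups` and `center_batches` are hypothetical names, and `center_batches` is a deliberately simple stand-in (it moves every batch mean onto the group mean) rather than the mutual-nearest-centroid correction itself.

```python
import numpy as np

def center_batches(block, batch):
    """Stand-in correction: move each batch's mean onto the group's mean."""
    out = block.copy()
    group_mean = block.mean(axis=0)
    for b in np.unique(batch):
        rows = batch == b
        out[rows] += group_mean - block[rows].mean(axis=0)
    return out

def correct_within_groups(X, batch, groups, correct_fn):
    """Apply `correct_fn` to each group separately, preserving row order."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    groups = np.asarray(groups)
    out = np.empty_like(X)
    for g in np.unique(groups):
        rows = groups == g
        # Cells outside this group never influence its correction.
        out[rows] = correct_fn(X[rows], batch[rows])
    return out

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [14.0, 10.0]])
batch = ["a", "b", "a", "b"]
groups = ["t", "t", "u", "u"]
grouped = correct_within_groups(X, batch, groups, center_batches)
print(grouped)
```

Because each group is processed on its own, the groups are natural units of parallel work, which is where a setting such as n_jobs comes in.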

type key_added:

str | None

param key_added:

Key used to store the corrected embedding. If None, store it in .obsm[use_rep + "_mnn"].

type inplace:

bool

param inplace:

If True and adata is AnnData-like, store the corrected embedding in .obsm. Ignored for NumPy input.
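The interaction between key_added and inplace can be summarized in a few lines. The sketch below models .obsm with a plain dict and follows only the behavior documented on this page; `store_or_return` is a hypothetical helper, not library code.

```python
import numpy as np

def store_or_return(obsm, corrected, use_rep="X_spectral", key_added=None, inplace=True):
    """Mimic the documented storage rule using a dict in place of .obsm."""
    if not inplace:
        return corrected                      # caller receives the array
    key = key_added if key_added is not None else use_rep + "_mnn"
    obsm[key] = corrected                     # result stored under the key
    return None

obsm = {}
corrected = np.zeros((4, 2))

default_result = store_or_return(obsm, corrected)
custom_result = store_or_return(obsm, corrected, key_added="X_corrected")
returned = store_or_return(obsm, corrected, inplace=False)
print(sorted(obsm))  # ['X_corrected', 'X_spectral_mnn']
```

With inplace=True, later steps only see the correction if they are pointed at the new key rather than the original use_rep.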

type n_jobs:

int

param n_jobs:

Number of worker processes used when groupby is specified.

returns:

Corrected embedding of shape n_obs x n_selected_components when inplace=False or when adata is a NumPy array. Returns None when inplace=True and stores the result in .obsm.

rtype:

np.ndarray | None

Examples

>>> import numpy as np
>>> import snapatac2 as snap
>>> X = np.array([[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [3.2, 3.0]])
>>> batch = ["a", "a", "b", "b"]
>>> corrected = snap.pp.mnc_correct(X, batch=batch, n_clusters=2, inplace=False)
>>> corrected.shape
(4, 2)