snapatac2.pp.filter_cells

snapatac2.pp.filter_cells(data, min_counts=1000, min_tsse=5.0, max_counts=None, max_tsse=None, inplace=True, n_jobs=8)

Filter cells by fragment-count and TSS-enrichment QC thresholds.

Use this function after computing per-cell QC metrics to remove low-quality cells. By default, a cell is kept only if it has at least 1000 fragments and a TSS enrichment score of at least 5.0; no upper bounds are applied.

Anti-Patterns

  • Do NOT call this function before data.obs["n_fragment"] and, when TSS filtering is enabled, data.obs["tsse"] are available.

  • Do NOT leave min_tsse enabled when TSS enrichment was not computed; pass min_tsse=None to filter only by fragment counts.

  • Do NOT expect a return value when inplace=True; the object is subset in place and the function returns None.
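The first two points can be checked before calling the function. The following is a minimal sketch, not part of snapatac2: `missing_qc_columns` is a hypothetical helper, and a plain list of column names stands in for `list(data.obs.columns)`.

```python
def missing_qc_columns(obs_columns, min_tsse=5.0, max_tsse=None):
    """Return the QC columns that filtering would need but that are absent.

    Hypothetical helper for illustration; it is not part of snapatac2.
    """
    required = ["n_fragment"]
    # "tsse" is only needed when at least one TSS-enrichment bound is enabled.
    if min_tsse is not None or max_tsse is not None:
        required.append("tsse")
    present = set(obs_columns)
    return [name for name in required if name not in present]


# Stand-in for list(data.obs.columns) on an object without TSS enrichment.
columns = ["n_fragment"]
print(missing_qc_columns(columns))                 # ['tsse']
print(missing_qc_columns(columns, min_tsse=None))  # []
```

An empty result means the required columns are present for the chosen thresholds; a non-empty result names the metric to compute first, or the bound to disable.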

type data:

AnnData | list[AnnData]

param data:

AnnData object, or list of AnnData objects, to filter.

type min_counts:

int | None

param min_counts:

Minimum data.obs["n_fragment"] value required for a cell to pass filtering. Use None to disable the lower fragment-count bound.

type min_tsse:

float | None

param min_tsse:

Minimum data.obs["tsse"] value required for a cell to pass filtering. Use None to disable the lower TSS-enrichment bound.

type max_counts:

int | None

param max_counts:

Maximum data.obs["n_fragment"] value allowed for a cell to pass filtering. Use None to disable the upper fragment-count bound.

type max_tsse:

float | None

param max_tsse:

Maximum data.obs["tsse"] value allowed for a cell to pass filtering. Use None to disable the upper TSS-enrichment bound.
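The way the four bounds combine can be sketched in plain Python. This is an illustration of the semantics described above, not the library's implementation: `passing_indices` is a hypothetical name, lists stand in for the `n_fragment` and `tsse` columns, and the upper bounds are assumed to be inclusive.

```python
def passing_indices(n_fragment, tsse, min_counts=1000, min_tsse=5.0,
                    max_counts=None, max_tsse=None):
    """Return positions of cells that satisfy every enabled bound.

    Hypothetical sketch; a bound set to None is skipped entirely.
    """
    def within(value, lower, upper):
        # Lower bounds are inclusive ("at least"); upper bounds are
        # assumed inclusive as well.
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
        return True

    return [
        i
        for i, (count, score) in enumerate(zip(n_fragment, tsse))
        if within(count, min_counts, max_counts)
        and within(score, min_tsse, max_tsse)
    ]


n_fragment = [500, 1000, 20000, 3000]
tsse = [8.0, 5.0, 12.0, 3.5]
print(passing_indices(n_fragment, tsse))                    # [1, 2]
print(passing_indices(n_fragment, tsse, max_counts=10000))  # [1]
print(passing_indices(n_fragment, tsse, min_tsse=None))     # [1, 2, 3]
```

A cell is kept only when it passes all enabled bounds, so adding an upper bound can only shrink the selection, while setting a bound to None can only grow it.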

type inplace:

bool

param inplace:

If True, subset data in place and return None. If False, return integer indices of cells passing all enabled thresholds.

type n_jobs:

int

param n_jobs:

Number of parallel jobs to use when data is a list.

returns:

If inplace=False, returns integer indices of cells that pass all enabled thresholds. If data is a list, returns one index array per object. If inplace=True, returns None and subsets data in place.
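When data is a list and inplace=False, each returned index array has to be paired with the object it was computed from. A minimal sketch of that pairing, using plain lists of barcodes as stand-ins for AnnData objects; `subset_each` is a hypothetical helper, not a snapatac2 function:

```python
def subset_each(datasets, selections):
    """Subset every dataset with its own index array.

    Hypothetical helper; plain lists stand in for AnnData objects.
    """
    if len(datasets) != len(selections):
        raise ValueError("expected one index array per dataset")
    return [
        [cells[i] for i in indices]
        for cells, indices in zip(datasets, selections)
    ]


# Two stand-in datasets and one index array per dataset.
datasets = [["AAAC", "AAAG", "AACT"], ["CCGA", "CCGT"]]
selections = [[0, 2], [1]]
print(subset_each(datasets, selections))  # [['AAAC', 'AACT'], ['CCGT']]
```

With real AnnData objects, the per-object step would be the same `data[selected, :]` indexing used in the Examples section.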

rtype:

ndarray | list[ndarray] | None

See also

call_cells

Call cell-containing barcodes from count distributions.

Examples

>>> import snapatac2 as snap
>>> fragments = snap.datasets.pbmc500(downsample=True)
>>> data = snap.pp.import_fragments(
...     fragments,
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> snap.metrics.tsse(data, snap.genome.hg38)
>>> selected = snap.pp.filter_cells(
...     data,
...     min_counts=1000,
...     min_tsse=5.0,
...     inplace=False,
... )
>>> data = data[selected, :]