snapatac2.metrics.frag_size_distr

snapatac2.metrics.frag_size_distr(adata, *, max_recorded_size=1000, add_key='frag_size_distr', inplace=True, n_jobs=8)

Compute the dataset-level fragment size distribution.

Call this function after snapatac2.pp.import_fragments has stored the fragments in the AnnData object. The result is a single vector of length max_recorded_size + 1: for 1 ≤ i ≤ max_recorded_size, index i counts fragments of length i, and index 0 counts all fragments longer than max_recorded_size. The metric summarizes the whole dataset, so it does not produce per-cell values.
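The bin layout described above can be reproduced with NumPy on a plain array of fragment lengths. The helper `frag_size_histogram` below is hypothetical and only illustrates the documented output format; it is not SnapATAC2's implementation, which reads fragments from the AnnData object rather than from a list.

```python
import numpy as np


def frag_size_histogram(lengths, max_recorded_size=1000):
    """Hypothetical helper reproducing the documented bin layout."""
    lengths = np.asarray(lengths, dtype=np.int64)
    if np.any(lengths < 1):
        raise ValueError("fragment lengths must be positive")
    # Bins 1..max_recorded_size hold exact lengths; minlength fixes the size.
    counts = np.bincount(lengths[lengths <= max_recorded_size],
                         minlength=max_recorded_size + 1)
    # Bin 0 is reserved for every fragment longer than max_recorded_size.
    counts[0] = np.count_nonzero(lengths > max_recorded_size)
    return counts


hist = frag_size_histogram([50, 50, 147, 1200, 5000], max_recorded_size=1000)
print(hist.shape[0])                 # 1001
print(hist[0], hist[50], hist[147])  # 2 2 1
```

The two fragments longer than 1000 bp land in bin 0, which is why the vector has 1001 entries rather than 1000.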

Anti-Patterns

  • Do NOT interpret the returned vector as cell-level values; it is one distribution per AnnData object.

  • Do NOT call this function on an AnnData object that lacks imported fragments.
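A pre-flight check can catch the second anti-pattern before calling the metric. The sketch below assumes that snap.pp.import_fragments stores fragments under the obsm keys "fragment_paired" or "fragment_single"; treat those key names as an assumption to confirm against your SnapATAC2 version. `has_imported_fragments` is a hypothetical helper, and the SimpleNamespace stand-ins only let the sketch run without real data.

```python
from types import SimpleNamespace

# ASSUMPTION: snap.pp.import_fragments stores fragments under one of these
# obsm keys; confirm against the SnapATAC2 version you use.
ASSUMED_FRAGMENT_KEYS = ("fragment_paired", "fragment_single")


def has_imported_fragments(adata) -> bool:
    """Hypothetical check: does `adata` appear to hold imported fragments?"""
    return any(key in adata.obsm.keys() for key in ASSUMED_FRAGMENT_KEYS)


# Stand-in objects so the sketch runs without real data.
with_frags = SimpleNamespace(obsm={"fragment_paired": None})
without_frags = SimpleNamespace(obsm={})
print(has_imported_fragments(with_frags))     # True
print(has_imported_fragments(without_frags))  # False
```

With a real AnnData object, calling such a check before frag_size_distr turns a confusing downstream failure into an explicit message.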

param adata:

AnnData object, or a list of AnnData objects, with imported fragments. When a list is provided, compute one distribution for each object in parallel.

type adata:

AnnData | list[AnnData]

param max_recorded_size:

Largest fragment length that gets its own output bin, so the output has max_recorded_size + 1 entries. Fragments longer than this value are counted at index 0.

type max_recorded_size:

int
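Because every fragment above the cap is pooled into index 0, a distribution computed with a large max_recorded_size can be folded down to a smaller cap afterwards, but not raised to a larger one. The hypothetical helper `recap_distribution` below sketches that folding on an array laid out as documented; it is an illustration, not part of the SnapATAC2 API.

```python
import numpy as np


def recap_distribution(distr, new_max):
    """Hypothetical: fold a distribution down to a smaller max_recorded_size."""
    distr = np.asarray(distr)
    old_max = distr.shape[0] - 1
    if not 1 <= new_max <= old_max:
        raise ValueError("new_max must be between 1 and the original cap")
    # Keep the overflow bin and the exact-length bins 1..new_max.
    out = distr[: new_max + 1].copy()
    # Lengths new_max+1..old_max are now "longer than the cap" and join bin 0.
    out[0] += distr[new_max + 1:].sum()
    return out


d = np.arange(11)              # cap of 10; bin 0 (overflow) is 0 here
print(recap_distribution(d, 5))  # [40  1  2  3  4  5]
```

The total fragment count is preserved (55 in both arrays); only the resolution above the new cap is lost.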

param add_key:

Key used to store the distribution in adata.uns when inplace=True.

type add_key:

str

param inplace:

If True, store the distribution in adata.uns[add_key]. If False, return the distribution.

type inplace:

bool

param n_jobs:

Number of jobs to run when adata is a list. If n_jobs=-1, use all available CPUs.

type n_jobs:

int
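The list behaviour of adata and n_jobs (one distribution per input, computed in parallel, with n_jobs=-1 meaning all CPUs) can be sketched with the standard library. This is a hypothetical illustration operating on lists of fragment lengths; `histograms_per_dataset` is not a SnapATAC2 function, and whether SnapATAC2 uses threads or processes internally is not stated here. Threads are used only to keep the sketch short.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _histogram(lengths, max_recorded_size):
    """Bin fragment lengths using the documented layout (index 0 = overflow)."""
    lengths = np.asarray(lengths, dtype=np.int64)
    counts = np.bincount(lengths[lengths <= max_recorded_size],
                         minlength=max_recorded_size + 1)
    counts[0] = np.count_nonzero(lengths > max_recorded_size)
    return counts


def histograms_per_dataset(length_sets, max_recorded_size=1000, n_jobs=8):
    """Hypothetical: one distribution per input, computed concurrently."""
    if n_jobs == -1:
        # Mirror the documented convention: -1 means "use all available CPUs".
        n_workers = os.cpu_count() or 1
    else:
        n_workers = max(1, n_jobs)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, so output i belongs to input i.
        return list(pool.map(lambda ls: _histogram(ls, max_recorded_size),
                             length_sets))


res = histograms_per_dataset([[10, 10, 2000], [5]], max_recorded_size=100, n_jobs=-1)
print(len(res), res[0][10], res[0][0], res[1][5])  # 2 2 1 1
```

Preserving input order matters: the returned list is only useful if its i-th element can be matched back to the i-th AnnData object.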

returns:

If inplace=True, returns None after storing the distribution in adata.uns[add_key]. If inplace=False, returns the distribution, or a list of distributions when adata is a list.

rtype:

ndarray | list[ndarray] | None
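Since the return type depends on both inplace and whether adata is a list, downstream code often normalizes it. The hypothetical helper `pool_distributions` below accepts a single ndarray or a list of ndarrays (as returned with inplace=False) and sums them into one distribution. Summing is valid only when all inputs share the same max_recorded_size, because only then does index i mean the same fragment length in each vector.

```python
import numpy as np


def pool_distributions(result):
    """Hypothetical helper: combine one or more fragment size distributions."""
    if result is None:
        raise ValueError("got None; call frag_size_distr with inplace=False")
    if isinstance(result, np.ndarray):
        return result.copy()
    if len(result) == 0:
        raise ValueError("no distributions to pool")
    if len({d.shape[0] for d in result}) != 1:
        raise ValueError("distributions must share the same max_recorded_size")
    # Element-wise sum: bin i in the output counts length-i fragments overall.
    return np.sum(np.stack(result), axis=0)


a = np.array([3, 1, 2])
b = np.array([1, 0, 4])
print(pool_distributions([a, b]))  # [4 1 6]
```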

Examples

>>> import snapatac2 as snap
>>> data = snap.pp.import_fragments(
...     snap.datasets.pbmc500(downsample=True),
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> snap.metrics.frag_size_distr(data)
>>> data.uns["frag_size_distr"].shape[0]
1001