snapatac2.metrics.frag_size_distr

snapatac2.metrics.frag_size_distr(adata, *, max_recorded_size=1000, add_key='frag_size_distr', inplace=True, n_jobs=8)

Compute the dataset-level fragment size distribution.

Run this metric after snap.pp.import_fragments has attached fragment data to the AnnData object. The result is a vector of length max_recorded_size + 1: for 1 <= i <= max_recorded_size, index i counts fragments of length i, while index 0 counts all fragments longer than max_recorded_size. This metric summarizes the whole dataset rather than individual cells.
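To make the bin layout concrete, here is a minimal pure-NumPy sketch of the same counting rule. It is not SnapATAC2's implementation: frag_size_histogram is a hypothetical helper, and it assumes the fragment lengths are already available as positive integers, whereas the real function works from the fragments stored by import_fragments.

```python
import numpy as np


def frag_size_histogram(frag_lengths, max_recorded_size=1000):
    """Hypothetical sketch of the documented bin layout.

    Index i (1 <= i <= max_recorded_size) counts fragments of length i;
    index 0 collects every fragment longer than max_recorded_size.
    Assumes all lengths are positive integers.
    """
    lengths = np.asarray(frag_lengths, dtype=np.int64)
    # Fold every oversized fragment into the overflow bin at index 0.
    binned = np.where(lengths > max_recorded_size, 0, lengths)
    # minlength fixes the output at max_recorded_size + 1 bins, even when
    # no fragment reaches the largest in-range length.
    return np.bincount(binned, minlength=max_recorded_size + 1)


# Toy lengths: two 50 bp fragments, one 180 bp fragment, two above the cutoff.
distr = frag_size_histogram([50, 50, 180, 1500, 2000], max_recorded_size=1000)
print(distr.shape[0])                    # 1001
print(distr[0], distr[50], distr[180])   # 2 2 1
```

Using np.bincount with minlength keeps the vector length tied to max_recorded_size rather than to the data, which is why the default setting yields 1001 entries.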

Anti-Patterns

Do NOT interpret the returned or stored vector as per-cell values; it is a single distribution for the whole AnnData object.
Do NOT call this function on an AnnData object that lacks imported fragments.

param adata:: AnnData object, or a list of AnnData objects, with imported fragments. When a list is provided, compute one distribution for each object in parallel.
type adata:: AnnData | list[AnnData]
param max_recorded_size:: Largest fragment length with its own output bin. Fragments longer than this value are counted at index 0.
type max_recorded_size:: int
param add_key:: Key used to store the distribution in adata.uns when inplace=True.
type add_key:: str
param inplace:: If True, store the distribution in adata.uns[add_key]. If False, return the distribution.
type inplace:: bool
param n_jobs:: Number of parallel jobs to run when adata is a list. If n_jobs=-1, use all available CPUs.
type n_jobs:: int
returns:: If inplace=True, returns None after storing the distribution in adata.uns[add_key]. If inplace=False, returns the distribution, or a list of distributions when adata is a list.
rtype:: ndarray | list[ndarray] | None

Examples

>>> import snapatac2 as snap
>>> data = snap.pp.import_fragments(
...     snap.datasets.pbmc500(downsample=True),
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> snap.metrics.frag_size_distr(data)
>>> data.uns["frag_size_distr"].shape[0]
1001
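Because index 0 holds the overflow count rather than fragments of length 0, code that treats the stored vector as a histogram over lengths should handle that bin separately. A minimal sketch follows, using a small synthetic vector in place of data.uns["frag_size_distr"]; summarize_frag_size_distr and its summary keys are hypothetical and not part of SnapATAC2.

```python
import numpy as np


def summarize_frag_size_distr(distr):
    """Hypothetical helper: split the overflow bin from the in-range bins.

    Assumes the documented layout: distr[0] counts fragments longer than
    max_recorded_size and distr[i] counts fragments of length i.
    """
    distr = np.asarray(distr)
    overflow = int(distr[0])
    counts = distr[1:]                      # bins for lengths 1..max_recorded_size
    lengths = np.arange(1, distr.shape[0])  # fragment length each bin represents
    in_range = int(counts.sum())
    total = overflow + in_range
    return {
        "max_recorded_size": distr.shape[0] - 1,
        "overflow_fraction": overflow / total if total else float("nan"),
        # Overflow fragments are excluded: their exact lengths are not recorded.
        "mean_in_range_length": (
            float((lengths * counts).sum()) / in_range if in_range else float("nan")
        ),
    }


# Synthetic stand-in with max_recorded_size=4: one overflow fragment,
# two fragments of length 2, one fragment of length 4.
toy = np.array([1, 0, 2, 0, 1])
summary = summarize_frag_size_distr(toy)
print(summary["max_recorded_size"], summary["overflow_fraction"])  # 4 0.25
```

Excluding the overflow bin from the mean is a deliberate choice: its fragments are only known to exceed max_recorded_size, so including them at any fixed length would bias the estimate.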