snapatac2.pp.make_peak_matrix

snapatac2.pp.make_peak_matrix(adata, *, use_rep=None, inplace=False, file=None, backend='hdf5', peak_file=None, chunk_size=500, use_x=False, min_frag_size=None, max_frag_size=None, counting_strategy='paired-insertion', value_type='target', summary_type='sum')

Generate a cell-by-peak count matrix.

Use this function after import_fragments, import_values, or peak calling to aggregate fragments or values over peak intervals. Provide peak intervals with at most one of peak_file or use_rep; if both are omitted, the function reads peaks from adata.uns["peaks"].
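The peak-source rules above can be summarized as a small decision function. This is a minimal sketch, not SnapATAC2's implementation: resolve_peak_source and its return tuples are hypothetical, and the exception type raised for conflicting arguments is an assumption.

```python
from typing import Optional, Sequence, Tuple, Union

PeakSource = Union[str, Sequence[str], None]


def resolve_peak_source(use_rep: PeakSource = None,
                        peak_file: Optional[str] = None) -> Tuple[str, object]:
    """Return ("file", path), ("uns", key) or ("list", peaks).

    Hypothetical helper that mirrors the documented rules only.
    """
    if use_rep is not None and peak_file is not None:
        # Both sources given: the peak source would be ambiguous.
        raise ValueError("set at most one of 'peak_file' and 'use_rep'")
    if peak_file is not None:
        return ("file", peak_file)
    if use_rep is None:
        # Neither given: fall back to adata.uns["peaks"].
        return ("uns", "peaks")
    if isinstance(use_rep, str):
        # A single string is treated as a key of adata.uns.
        return ("uns", use_rep)
    # Otherwise use_rep is an explicit list of peak strings.
    return ("list", list(use_rep))
```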

Anti-Patterns

  • Do NOT pass both peak_file and use_rep; the function raises an error because the peak source would be ambiguous.

  • Do NOT call this function before importing fragments or values. The input must contain SnapATAC2’s internal fragment or value storage.

  • Do NOT expect file or backend to affect output storage when inplace=True; these arguments are only used when inplace=False.

  • Do NOT set use_x=True unless .X already contains the cell-by-feature counts that should be reused as raw counts.

type adata:

AnnData | AnnDataSet

param adata:

The imported AnnData object, or AnnDataSet, containing per-cell fragment or value storage.

type use_rep:

str | list[str] | None

param use_rep:

Key of adata.uns from which to read the peak intervals, or a list of peak strings such as ["chr1:1-100", "chr2:2-200"]. If both use_rep and peak_file are None, the key "peaks" is used.
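Peak strings passed through use_rep follow a chrom:start-end pattern. The sketch below, assuming that pattern exactly, shows one way to validate and split such strings; Peak and parse_peak are hypothetical names, and the sketch does not decide whether SnapATAC2 treats the coordinates as 0- or 1-based.

```python
import re
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


# Pattern for "chrom:start-end" strings such as "chr1:1-100".
_PEAK_RE = re.compile(r"^(?P<chrom>\S+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak(text: str) -> Peak:
    """Split a 'chrom:start-end' string into its parts (hypothetical helper)."""
    match = _PEAK_RE.match(text.strip())
    if match is None:
        raise ValueError(f"expected 'chrom:start-end', got {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if end <= start:
        raise ValueError(f"peak end must be greater than start: {text!r}")
    return Peak(match["chrom"], start, end)
```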

type inplace:

bool

param inplace:

If True, store the peak matrix in adata.X and return None. If False, return a new AnnData object containing the peak matrix.

type file:

Path | None

param file:

Output file for the returned AnnData object when inplace=False. If provided, the result is stored as backed AnnData. If None, the result is returned in memory. This argument has no effect when inplace=True.

type backend:

Literal['hdf5']

param backend:

Backend used for backed output when file is provided.

type peak_file:

Path | None

param peak_file:

BED file containing peak intervals. Plain text and .gz files are supported. Do not set this together with use_rep.
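Reading the first three BED columns from either a plain or gzip-compressed file can be sketched with the standard library alone. read_bed_peaks is a hypothetical helper; choosing the opener by the .gz suffix and skipping comment, track, and browser lines are assumptions about typical BED input, not documented SnapATAC2 behavior.

```python
import gzip
from pathlib import Path
from typing import List, Tuple


def read_bed_peaks(path) -> List[Tuple[str, int, int]]:
    """Read (chrom, start, end) from a plain or gzip-compressed BED file."""
    path = Path(path)
    # Pick the opener from the suffix; both yield text-mode handles.
    opener = gzip.open if path.suffix == ".gz" else open
    peaks = []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and common BED header/comment lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            # Only the first three columns define the interval.
            chrom, start, end = line.split()[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks
```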

type chunk_size:

int

param chunk_size:

Number of cells (rows) processed per chunk. Increase this value to improve I/O throughput when memory is sufficient; decrease it to reduce peak memory use.

type use_x:

bool

param use_x:

If True, use the matrix stored in .X as raw counts. If False, use the imported fragment or insertion storage.

type min_frag_size:

int | None

param min_frag_size:

Minimum fragment size to include. Fragments shorter than this threshold are ignored. Use None to disable the lower bound.

type max_frag_size:

int | None

param max_frag_size:

Maximum fragment size to include. Fragments longer than this threshold are ignored. Use None to disable the upper bound.
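Together, the two size thresholds act as a window on fragment length. This sketch assumes fragment size is end - start for half-open coordinates and that a fragment exactly at a threshold is kept, which matches the "shorter than"/"longer than" wording but is not confirmed by the source; filter_by_size is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Fragment = Tuple[str, int, int]  # (chrom, start, end), half-open


def filter_by_size(fragments: Iterable[Fragment],
                   min_frag_size: Optional[int] = None,
                   max_frag_size: Optional[int] = None) -> List[Fragment]:
    """Keep fragments whose length lies within the optional bounds."""
    kept = []
    for chrom, start, end in fragments:
        size = end - start
        if min_frag_size is not None and size < min_frag_size:
            continue  # shorter than the lower bound
        if max_frag_size is not None and size > max_frag_size:
            continue  # longer than the upper bound
        kept.append((chrom, start, end))
    return kept
```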

type counting_strategy:

Literal['fragment', 'insertion', 'paired-insertion']

param counting_strategy:

The strategy used to compute feature counts. It must be one of “fragment”, “insertion”, or “paired-insertion”. “fragment” counts the fragments that overlap a region of interest. “insertion” counts the insertion sites that fall within a region of interest. “paired-insertion” is similar to “insertion”, but when both insertions of a fragment fall within the same region of interest, they are counted only once [Miao24]. This parameter has no effect when the input consists of single-end reads.
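To make the three counting strategies concrete, this toy model computes the per-peak contribution of a single paired-end fragment. It assumes half-open coordinates on one chromosome and places the two insertion sites at the fragment's first and last bases (start and end - 1); count_fragment is a hypothetical helper, not SnapATAC2's counting code.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def _contains(peak: Interval, pos: int) -> bool:
    return peak[0] <= pos < peak[1]


def count_fragment(frag: Interval, peaks: Sequence[Interval], strategy: str) -> List[int]:
    """Per-peak contributions of one paired-end fragment (toy model)."""
    start, end = frag
    # Assumed insertion sites: the first and last base of the fragment.
    left, right = start, end - 1
    counts = []
    for peak in peaks:
        if strategy == "fragment":
            # One count if the fragment overlaps the peak at all.
            counts.append(int(start < peak[1] and peak[0] < end))
        elif strategy == "insertion":
            # Each insertion inside the peak counts separately.
            counts.append(int(_contains(peak, left)) + int(_contains(peak, right)))
        elif strategy == "paired-insertion":
            # As "insertion", but two insertions in the same peak count once.
            counts.append(int(_contains(peak, left) or _contains(peak, right)))
        else:
            raise ValueError(f"unknown counting_strategy: {strategy!r}")
    return counts
```

With peaks [(0, 100), (150, 300)], a fragment at (20, 80) contributes [2, 0] under “insertion” but [1, 0] under “paired-insertion”, while a fragment at (120, 400) that spans the second peak without an insertion inside it counts only under “fragment”.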

type value_type:

Literal['target', 'total', 'fraction']

param value_type:

The value to summarize from .obsm['_values'] when data was imported with import_values. It must be one of “target”, “total”, or “fraction”. “target” means the number of records with positive measurements, e.g. methylated bases. “total” means the total number of measurements, e.g. methylated plus unmethylated bases. “fraction” means the fraction of records with positive measurements.

type summary_type:

Literal['sum', 'mean']

param summary_type:

The aggregation to use when multiple values are found in a peak. This parameter is only used when .obsm['_values'] exists, which is created by import_values. It must be “sum” or “mean”.
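value_type and summary_type work together: the first picks which quantity is read from each record, the second combines the records that fall within a peak. This sketch assumes a hypothetical per-base record of (position, target_count, total_count) and computes “fraction” per record before aggregating; SnapATAC2's internal value storage and its handling of zero coverage may differ.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Hypothetical per-base record: (position, target_count, total_count),
# e.g. methylated reads and all reads covering one cytosine.
Record = Tuple[int, int, int]


def summarize_peak(records: Sequence[Record], peak: Tuple[int, int],
                   value_type: str = "target", summary_type: str = "sum") -> float:
    """Aggregate the records inside a half-open peak [start, end) (toy model)."""
    if summary_type not in ("sum", "mean"):
        raise ValueError(f"unknown summary_type: {summary_type!r}")
    values: List[float] = []
    for pos, target, total in records:
        if not peak[0] <= pos < peak[1]:
            continue  # record lies outside this peak
        if value_type == "target":
            values.append(float(target))
        elif value_type == "total":
            values.append(float(total))
        elif value_type == "fraction":
            # Assumption: a record with zero coverage contributes 0.0.
            values.append(target / total if total else 0.0)
        else:
            raise ValueError(f"unknown value_type: {value_type!r}")
    if not values:
        return 0.0  # no records in this peak
    return float(sum(values)) if summary_type == "sum" else float(mean(values))
```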

returns:

If inplace=False, returns an annotated data matrix whose rows are cells and columns are peaks. If file=None, returns an in-memory AnnData object; otherwise returns a backed AnnData object. If inplace=True, returns None and updates adata.X in place.

rtype:

AnnData | None

Examples

>>> import snapatac2 as snap
>>> fragments = snap.datasets.pbmc500(downsample=True)
>>> data = snap.pp.import_fragments(
...     fragments,
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> peak_mat = snap.pp.make_peak_matrix(
...     data,
...     peak_file=snap.datasets.cre_HEA(),
... )
>>> print(peak_mat.shape)