snapatac2.tl.merge_peaks

snapatac2.tl.merge_peaks(peaks, chrom_sizes, half_width=250)

Merge group-specific peak calls into a non-overlapping peak set.

Use this function after macs3 to create a shared peak universe for downstream counting and analysis.

This function first extends each peak summit by half_width bases on both sides, producing fixed-width windows. It then resolves overlaps iteratively: the most significant peak (the one with the smallest p-value) is retained, and every peak overlapping it is excluded. The same step is applied to the next most significant remaining peak, and so on until all peaks have been evaluated. The result is a list of non-overlapping peaks, each with a fixed width determined by the initial extension.
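The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the Peak record, its field names, the half-open window convention, and the clipping at chromosome ends are all hypothetical choices made for the example, and the p-value is assumed to be a raw p-value (smaller means more significant).

```python
from typing import NamedTuple


class Peak(NamedTuple):
    # Hypothetical record for illustration; not the macs3 output schema.
    chrom: str
    summit: int      # absolute summit position on the chromosome
    p_value: float   # raw p-value (assumption): smaller is more significant


def summit_window(peak, chrom_sizes, half_width):
    """Extend the summit by half_width on both sides.

    Assumption: half-open window [start, end), clipped to the chromosome,
    so windows touching a chromosome end can be shorter than 2 * half_width.
    """
    start = max(0, peak.summit - half_width)
    end = min(chrom_sizes[peak.chrom], peak.summit + half_width)
    return start, end


def greedy_merge(peaks, chrom_sizes, half_width=250):
    """Keep the most significant peak, drop everything overlapping it, repeat."""
    kept = []
    # Visit peaks from most to least significant.
    for peak in sorted(peaks, key=lambda p: p.p_value):
        start, end = summit_window(peak, chrom_sizes, half_width)
        overlaps_kept = any(
            chrom == peak.chrom and start < k_end and k_start < end
            for chrom, k_start, k_end in kept
        )
        if not overlaps_kept:
            kept.append((peak.chrom, start, end))
    return sorted(kept)


chrom_sizes = {"chr1": 10_000}
peaks = [
    Peak("chr1", 1_000, 1e-8),   # overlaps the stronger peak at 1_200
    Peak("chr1", 1_200, 1e-12),  # most significant, kept first
    Peak("chr1", 5_000, 1e-5),   # isolated, kept
]
merged = greedy_merge(peaks, chrom_sizes)
print(merged)  # [('chr1', 950, 1450), ('chr1', 4750, 5250)]
```

The linear scan over kept windows keeps the sketch short; for genome-scale peak sets an interval index would be the more practical choice.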

Anti-Patterns

  • Do NOT pass chromosome sizes from a different genome build than the peak coordinates.

  • Do NOT pass arbitrary BED-like tables unless they contain the columns produced by macs3.
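One rough way to catch a genome build mismatch before merging is to check that every peak interval lies on a known chromosome and within its length. The sketch below works on plain (chrom, start, end) tuples; the helper name find_out_of_bounds and the tuple layout are hypothetical and not part of snapatac2. Passing this check does not prove that the builds match, it only flags obvious inconsistencies.

```python
def find_out_of_bounds(intervals, chrom_sizes):
    """Return intervals that cannot belong to the given chromosome sizes.

    Hypothetical helper: `intervals` is an iterable of (chrom, start, end)
    tuples and `chrom_sizes` maps chromosome name to length.
    """
    problems = []
    for chrom, start, end in intervals:
        size = chrom_sizes.get(chrom)
        if size is None:
            # Naming mismatches such as "1" vs "chr1" show up here.
            problems.append((chrom, start, end, "unknown chromosome"))
        elif start < 0 or end > size:
            problems.append((chrom, start, end, "outside chromosome bounds"))
    return problems


chrom_sizes = {"chr1": 1_000}
intervals = [
    ("chr1", 100, 600),    # fine
    ("chr1", 900, 1_400),  # runs past the end of chr1
    ("1", 10, 20),         # chromosome name not present in chrom_sizes
]
problems = find_out_of_bounds(intervals, chrom_sizes)
for problem in problems:
    print(problem)
```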

param peaks:

Peak tables keyed by group name.

type peaks:

dict[str, 'polars.DataFrame']

param chrom_sizes:

Chromosome sizes, or a Genome object from which chromosome sizes are read.

type chrom_sizes:

dict[str, int] | Genome

param half_width:

Number of bases added on each side of each summit before overlap resolution.

type half_width:

int

returns:

Merged, non-overlapping peak table.

rtype:

polars.DataFrame

See also

macs3

Examples

>>> import snapatac2 as snap
>>> adata = snap.datasets.pbmc5k(type="annotated_h5ad")
>>> peaks = snap.tl.macs3(adata, groupby="cell_type", inplace=False, n_jobs=1)
>>> merged = snap.tl.merge_peaks(peaks, snap.genome.hg38)
>>> merged.height > 0
True