snapatac2.pp.make_peak_matrix
snapatac2.pp.make_peak_matrix(adata, *, use_rep=None, inplace=False, file=None, backend='hdf5', peak_file=None, chunk_size=500, use_x=False, min_frag_size=None, max_frag_size=None, counting_strategy='paired-insertion', value_type='target', summary_type='sum')
Generate a cell-by-peak count matrix.
Use this function after import_fragments, import_values, or peak calling to aggregate fragments or values over peak intervals. Provide the peak intervals with exactly one of peak_file or use_rep; if both are omitted, the function reads peaks from adata.uns["peaks"].
Anti-Patterns
- Do NOT pass both peak_file and use_rep; the function raises an error because the peak source would be ambiguous.
- Do NOT call this function before importing fragments or values. The input must contain SnapATAC2's internal fragment or value storage.
- Do NOT expect file or backend to affect output storage when inplace=True; these arguments are only used when inplace=False.
- Do NOT set use_x=True unless .X already contains the cell-by-feature counts that should be reused as raw counts.
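The peak-source rules above (exactly one of peak_file or use_rep, with "peaks" as the fallback key) can be illustrated with a minimal pure-Python sketch. It is not SnapATAC2's implementation: resolve_peaks, parse_region, and read_bed are hypothetical helpers, the uns argument is a plain dict standing in for adata.uns, and the sketch assumes that a key in uns maps to a list of peak strings such as "chr1:1-100".

```python
import gzip
from typing import List, Mapping, Optional, Sequence, Tuple, Union

Peak = Tuple[str, int, int]  # (chrom, start, end)


def parse_region(region: str) -> Peak:
    """Parse a peak string such as "chr1:1-100" into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def read_bed(path: str) -> List[Peak]:
    """Read the first three columns of a plain-text or .gz BED file."""
    opener = gzip.open if path.endswith(".gz") else open
    peaks: List[Peak] = []
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks


def resolve_peaks(
    uns: Mapping[str, Sequence[str]],
    use_rep: Optional[Union[str, Sequence[str]]] = None,
    peak_file: Optional[str] = None,
) -> List[Peak]:
    """Pick exactly one peak source (hypothetical helper, not the library API)."""
    if use_rep is not None and peak_file is not None:
        raise ValueError("provide only one of use_rep and peak_file")
    if peak_file is not None:
        return read_bed(peak_file)
    if use_rep is None:
        use_rep = "peaks"  # documented fallback key
    # A string is a key into `uns`; otherwise it is already a list of peak strings.
    regions = uns[use_rep] if isinstance(use_rep, str) else use_rep
    return [parse_region(r) for r in regions]
```

Rejecting the ambiguous case up front, rather than silently preferring one source, matches the documented behavior of raising an error when both arguments are given.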
- type adata: AnnData | AnnDataSet
- param adata: The imported AnnData or AnnDataSet object containing per-cell fragment or value storage.
- param use_rep: Key in adata.uns from which the peak intervals are read (adata.uns[use_rep]), or a list of peak strings such as ["chr1:1-100", "chr2:2-200"]. If both use_rep and peak_file are None, the key "peaks" is used.
- param inplace: If True, store the peak matrix in adata.X and return None. If False, return a new AnnData object containing the peak matrix.
- param file: Output file for the returned AnnData object when inplace=False. If provided, the result is stored as a backed AnnData object; if None, the result is returned in memory. This argument has no effect when inplace=True.
- type backend: Literal['hdf5']
- param backend: Backend used for the backed output when file is provided.
- param peak_file: BED file containing peak intervals. Both plain-text and gzip-compressed (.gz) files are supported. Do not set this together with use_rep.
- param chunk_size: Number of peaks processed per chunk. Increase this value to improve I/O throughput when memory is sufficient; decrease it to reduce peak memory use.
- param use_x: If True, use the matrix stored in .X as raw counts. If False, use the imported fragment or insertion storage.
- param min_frag_size: Minimum fragment size to include. Fragments shorter than this threshold are ignored. Use None to disable the lower bound.
- param max_frag_size: Maximum fragment size to include. Fragments longer than this threshold are ignored. Use None to disable the upper bound.
- type counting_strategy: Literal['fragment', 'insertion', 'paired-insertion']
- param counting_strategy: The strategy used to compute feature counts. It must be one of “fragment”, “insertion”, or “paired-insertion”. “fragment” counts the fragments that overlap a region of interest. “insertion” counts the insertions that fall within a region of interest. “paired-insertion” is similar to “insertion”, but when both insertions of a fragment fall within the same region of interest, they are counted only once [Miao24]. This parameter has no effect if the input consists of single-end reads.
- type value_type: Literal['target', 'total', 'fraction']
- param value_type: The value to summarize from .obsm['_values'] when the data were imported with import_values. It must be one of “target”, “total”, or “fraction”. “target” means the number of records with positive measurements, e.g. methylated bases. “total” means the total number of measurements, e.g. methylated plus unmethylated bases. “fraction” means the fraction of records with positive measurements.
- type summary_type: Literal['sum', 'mean']
- param summary_type: The aggregation used when multiple values fall within one peak. This parameter is only used when .obsm['_values'] exists, which is created by import_values. It must be “sum” or “mean”.
- returns: If inplace=False, an annotated data matrix whose rows are cells and whose columns are peaks: an in-memory AnnData object if file=None, otherwise a backed AnnData object. If inplace=True, returns None and updates adata.X in place.
- rtype: AnnData | None
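To make the counting_strategy and fragment-size options concrete, the sketch below counts one cell's paired-end fragments in a single peak. It is a toy model under stated assumptions, not SnapATAC2's code: count_peak is a hypothetical helper, intervals are half-open [start, end), the two insertion sites of a fragment are taken to be its first and last bases (no Tn5 offset correction), and the size filters keep fragments whose length lies within the inclusive bounds.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def count_peak(
    fragments: Sequence[Interval],
    peak: Interval,
    counting_strategy: str = "paired-insertion",
    min_frag_size: Optional[int] = None,
    max_frag_size: Optional[int] = None,
) -> int:
    """Count one cell's fragments in one peak (hypothetical toy helper)."""
    if counting_strategy not in ("fragment", "insertion", "paired-insertion"):
        raise ValueError(f"unknown counting_strategy: {counting_strategy!r}")
    peak_start, peak_end = peak
    count = 0
    for start, end in fragments:
        size = end - start
        # Size filters: None disables the corresponding bound.
        if min_frag_size is not None and size < min_frag_size:
            continue
        if max_frag_size is not None and size > max_frag_size:
            continue
        # Assumed insertion sites: first and last base of the fragment.
        left_in = peak_start <= start < peak_end
        right_in = peak_start <= end - 1 < peak_end
        if counting_strategy == "fragment":
            # Any overlap between the fragment and the peak counts once.
            count += int(start < peak_end and peak_start < end)
        elif counting_strategy == "insertion":
            # Each insertion inside the peak counts separately (0, 1, or 2).
            count += int(left_in) + int(right_in)
        else:  # "paired-insertion"
            # A fragment with both insertions in the peak counts only once.
            count += int(left_in or right_in)
    return count
```

In this model, a fragment that spans the whole peak with both ends outside it counts under “fragment” but contributes nothing under either insertion-based strategy.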
See also
Examples
>>> import snapatac2 as snap
>>> fragments = snap.datasets.pbmc500(downsample=True)
>>> data = snap.pp.import_fragments(
...     fragments,
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> peak_mat = snap.pp.make_peak_matrix(
...     data,
...     peak_file=snap.datasets.cre_HEA(),
... )
>>> print(peak_mat.shape)
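For data imported with import_values, value_type selects the per-record quantity and summary_type aggregates the records that fall in a peak. The sketch below illustrates one reading of those definitions for a single cell and a single peak. The record layout (position, target, total) with total > 0, the helper summarize_peak, and the choice to report an empty peak as zero are all hypothetical; the internal layout of .obsm['_values'] is not described here.

```python
from typing import Sequence, Tuple

# Hypothetical record: (position, target, total), e.g. methylated reads and
# total reads covering one cytosine in one cell. Assumes total > 0.
Record = Tuple[int, int, int]


def summarize_peak(
    records: Sequence[Record],
    peak: Tuple[int, int],
    value_type: str = "target",
    summary_type: str = "sum",
) -> float:
    """Aggregate one cell's records inside a half-open peak [start, end)."""
    if value_type not in ("target", "total", "fraction"):
        raise ValueError(f"unknown value_type: {value_type!r}")
    if summary_type not in ("sum", "mean"):
        raise ValueError(f"unknown summary_type: {summary_type!r}")
    start, end = peak
    values = []
    for pos, target, total in records:
        if not start <= pos < end:
            continue  # record lies outside the peak
        if value_type == "target":
            values.append(target)
        elif value_type == "total":
            values.append(total)
        else:  # "fraction": positive share of this record's measurements
            values.append(target / total)
    if not values:
        return 0.0  # assumption: an empty peak is reported as zero
    if summary_type == "sum":
        return float(sum(values))
    return sum(values) / len(values)  # "mean"
```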