snapatac2.pp.add_tile_matrix
- snapatac2.pp.add_tile_matrix(adata, *, bin_size=500, inplace=True, chunk_size=500, exclude_chroms=['chrM', 'chrY', 'M', 'Y'], min_frag_size=None, max_frag_size=None, counting_strategy='paired-insertion', value_type='target', summary_type='sum', file=None, backend='hdf5', n_jobs=8)
Generate a cell-by-genomic-bin count matrix.
Use this function after import_fragments or import_values to summarize per-cell fragments or values into fixed-width genomic bins. Execute this step before feature selection, dimensionality reduction, clustering, or other workflows that require a count matrix in .X.

Anti-Patterns
- Do NOT call this function on an AnnData object that was not created by import_fragments or import_values. The input must contain the internal fragment or value storage used by SnapATAC2.
- Do NOT use inplace=False with a list of AnnData objects. Lists are only supported when inplace=True, where each object is updated in parallel.
- Do NOT expect file or backend to change output storage when inplace=True; these arguments are only used when inplace=False.
- Do NOT pass exclude_chroms=None unless mitochondrial, sex, and other special chromosomes should be retained in the tile matrix.
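
The argument combinations listed above can be checked before calling the function. The following is a minimal sketch, not part of SnapATAC2: the helper name tile_matrix_arg_problems is hypothetical, and it only inspects the arguments. It does not verify that the object was created by import_fragments or import_values.

```python
def tile_matrix_arg_problems(adata, *, inplace=True, file=None,
                             exclude_chroms=("chrM", "chrY", "M", "Y")):
    """Hypothetical pre-flight check (not a SnapATAC2 function).

    Returns a list of human-readable problems with the argument
    combination; an empty list means none of the documented
    anti-patterns was detected.
    """
    problems = []
    # Lists of AnnData objects are only supported with inplace=True.
    if isinstance(adata, (list, tuple)) and not inplace:
        problems.append("a list of AnnData objects requires inplace=True")
    # file (and backend) only matter when a new object is returned.
    if inplace and file is not None:
        problems.append("file is ignored when inplace=True")
    # None disables chromosome filtering entirely.
    if exclude_chroms is None:
        problems.append("exclude_chroms=None keeps all chromosomes")
    return problems


# Usage with placeholder objects standing in for imported AnnData objects.
print(tile_matrix_arg_problems([object(), object()], inplace=False))
print(tile_matrix_arg_problems(object(), file="tiles.h5ad"))
```

An empty list from the helper only means that none of these three argument combinations was found.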
- type adata: AnnData | list[AnnData]
- param adata: The imported AnnData object, or a list of imported AnnData objects when inplace=True. Each object must contain fragment data from import_fragments or value data from import_values.
- type bin_size: int
- param bin_size: The width, in base pairs, of each consecutive genomic bin.
- type inplace: bool
- param inplace: If True, store the tile matrix in adata.X and return None. If False, return a new AnnData object containing the tile matrix.
- type chunk_size: int
- param chunk_size: Number of bins processed per chunk. Increase this value to improve I/O throughput when memory is sufficient; decrease it to reduce peak memory use.
- type exclude_chroms: list[str] | str | None
- param exclude_chroms: Chromosome names to exclude before binning. By default, mitochondrial and Y chromosomes are excluded ("chrM", "chrY", "M", "Y").
- type min_frag_size: int | None
- param min_frag_size: Minimum fragment size to include. Fragments shorter than this threshold are ignored. Use None to disable the lower bound.
- type max_frag_size: int | None
- param max_frag_size: Maximum fragment size to include. Fragments longer than this threshold are ignored. Use None to disable the upper bound.
- type counting_strategy: Literal['fragment', 'insertion', 'paired-insertion']
- param counting_strategy: The strategy used to compute feature counts. It must be one of “fragment”, “insertion”, or “paired-insertion”. “fragment” assigns counts based on the number of fragments that overlap a region of interest. “insertion” assigns counts based on the number of insertions that fall within a region of interest. “paired-insertion” is similar to “insertion”, but counts a fragment only once when both of its insertions fall within the same region of interest [Miao24]. This parameter has no effect if the input consists of single-end reads.
- type value_type: Literal['target', 'total', 'fraction']
- param value_type: The value to summarize from .obsm['_values'] when data was imported with import_values. It must be one of “target”, “total”, or “fraction”. “target” means the number of records with positive measurements, e.g. methylated bases. “total” means the total number of measurements, e.g. methylated plus unmethylated bases. “fraction” means the fraction of records with positive measurements.
- type summary_type: Literal['sum', 'mean']
- param summary_type: The aggregation to use when multiple values are found in a bin. This parameter is only used when .obsm['_values'] exists, which is created by import_values. It must be “sum” or “mean”.
- type file: Path | None
- param file: Output file for the returned AnnData object when inplace=False. If provided, the result is stored as backed AnnData. If None, the result is returned in memory. This argument has no effect when inplace=True.
- type backend: Literal['hdf5']
- param backend: Backend used for backed output when file is provided.
- type n_jobs: int
- param n_jobs: Number of parallel jobs to use when adata is a list. If n_jobs=-1, all CPUs are used.
- returns: If inplace=False, returns an annotated data matrix whose rows are cells and columns are genomic bins. If file=None, returns an in-memory AnnData object; otherwise returns a backed AnnData object. If inplace=True, returns None and updates adata.X in place.
- rtype: AnnData | None
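
To make the three counting_strategy options concrete, the sketch below counts the fragments of a single cell on a single chromosome into fixed-width bins. It is a simplified pure-Python illustration, not SnapATAC2's implementation. The function name count_bins is hypothetical, fragments are assumed to be half-open (start, end) intervals, and the two insertion sites of a fragment are assumed to lie at start and end - 1.

```python
from collections import Counter


def count_bins(fragments, bin_size=500, strategy="paired-insertion"):
    """Illustrative bin counting for one cell on one chromosome.

    fragments: iterable of (start, end) half-open intervals (assumption).
    Returns a dict mapping bin index -> count, sorted by bin index.
    """
    counts = Counter()
    for start, end in fragments:
        first_bin = start // bin_size      # bin of the left insertion
        last_bin = (end - 1) // bin_size   # bin of the right insertion
        if strategy == "fragment":
            # One count for every bin the fragment overlaps.
            for b in range(first_bin, last_bin + 1):
                counts[b] += 1
        elif strategy == "insertion":
            # Each of the two insertions is counted separately.
            counts[first_bin] += 1
            counts[last_bin] += 1
        elif strategy == "paired-insertion":
            # Both insertions in the same bin count only once.
            counts[first_bin] += 1
            if last_bin != first_bin:
                counts[last_bin] += 1
        else:
            raise ValueError(f"unknown counting strategy: {strategy!r}")
    return dict(sorted(counts.items()))


# Three fragments: inside bin 0, spanning bins 0-1, spanning bins 1-4.
frags = [(100, 300), (450, 620), (900, 2100)]
for name in ("fragment", "insertion", "paired-insertion"):
    print(name, count_bins(frags, bin_size=500, strategy=name))
```

In this toy input the first fragment lies entirely within bin 0, so "insertion" adds two counts to that bin while "paired-insertion" adds one. Only "fragment" gives counts to bins 2 and 3, which the third fragment spans without placing an insertion in them.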
Examples
>>> import snapatac2 as snap
>>> fragments = snap.datasets.pbmc500(downsample=True)
>>> data = snap.pp.import_fragments(
...     fragments,
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> snap.pp.add_tile_matrix(
...     data,
...     bin_size=500,
...     exclude_chroms=["chrM", "chrY"],
... )
>>> print(data.shape)
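
The filtering controlled by exclude_chroms, min_frag_size, and max_frag_size can be pictured as a per-fragment predicate applied before binning. The sketch below is an illustration under stated assumptions, not SnapATAC2's code: the helper name keep_fragment is hypothetical, and fragment size is assumed to be end - start for half-open coordinates.

```python
def keep_fragment(chrom, start, end, *,
                  exclude_chroms=("chrM", "chrY", "M", "Y"),
                  min_frag_size=None, max_frag_size=None):
    """Hypothetical filter mirroring the documented parameters."""
    size = end - start  # assumes half-open [start, end) coordinates
    # Drop fragments on excluded chromosomes; None disables this filter.
    if exclude_chroms is not None and chrom in exclude_chroms:
        return False
    # Fragments shorter than the minimum are ignored.
    if min_frag_size is not None and size < min_frag_size:
        return False
    # Fragments longer than the maximum are ignored.
    if max_frag_size is not None and size > max_frag_size:
        return False
    return True


# Toy fragments as (chromosome, start, end) tuples.
fragments = [("chr1", 100, 300), ("chrM", 10, 200), ("chr2", 0, 2500)]
kept = [f for f in fragments if keep_fragment(*f, max_frag_size=2000)]
print(kept)
```

A fragment whose size equals min_frag_size or max_frag_size is kept in this sketch, following the "shorter than" and "longer than" wording of the parameter descriptions.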