snapatac2.pp.add_tile_matrix

snapatac2.pp.add_tile_matrix(adata, *, bin_size=500, inplace=True, chunk_size=500, exclude_chroms=['chrM', 'chrY', 'M', 'Y'], min_frag_size=None, max_frag_size=None, counting_strategy='paired-insertion', value_type='target', summary_type='sum', file=None, backend='hdf5', n_jobs=8)

Generate a cell-by-genomic-bin count matrix.

Use this function after import_fragments or import_values to summarize per-cell fragments or values into fixed-width genomic bins. Execute this step before feature selection, dimensionality reduction, clustering, or other workflows that require a count matrix in .X.

Anti-Patterns

  • Do NOT call this function on an AnnData object that was not created by import_fragments or import_values. The input must contain the internal fragment or value storage used by SnapATAC2.

  • Do NOT use inplace=False with a list of AnnData objects. Lists are only supported when inplace=True, where each object is updated in parallel.

  • Do NOT expect file or backend to change output storage when inplace=True; these arguments are only used when inplace=False.

  • Do NOT pass exclude_chroms=None unless the chromosomes excluded by default (mitochondrial and Y) should be retained in the tile matrix.
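
The argument constraints listed above can be checked before the call. The helper below is a hypothetical sketch and is not part of SnapATAC2; its name, check_tile_matrix_args, and its messages are assumptions made for illustration. It only inspects the argument combination and does not verify that the objects were created by import_fragments or import_values.

```python
def check_tile_matrix_args(adata, *, inplace=True, file=None):
    """Hypothetical pre-flight check; NOT a SnapATAC2 function.

    Returns a list of human-readable problems with the argument
    combination, based only on the rules stated in this page.
    """
    problems = []
    # Lists of AnnData objects are only supported with inplace=True.
    if isinstance(adata, list) and not inplace:
        problems.append("a list of AnnData objects requires inplace=True")
    # `file` (and `backend`) only matter when a new object is returned.
    if inplace and file is not None:
        problems.append("file has no effect when inplace=True")
    return problems


# Any placeholder object works here because only the container type is checked.
print(check_tile_matrix_args([object(), object()], inplace=False))
print(check_tile_matrix_args(object(), inplace=True, file="tiles.h5ad"))
```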

type adata:

AnnData | list[AnnData]

param adata:

The imported AnnData object, or a list of imported AnnData objects when inplace=True. Each object must contain fragment data from import_fragments or value data from import_values.

type bin_size:

int

param bin_size:

The width, in base pairs, of each consecutive genomic bin.
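
To make the effect of bin_size concrete, the sketch below maps positions to fixed-width bins in plain Python. It assumes 0-based coordinates, bins that start at position 0 of each chromosome, and a final bin truncated at the chromosome end; these conventions and the function names are illustrative assumptions, not a description of the library's internals.

```python
import math


def bin_index(position, bin_size=500):
    """0-based index of the bin that contains a 0-based position."""
    return position // bin_size


def n_bins(chrom_length, bin_size=500):
    """Number of consecutive bins needed to cover one chromosome."""
    return math.ceil(chrom_length / bin_size)


def bin_interval(index, chrom_length, bin_size=500):
    """Half-open [start, end) interval of a bin, clipped to the chromosome end."""
    start = index * bin_size
    return start, min(start + bin_size, chrom_length)


# A toy chromosome of 1,001 bp needs three 500-bp bins; the last covers 1 bp.
print(n_bins(1001), bin_interval(2, 1001))
```

Halving bin_size roughly doubles the number of columns in the resulting matrix, so smaller bins give finer resolution at the cost of a wider, sparser matrix.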

type inplace:

bool

param inplace:

If True, store the tile matrix in adata.X and return None. If False, return a new AnnData object containing the tile matrix.

type chunk_size:

int

param chunk_size:

Number of bins processed per chunk. Increase this value to improve I/O throughput when memory is sufficient; decrease it to reduce peak memory use.

type exclude_chroms:

list[str] | str | None

param exclude_chroms:

Chromosome names to exclude before binning. By default, mitochondrial and Y chromosomes are excluded ("chrM", "chrY", "M", "Y").
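
The accepted forms of exclude_chroms (a list, a single string, or None) can be illustrated with a small filter over a chromosome-size mapping. This is a sketch under assumptions: filter_chroms is a hypothetical helper, and the sizes are toy values rather than real genome lengths.

```python
def filter_chroms(chrom_sizes, exclude_chroms=("chrM", "chrY", "M", "Y")):
    """Hypothetical helper: drop excluded chromosomes from a size mapping."""
    if exclude_chroms is None:
        excluded = set()              # None keeps every chromosome
    elif isinstance(exclude_chroms, str):
        excluded = {exclude_chroms}   # a single name is treated as one entry
    else:
        excluded = set(exclude_chroms)
    return {name: size for name, size in chrom_sizes.items() if name not in excluded}


toy_sizes = {"chr1": 5000, "chrX": 3000, "chrY": 1000, "chrM": 100}  # toy values
print(sorted(filter_chroms(toy_sizes)))                  # default exclusions
print(sorted(filter_chroms(toy_sizes, "chrX")))          # single string
print(sorted(filter_chroms(toy_sizes, None)))            # nothing excluded
```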

type min_frag_size:

int | None

param min_frag_size:

Minimum fragment size to include. Fragments shorter than this threshold are ignored. Use None to disable the lower bound.

type max_frag_size:

int | None

param max_frag_size:

Maximum fragment size to include. Fragments longer than this threshold are ignored. Use None to disable the upper bound.
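
The two size bounds combine as a simple filter. The sketch below assumes half-open fragment coordinates, so that the size is end minus start, and treats both thresholds as inclusive, following the wording "shorter than" and "longer than" above. keep_fragment is a hypothetical name used only for illustration.

```python
def keep_fragment(start, end, min_frag_size=None, max_frag_size=None):
    """Hypothetical filter: True if the fragment passes both size bounds."""
    size = end - start  # assumes half-open [start, end) coordinates
    if min_frag_size is not None and size < min_frag_size:
        return False    # shorter than the lower bound
    if max_frag_size is not None and size > max_frag_size:
        return False    # longer than the upper bound
    return True


# Toy fragments as (start, end) pairs, with sizes 50, 150 and 600 bp.
fragments = [(100, 150), (100, 250), (100, 700)]
kept = [f for f in fragments if keep_fragment(*f, min_frag_size=100, max_frag_size=500)]
print(kept)
```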

type counting_strategy:

Literal['fragment', 'insertion', 'paired-insertion']

param counting_strategy:

The strategy used to compute feature counts. It must be one of “fragment”, “insertion”, or “paired-insertion”. “fragment” assigns counts based on the number of fragments that overlap a region of interest. “insertion” assigns counts based on the number of insertions that fall within a region of interest. “paired-insertion” is similar to “insertion”, but counts the pair only once when both insertions of a fragment fall within the same region of interest [Miao24]. This parameter has no effect if the input consists of single-end reads.
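
The difference between the three strategies is easiest to see on a single paired-end fragment. The following is a minimal sketch, not the library's implementation: it assumes a half-open fragment [start, end) with insertion sites at start and end - 1, and bins that begin at position 0. count_fragment is a hypothetical name.

```python
from collections import Counter


def count_fragment(start, end, bin_size=500, strategy="paired-insertion"):
    """Return {bin_index: count} contributed by one paired-end fragment."""
    first_bin = start // bin_size        # bin holding the left insertion
    last_bin = (end - 1) // bin_size     # bin holding the right insertion
    counts = Counter()
    if strategy == "fragment":
        # Every bin overlapped by the fragment receives one count.
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1
    elif strategy == "insertion":
        # Each of the two insertions is counted separately.
        counts[first_bin] += 1
        counts[last_bin] += 1
    elif strategy == "paired-insertion":
        # Like "insertion", but a pair landing in one bin is counted once.
        counts[first_bin] += 1
        if last_bin != first_bin:
            counts[last_bin] += 1
    else:
        raise ValueError(f"unknown counting strategy: {strategy!r}")
    return dict(counts)


for strategy in ("fragment", "insertion", "paired-insertion"):
    # A short fragment inside bin 0, then a long one spanning bins 0 to 2.
    print(strategy, count_fragment(100, 300, strategy=strategy),
          count_fragment(400, 1200, strategy=strategy))
```

Under these assumptions, the short fragment contributes 1, 2, and 1 to bin 0 for the three strategies respectively, while the long fragment touches the middle bin only under “fragment”.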

type value_type:

Literal['target', 'total', 'fraction']

param value_type:

The value to summarize from .obsm['_values'] when data was imported with import_values. It must be one of “target”, “total”, or “fraction”. “target” means the number of records with positive measurements, e.g. methylated bases. “total” means the total number of measurements, e.g. methylated plus unmethylated bases. “fraction” means the fraction of records with positive measurements.

type summary_type:

Literal['sum', 'mean']

param summary_type:

The aggregation to use when multiple values are found in a bin. This parameter is only used when .obsm['_values'] exists, which is created by import_values. It must be “sum” or “mean”.
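
How value_type and summary_type interact can be sketched on the records that fall into one bin. This is one plausible reading, stated as an assumption: each record is a (target, total) pair, value_type selects a per-record value, and summary_type then aggregates those values. The function name, the record layout, and the choice to return 0.0 for an empty bin are all illustrative and not taken from the library.

```python
def summarize_bin(records, value_type="target", summary_type="sum"):
    """Hypothetical aggregation of (target, total) records within one bin."""
    if value_type == "target":
        values = [target for target, _ in records]
    elif value_type == "total":
        values = [total for _, total in records]
    elif value_type == "fraction":
        values = [target / total for target, total in records]
    else:
        raise ValueError(f"unknown value_type: {value_type!r}")

    if not values:
        return 0.0  # arbitrary choice for this sketch: empty bins yield zero
    if summary_type == "sum":
        return sum(values)
    if summary_type == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown summary_type: {summary_type!r}")


# Two toy records in one bin: 3 of 4 and 1 of 4 measurements are positive.
records = [(3, 4), (1, 4)]
print(summarize_bin(records, "target", "sum"))
print(summarize_bin(records, "total", "mean"))
print(summarize_bin(records, "fraction", "mean"))
```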

type file:

Path | None

param file:

Output file for the returned AnnData object when inplace=False. If provided, the result is stored as backed AnnData. If None, the result is returned in memory. This argument has no effect when inplace=True.

type backend:

Literal['hdf5']

param backend:

Backend used for backed output when file is provided. The only accepted value is 'hdf5'.

type n_jobs:

int

param n_jobs:

Number of parallel jobs to use when adata is a list. If n_jobs=-1, all CPUs are used.

returns:

If inplace=False, returns an annotated data matrix whose rows are cells and columns are genomic bins. If file=None, returns an in-memory AnnData object; otherwise returns a backed AnnData object. If inplace=True, returns None and updates adata.X in place.

rtype:

AnnData | None

Examples

>>> import snapatac2 as snap
>>> fragments = snap.datasets.pbmc500(downsample=True)
>>> data = snap.pp.import_fragments(
...     fragments,
...     chrom_sizes=snap.genome.hg38,
...     sorted_by_barcode=False,
... )
>>> snap.pp.add_tile_matrix(
...     data,
...     bin_size=500,
...     exclude_chroms=["chrM", "chrY"],
... )
>>> print(data.shape)