snapatac2.pp.import_values

snapatac2.pp.import_values(input_dir, chrom_sizes, *, file=None, whitelist=None, chunk_size=200, backend='hdf5')

Import base-pair values into an AnnData object.

Use this function to load per-cell, base-resolution values, such as those from whole-genome bisulfite sequencing, from a directory of input files into an AnnData object.

Anti-Patterns

Do NOT pass an empty chromosome-size mapping; reference sequence sizes are required to index genomic positions.
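
A small pre-flight check can catch an empty mapping before the import is attempted. The sketch below is illustrative only: `check_chrom_sizes` is a hypothetical helper, not part of snapatac2, and it assumes `chrom_sizes` is supplied as a plain dictionary rather than a Genome object.

```python
from __future__ import annotations


def check_chrom_sizes(chrom_sizes: dict[str, int]) -> dict[str, int]:
    """Hypothetical pre-flight check; not part of snapatac2."""
    if not chrom_sizes:
        raise ValueError("chrom_sizes is empty; at least one chromosome size is required")
    for name, size in chrom_sizes.items():
        # Reject booleans explicitly, since bool is a subclass of int in Python.
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(f"invalid size for {name!r}: {size!r}")
    return chrom_sizes


# Passes validation and returns the mapping unchanged.
sizes = check_chrom_sizes({"chr1": 1000, "chr2": 500})
```

The validated mapping could then be passed as the `chrom_sizes` argument.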

type input_dir:: Path
param input_dir:: Directory containing input files, with one file per cell.
type chrom_sizes:: Genome | dict[str, int]
param chrom_sizes:: A Genome object or a dictionary containing chromosome sizes, for example, {"chr1": 2393, "chr2": 2344, ...}.
type file:: Path | None
param file:: File name of the output h5ad file used to store the result. If provided, the result is saved to a backed AnnData; otherwise, an in-memory AnnData is used.
type whitelist:: Path | list[str] | None
param whitelist:: A file name or a list of barcodes. If a file name is given, each line of the file must contain a valid barcode. When provided, only barcodes in the whitelist are retained.
type chunk_size:: int
param chunk_size:: Increasing the chunk_size speeds up I/O but uses more memory.
type backend:: Literal['hdf5']
param backend:: The backend used to store the result. Currently, 'hdf5' is the only accepted value.
returns:: An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. If file=None, an in-memory AnnData is returned; otherwise, a backed AnnData is returned.
rtype:: AnnData
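
The `chrom_sizes` and `whitelist` arguments described above can be prepared with the standard library alone. The following is a minimal sketch under stated assumptions: `read_chrom_sizes` and `write_whitelist` are hypothetical helpers (not part of snapatac2), the sizes file is assumed to be tab-separated with the chromosome name in the first column and its length in the second, and the barcode values are placeholders.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_chrom_sizes(path: Path) -> dict[str, int]:
    """Hypothetical helper: parse 'name<TAB>size' lines into a dict."""
    sizes: dict[str, int] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        # Only the first two columns are used; extra columns are ignored.
        sizes[fields[0]] = int(fields[1])
    return sizes


def write_whitelist(path: Path, barcodes: list[str]) -> None:
    """Hypothetical helper: write one barcode per line."""
    path.write_text("".join(f"{barcode}\n" for barcode in barcodes))


with tempfile.TemporaryDirectory() as tmp:
    sizes_file = Path(tmp) / "genome.chrom.sizes"
    sizes_file.write_text("chr1\t1000\nchr2\t500\n")
    chrom_sizes = read_chrom_sizes(sizes_file)

    whitelist_file = Path(tmp) / "barcodes.txt"
    write_whitelist(whitelist_file, ["cell1", "cell2"])
    whitelist_lines = whitelist_file.read_text().splitlines()
```

Inside the `with` block, the values could be passed as `chrom_sizes=chrom_sizes` and `whitelist=whitelist_file`. How barcodes correspond to the per-cell input files is not specified above, so the names used here should not be read as a convention.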

Examples

>>> from pathlib import Path
>>> import tempfile
>>> import snapatac2 as snap
>>> tmp = tempfile.TemporaryDirectory()
>>> input_dir = Path(tmp.name)
>>> _ = (input_dir / "cell1.tsv").write_text("chrom\tpos\tmethyl\tunmethyl\nchr1\t10\t3\t7\n")
>>> data = snap.pp.import_values(input_dir, chrom_sizes={"chr1": 1000})
>>> data.n_obs
1
>>> tmp.cleanup()