snapatac2.pp.import_values

snapatac2.pp.import_values(input_dir, chrom_sizes, *, file=None, whitelist=None, chunk_size=200, backend='hdf5')

Import base-pair values into an AnnData object.

Use this function to load per-cell, base-resolution values, such as whole-genome bisulfite sequencing values, from a directory of input files into an AnnData object.

Anti-Patterns

  • Do NOT pass an empty chromosome-size mapping; reference sequence sizes are required to index genomic positions.
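The anti-pattern above can be caught before the import starts. The following is a minimal sketch, not part of snapatac2: `require_chrom_sizes` is a hypothetical helper that rejects an empty mapping and non-positive sizes when `chrom_sizes` is given as a plain dictionary.

```python
def require_chrom_sizes(chrom_sizes):
    """Hypothetical helper (not a snapatac2 function): fail early on a bad mapping."""
    if not chrom_sizes:
        raise ValueError("chrom_sizes must contain at least one chromosome")
    for name, size in chrom_sizes.items():
        # Each size is expected to be a positive integer number of base pairs.
        if not isinstance(size, int) or size <= 0:
            raise ValueError(f"invalid size for {name!r}: {size!r}")
    return chrom_sizes


# A valid mapping passes through unchanged.
chrom_sizes = require_chrom_sizes({"chr1": 1000})
```

The validated dictionary can then be passed as the `chrom_sizes` argument.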

type input_dir:

Path

param input_dir:

Directory containing input files, with one file per cell.

type chrom_sizes:

Genome | dict[str, int]

param chrom_sizes:

A Genome object or a dictionary containing chromosome sizes, for example, {"chr1": 2393, "chr2": 2344, ...}.

type file:

Path | None

param file:

File name of the output h5ad file used to store the result. If provided, the result is saved to a backed AnnData; otherwise, an in-memory AnnData is used.

type whitelist:

Path | list[str] | None

param whitelist:

A file name or a list of barcodes. If it is a file name, each line of the file must contain a valid barcode. When provided, only barcodes in the whitelist are retained.

type chunk_size:

int

param chunk_size:

Increasing the chunk_size speeds up I/O but uses more memory.

type backend:

Literal['hdf5']

param backend:

The storage backend. 'hdf5' is the only value listed in the signature.

returns:

An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. If file=None, an in-memory AnnData is returned; otherwise, a backed AnnData is returned.

rtype:

AnnData
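When `chrom_sizes` is passed as a dictionary, it is often convenient to build it from a file. The sketch below assumes a two-column, whitespace-separated file of chromosome names and sizes; `read_chrom_sizes` and the file name `toy.chrom.sizes` are hypothetical and not part of snapatac2.

```python
from pathlib import Path
import tempfile


def read_chrom_sizes(path):
    """Hypothetical helper: parse 'name<TAB>size' lines into a dict of int sizes."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            name, size = line.split()[:2]
            sizes[name] = int(size)
    return sizes


# Write a toy sizes file and read it back as a dictionary.
with tempfile.TemporaryDirectory() as tmp:
    sizes_path = Path(tmp) / "toy.chrom.sizes"
    sizes_path.write_text("chr1\t1000\nchr2\t800\n")
    chrom_sizes = read_chrom_sizes(sizes_path)
```

The resulting dictionary has the same shape as the `{"chr1": ..., "chr2": ...}` example given for `chrom_sizes`.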

Examples

>>> from pathlib import Path
>>> import tempfile
>>> import snapatac2 as snap
>>> tmp = tempfile.TemporaryDirectory()
>>> input_dir = Path(tmp.name)
>>> _ = (input_dir / "cell1.tsv").write_text("chrom\tpos\tmethyl\tunmethyl\nchr1\t10\t3\t7\n")
>>> data = snap.pp.import_values(input_dir, chrom_sizes={"chr1": 1000})
>>> data.n_obs
1
>>> tmp.cleanup()
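The `whitelist` parameter accepts a file with one barcode per line. The sketch below only prepares such a file; the barcodes and the file name `whitelist.txt` are made up for illustration, and how a barcode is matched to an input file is not described in this section.

```python
from pathlib import Path
import tempfile

barcodes = ["cell1", "cell2"]  # hypothetical barcodes, for illustration only

with tempfile.TemporaryDirectory() as tmp:
    whitelist_path = Path(tmp) / "whitelist.txt"
    # One barcode per line, as the whitelist parameter requires for file input.
    whitelist_path.write_text("\n".join(barcodes) + "\n")
    lines = whitelist_path.read_text().splitlines()
```

Either the path or the list itself could then be passed, for example `snap.pp.import_values(input_dir, chrom_sizes, whitelist=barcodes)`.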