snapatac2.pp.import_values
- snapatac2.pp.import_values(input_dir, chrom_sizes, *, file=None, whitelist=None, chunk_size=200, backend='hdf5')
Import base-pair values into an AnnData object.
Use this function to load per-cell, base-resolution values from a directory of input files, such as whole-genome bisulfite sequencing values, into an AnnData object.
Anti-Patterns
Do NOT pass an empty chromosome-size mapping; reference sequence sizes are required to index genomic positions.
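One way to guard against this is to validate the mapping before calling the function. The sketch below is illustrative only: `check_chrom_sizes` is a hypothetical helper, not part of snapatac2, and it assumes the sizes are supplied as a plain dictionary of positive integers (it does not handle Genome objects).

```python
def check_chrom_sizes(chrom_sizes):
    """Hypothetical pre-flight check; not a snapatac2 function."""
    if not chrom_sizes:
        raise ValueError("chrom_sizes is empty; provide at least one chromosome size")
    for name, size in chrom_sizes.items():
        # bool is a subclass of int, so reject it explicitly
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(f"invalid size for {name!r}: {size!r}")
    return chrom_sizes


# Example mapping taken from the chrom_sizes parameter description
sizes = check_chrom_sizes({"chr1": 2393, "chr2": 2344})
```

The validated dictionary can then be passed as the `chrom_sizes` argument.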
- type input_dir:
- param input_dir:
Directory containing input files, with one file per cell.
- type chrom_sizes:
- param chrom_sizes:
A Genome object or a dictionary containing chromosome sizes, for example,
{"chr1": 2393, "chr2": 2344, ...}.
- type file:
- param file:
File name of the output h5ad file used to store the result. If provided, the result is saved to a backed AnnData; otherwise, an in-memory AnnData is used.
- type whitelist:
- param whitelist:
File name or a list of barcodes. If it is a file name, each line must contain a valid barcode. When provided, only barcodes in the whitelist will be retained.
- type chunk_size:
- param chunk_size:
Increasing the chunk_size speeds up I/O but uses more memory.
- type backend:
Literal['hdf5']
- param backend:
The backend. Per the type annotation, 'hdf5' is the only accepted value.
- returns:
An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. If file=None, an in-memory AnnData is returned; otherwise, a backed AnnData is returned.
- rtype:
AnnData
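The whitelist file format described for the `whitelist` parameter (one barcode per line) can be produced with the standard library. This is a minimal sketch: `write_whitelist` is a hypothetical helper, not part of snapatac2, and the barcode strings are placeholders.

```python
from pathlib import Path
import tempfile


def write_whitelist(path, barcodes):
    """Hypothetical helper: write one barcode per line, dropping blanks and duplicates."""
    kept = []
    for bc in barcodes:
        bc = bc.strip()
        if bc and bc not in kept:
            kept.append(bc)
    Path(path).write_text("".join(f"{bc}\n" for bc in kept))
    return kept


with tempfile.TemporaryDirectory() as tmp:
    whitelist_path = Path(tmp) / "whitelist.txt"
    # Placeholder barcodes, including a duplicate and a blank entry
    kept = write_whitelist(whitelist_path, ["cell1", "cell2", "cell1", " "])
    lines = whitelist_path.read_text().splitlines()
```

Per the parameter description, either the file name or the list of barcodes itself can be passed as `whitelist`.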
Examples
>>> from pathlib import Path
>>> import tempfile
>>> import snapatac2 as snap
>>> tmp = tempfile.TemporaryDirectory()
>>> input_dir = Path(tmp.name)
>>> _ = (input_dir / "cell1.tsv").write_text("chrom\tpos\tmethyl\tunmethyl\nchr1\t10\t3\t7\n")
>>> data = snap.pp.import_values(input_dir, chrom_sizes={"chr1": 1000})
>>> data.n_obs
1
>>> tmp.cleanup()
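When chromosome sizes are stored in a tab-separated file with the name in the first column and the length in the second (as in UCSC-style `chrom.sizes` files), they can be read into the dictionary form accepted by `chrom_sizes`. The following is a sketch under that file-format assumption; `read_chrom_sizes` is a hypothetical helper, not a snapatac2 function, and the file contents are toy values.

```python
from pathlib import Path
import tempfile


def read_chrom_sizes(path):
    """Hypothetical helper: parse 'name<TAB>size' lines into a dict."""
    sizes = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Only the first two columns are used; extra columns are ignored
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    sizes_file = Path(tmp) / "toy.chrom.sizes"
    sizes_file.write_text("chr1\t2393\nchr2\t2344\n")
    chrom_sizes = read_chrom_sizes(sizes_file)
```

The resulting dictionary can be passed as the `chrom_sizes` argument in place of a literal such as `{"chr1": 1000}`.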