snapatac2.pp.import_contacts

snapatac2.pp.import_contacts(contact_file, chrom_sizes, *, file=None, sorted_by_barcode=True, bin_size=500000, chunk_size=200, tempdir=None, backend='hdf5')

Import chromatin contacts into an AnnData object.

Use this function to load single-cell chromatin-contact records and bin them into fixed-width genomic intervals. The result can be kept in memory or written to a backed h5ad file.

Anti-Patterns

  • Do NOT set sorted_by_barcode=True for unsorted contact files; set it to False so this function sorts them first.
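
One way to decide which value to pass is to inspect the file before importing it. The sketch below assumes a plain-text, tab-separated contact file with the cell barcode in the first column, as in the example on this page. `is_grouped_by_barcode` is a hypothetical helper, not part of snapatac2, and it only checks that each barcode's records are contiguous, which is an assumption about what "sorted by barcode" requires.

```python
def is_grouped_by_barcode(path, sep="\t"):
    """Return True if every barcode's records form one contiguous run.

    Hypothetical helper: assumes the barcode is the first column.
    """
    seen = set()     # barcodes whose run has already ended
    current = None   # barcode of the run currently being read
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            barcode = line.split(sep, 1)[0]
            if barcode != current:
                if barcode in seen:
                    return False  # barcode reappears after a different one
                if current is not None:
                    seen.add(current)
                current = barcode
    return True
```

If the check returns False, pass sorted_by_barcode=False so that the records are sorted during import.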

type contact_file:

Path

param contact_file:

Path to the contact file.

type file:

Path | None

param file:

File name of the output h5ad file used to store the result. If provided, the result is saved to a backed AnnData; otherwise, an in-memory AnnData is used.

type chrom_sizes:

Genome | dict[str, int]

param chrom_sizes:

A Genome object or a dictionary containing chromosome sizes, for example, {"chr1": 2393, "chr2": 2344, ...}.

type sorted_by_barcode:

bool

param sorted_by_barcode:

Whether the contact file has been sorted by cell barcodes. Set this to False for unsorted files so that they are sorted before import.

type bin_size:

int

param bin_size:

The size, in base pairs, of the consecutive genomic regions (bins) used to record the counts.

type chunk_size:

int

param chunk_size:

Increasing the chunk_size speeds up I/O but uses more memory.

type tempdir:

Path | None

param tempdir:

Location to store temporary files. If None, the system temporary directory is used.

type backend:

Literal['hdf5']

param backend:

The storage backend for the output AnnData. 'hdf5' is the only accepted value.

returns:

An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. If file=None, an in-memory AnnData is returned; otherwise, a backed AnnData is returned.

rtype:

AnnData
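
To make the role of bin_size concrete, the following sketch shows generic fixed-width binning arithmetic: how many bins of a given width cover each chromosome, and which bin a position falls into. The helper names and the chromosome sizes are hypothetical, and this is not snapatac2's internal code; how the library maps bins to the columns of the returned matrix is not specified here.

```python
import math


def bins_per_chrom(chrom_sizes, bin_size):
    """Number of fixed-width bins needed to cover each chromosome."""
    return {chrom: math.ceil(size / bin_size) for chrom, size in chrom_sizes.items()}


def bin_interval(pos, bin_size):
    """Half-open interval [start, end) of the bin containing a 0-based position."""
    start = (pos // bin_size) * bin_size
    return start, start + bin_size


sizes = {"chr1": 1_200_000, "chr2": 500_000}  # illustrative sizes only
print(bins_per_chrom(sizes, 500_000))  # {'chr1': 3, 'chr2': 1}
print(bin_interval(740_000, 500_000))  # (500000, 1000000)
```

A smaller bin_size gives finer resolution at the cost of more bins per chromosome.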

Examples

>>> from pathlib import Path
>>> import tempfile
>>> import snapatac2 as snap
>>> tmp = tempfile.TemporaryDirectory()
>>> contact_file = Path(tmp.name) / "contacts.tsv"
>>> _ = contact_file.write_text("cell1\tchr1\t10\tchr1\t40\t1\n")
>>> data = snap.pp.import_contacts(contact_file, chrom_sizes={"chr1": 1000})
>>> data.n_obs
1
>>> tmp.cleanup()
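
When chromosome sizes are available as a two-column, tab-separated file (chromosome name, length), such as the `.chrom.sizes` files distributed by UCSC, they can be read into the dictionary form accepted by chrom_sizes. This is a minimal sketch; `read_chrom_sizes` is a hypothetical helper, not a snapatac2 function.

```python
def read_chrom_sizes(path):
    """Parse a two-column, tab-separated file into {chromosome: length}."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            chrom, length = line.split("\t")[:2]
            sizes[chrom] = int(length)
    return sizes
```

The returned dictionary can be passed directly, for example `snap.pp.import_contacts(contact_file, chrom_sizes=read_chrom_sizes("genome.chrom.sizes"))`, where the file name is a placeholder.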