snapatac2.datasets.pbmc500

snapatac2.datasets.pbmc500(type='fragment', downsample=False)

Fetch the 10x Genomics 500 PBMC scATAC-seq example dataset.

Use this helper to download and cache the fragment, BAM, or FASTQ files for a small PBMC dataset suitable for tutorials and smoke tests. Set the SNAP_DATA_DIR environment variable before calling this function to control where downloaded files are cached.
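A minimal sketch of configuring the cache location from Python rather than from the shell. The helper names configure_snap_cache and fetch_small_fragments are hypothetical and not part of snapatac2; only the SNAP_DATA_DIR variable and the pbmc500 call come from this page. Setting the variable before importing snapatac2 is a conservative ordering, not a documented requirement.

```python
import os
from pathlib import Path


def configure_snap_cache(cache_dir):
    """Create cache_dir if needed and export it as SNAP_DATA_DIR (hypothetical helper)."""
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["SNAP_DATA_DIR"] = str(path)
    return path


def fetch_small_fragments(cache_dir):
    """Fetch the downsampled fragment file; needs snapatac2 and network access."""
    configure_snap_cache(cache_dir)
    # Imported only after the variable is set (conservative, not a documented requirement).
    import snapatac2 as snap

    return snap.datasets.pbmc500(downsample=True)
```

Calling fetch_small_fragments("~/snap_cache") should then place the download under that directory, assuming the variable is honored as described above.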

Anti-Patterns

  • Do NOT use the default full fragment file for fast examples; pass downsample=True when a small fragment file is sufficient.

  • Do NOT set downsample=True with type="bam" or type="fastq"; the downsampled file is only available for type="fragment".
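The two rules above can be enforced before any download starts. The following is a sketch only: check_pbmc500_args and fetch_pbmc500 are hypothetical wrappers, and this page does not say how pbmc500 itself reacts to an unsupported combination, so the guard makes no assumption about that behavior.

```python
VALID_TYPES = ("fragment", "bam", "fastq")


def check_pbmc500_args(file_type="fragment", downsample=False):
    """Validate an argument combination and return keyword arguments for pbmc500."""
    if file_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {file_type!r}")
    if downsample and file_type != "fragment":
        # The downsampled file is only available for type="fragment".
        raise ValueError('downsample=True is only supported with type="fragment"')
    return {"type": file_type, "downsample": downsample}


def fetch_pbmc500(file_type="fragment", downsample=False):
    """Hypothetical wrapper: validate first, then call snapatac2 (needs network access)."""
    kwargs = check_pbmc500_args(file_type, downsample)
    import snapatac2 as snap

    return snap.datasets.pbmc500(**kwargs)
```

Failing early with a ValueError keeps an invalid call from triggering a large download of the wrong file.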

param type:

File type to fetch. Use "fragment" for the gzipped fragments file (.tsv.gz), "bam" for the position-sorted BAM file, or "fastq" for the FASTQ files extracted from the downloaded archive.

type type:

Literal['fastq', 'bam', 'fragment']

param downsample:

If True and type="fragment", fetch the smaller downsampled fragments file instead of the full fragments file.

type downsample:

bool

returns:

Path to the requested fragment or BAM file. For type="fastq", returns a list of paths to the extracted FASTQ files.

rtype:

Path | list[Path]
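Because the return value is a single path for "fragment" and "bam" but a list for "fastq", calling code that handles every file type may want to normalize the result. A minimal sketch, where as_path_list is a hypothetical helper and not part of snapatac2:

```python
from pathlib import Path


def as_path_list(result):
    """Return a list of Path objects for either a single path or a list of paths."""
    if isinstance(result, (str, Path)):
        return [Path(result)]
    return [Path(p) for p in result]
```

With this in place, a loop such as `for path in as_path_list(snap.datasets.pbmc500(type="fastq")):` works unchanged for the other two file types.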

Examples

>>> import snapatac2 as snap
>>> fragment_file = snap.datasets.pbmc500(downsample=True)
>>> fragment_file.name
'atac_pbmc_500_downsample.tsv.gz'