snapatac2.tl.init_network_from_annotation

snapatac2.tl.init_network_from_annotation(regions, anno_file, upstream=250000, downstream=250000, id_type='gene_name', coding_gene_only=True)

Build a region-to-gene network from gene annotations.

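Use this function to link candidate cis-regulatory elements (CREs) to genes. A region is linked to a gene when it falls within the regulatory domain derived from the annotation: the window extending upstream bases before and downstream bases after the gene's transcription start site (TSS).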
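The regulatory-domain rule can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the Gene record and both helper functions are hypothetical, and the sketch assumes that the window is strand-aware, that intervals are half-open, and that any overlap between a region and the window is enough to create a link.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; not a SnapATAC2 type."""
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"


def regulatory_domain(gene, upstream=250_000, downstream=250_000):
    """Return the (start, end) window around a TSS, clamped at 0."""
    if gene.strand == "+":
        start, end = gene.tss - upstream, gene.tss + downstream
    else:
        # Assumption: on the minus strand, "upstream" lies at higher coordinates.
        start, end = gene.tss - downstream, gene.tss + upstream
    return max(start, 0), end


def link_regions(regions, genes, upstream=250_000, downstream=250_000):
    """Return (region, gene_name) pairs for regions overlapping a domain."""
    edges = []
    for chrom, r_start, r_end in regions:
        for gene in genes:
            if gene.chrom != chrom:
                continue
            d_start, d_end = regulatory_domain(gene, upstream, downstream)
            # Half-open interval overlap test (an assumption of this sketch).
            if r_start < d_end and r_end > d_start:
                edges.append((f"{chrom}:{r_start}-{r_end}", gene.name))
    return edges
```

The nested loop keeps the sketch short; for real peak sets an interval index would be the more practical choice.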

Anti-Patterns

  • Do NOT pass regions whose coordinates come from a different genome build than anno_file; they will not line up with the annotated transcription start sites.

  • Do NOT assume edges are functional regulatory links; they encode genomic proximity only until scores are added.
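A cheap guard against the first anti-pattern is to compare the regions with the chromosome sizes of the build you intend to use. The helper below is a hypothetical sketch, not part of SnapATAC2, and chrom_sizes is an assumed dictionary that you supply yourself (for example, parsed from a chrom.sizes file). It catches only gross mismatches such as a different chromosome naming convention or out-of-range coordinates; it cannot prove that two builds agree.

```python
def check_regions_against_sizes(regions, chrom_sizes):
    """Return a list of problems found; an empty list means no red flags.

    `chrom_sizes` is a hypothetical mapping of chromosome name to length
    for the genome build that the annotation file describes.
    """
    problems = []
    for region in regions:
        # Split "chrom:start-end" at the last colon, then at the hyphen.
        chrom, _, span = region.rpartition(":")
        start, end = (int(x) for x in span.split("-"))
        if chrom not in chrom_sizes:
            problems.append(f"{region}: unknown chromosome {chrom!r}")
        elif end > chrom_sizes[chrom]:
            problems.append(
                f"{region}: end exceeds chromosome length {chrom_sizes[chrom]}"
            )
    return problems
```

Running this before building the network surfaces, for instance, regions named 1:10-20 being paired with an annotation that uses chr1.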

param regions:

Candidate regulatory regions in chrom:start-end format.

type regions:

list[str]

param anno_file:

GFF/GTF annotation file, or a Genome object containing the annotation.

type anno_file:

Path | Genome

param upstream:

Bases upstream of each transcription start site included in the regulatory domain.

type upstream:

int

param downstream:

Bases downstream of each transcription start site included in the regulatory domain.

type downstream:

int

param id_type:

Annotation identifier stored on gene nodes.

type id_type:

Literal['gene_name', 'gene_id', 'transcript_id']

param coding_gene_only:

If True, retain only protein-coding genes.

type coding_gene_only:

bool

returns:

Directed graph whose region nodes point to nearby gene nodes.

rtype:

PyDiGraph
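Because regions must be a list of chrom:start-end strings, peak tables held as tuples or BED-like rows need converting first. The following is a minimal sketch under stated assumptions: to_region_strings and is_region_string are hypothetical helpers, and the regular expression reflects only the chrom:start-end shape described above, not any validation the library itself performs.

```python
import re

# Assumed shape: a chromosome name without colons or whitespace,
# then non-negative integer start and end separated by a hyphen.
_REGION_RE = re.compile(r"^[^:\s]+:\d+-\d+$")


def to_region_strings(intervals):
    """Format (chrom, start, end) tuples as 'chrom:start-end' strings."""
    out = []
    for chrom, start, end in intervals:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
        out.append(f"{chrom}:{start}-{end}")
    return out


def is_region_string(text):
    """Return True if `text` looks like 'chrom:start-end'."""
    return _REGION_RE.match(text) is not None
```

The resulting list can be passed directly as the regions argument.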

Examples

>>> import snapatac2 as snap
>>> regions = ["chr1:10000-10500", "chr1:20000-20500"]
>>> network = snap.tl.init_network_from_annotation(regions, snap.genome.hg38)
>>> network.num_nodes() >= 0
True