snapatac2.PyDNAMotifScanner.find

PyDNAMotifScanner.find(seq, pvalue=1e-05, report_pvalue=False)

Find motif occurrences in the given sequence whose scores pass the cutoff derived from the specified p-value threshold.

Use this method when the positions and scores of forward-strand motif hits are needed. Set report_pvalue=True to also report log10 p-values estimated from the scanner’s score CDF. This method does not scan the reverse complement.

Anti-Patterns

  • Do NOT use find when reverse-complement hits should be considered; use exist or scan the reverse complement explicitly.

  • Do NOT pass RNA sequences; use DNA letters A/C/G/T.
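Both points above can be handled before calling find. The following is a minimal sketch, not part of snapatac2: to_dna, reverse_complement, and find_both_strands are hypothetical helper names, and find_both_strands accepts any find-like callable (for example, the bound method scanner.find of an existing scanner) so that it does not depend on how the scanner was built.

```python
# Complement table for DNA letters; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_dna(seq: str) -> str:
    """Uppercase a sequence and replace RNA uracil (U) with thymine (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_both_strands(find, seq, **kwargs):
    """Run a find-like callable on both strands (hypothetical helper).

    `find` is assumed to behave like PyDNAMotifScanner.find: it takes a
    sequence plus keyword arguments and returns a list of hit tuples.
    Each returned item is (strand, hit), with strand "+" or "-".
    """
    seq = to_dna(seq)
    forward = [("+", hit) for hit in find(seq, **kwargs)]
    reverse = [("-", hit) for hit in find(reverse_complement(seq), **kwargs)]
    return forward + reverse
```

A call might look like find_both_strands(scanner.find, seq, pvalue=1e-5). Positions in "-" hits are relative to the reverse-complemented string, so converting them back to forward-strand coordinates requires the sequence length and the motif length.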

type seq:

str

param seq:

DNA sequence to scan.

type pvalue:

float

param pvalue:

P-value threshold for reporting motif occurrences. Default is 1e-5.

type report_pvalue:

bool

param report_pvalue:

Whether to report per-hit log10 p-values. Default is False.

returns:

A list of tuples where each tuple contains the position of the motif occurrence, the natural-log likelihood-ratio score, and optionally the log10 p-value.

rtype:

list[tuple[int, float, float | None]]

Notes

For long or information-rich motifs, returned hits will usually have reported p-values below the user-defined cutoff. For short or degenerate motifs, the best possible motif score may still have an estimated p-value larger than the requested cutoff. In this case, find may return exact or best-possible matches whose reported p-value is larger than pvalue. This behavior is intentional: the pvalue argument is used to derive a score cutoff, and the scanner keeps best-possible matches reachable even when they are not statistically significant under the motif score CDF. If strict p-value filtering is required, call find(..., report_pvalue=True) and filter returned hits by the reported log10 p-value.
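The strict filtering described above could be sketched as follows. This is an illustration rather than part of the library: filter_hits_by_pvalue is a hypothetical helper, it assumes each hit is a (position, score, log10_pvalue) tuple as returned with report_pvalue=True, and it treats a hit as significant when its log10 p-value is at or below log10(pvalue).

```python
import math


def filter_hits_by_pvalue(hits, pvalue):
    """Keep only hits whose reported log10 p-value is <= log10(pvalue).

    Hypothetical helper. `hits` is assumed to be a list of
    (position, score, log10_pvalue) tuples; hits without a reported
    p-value (None) are dropped because they cannot be checked.
    """
    cutoff = math.log10(pvalue)
    return [hit for hit in hits if hit[2] is not None and hit[2] <= cutoff]
```

With an existing scanner, this would be applied as filter_hits_by_pvalue(scanner.find(seq, pvalue=1e-5, report_pvalue=True), 1e-5).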

Examples

>>> hits = scanner.find("ACGTACGT", pvalue=1e-5, report_pvalue=True)