snapatac2.PyDNAMotifScanner.find
- PyDNAMotifScanner.find(seq, pvalue=1e-05, report_pvalue=False)
Find motif occurrences in the given sequence above the specified p-value threshold.
Use this method when the positions and scores of forward-strand motif hits are needed. Set report_pvalue=True to also report log10 p-values estimated from the scanner’s score CDF. This method does not scan the reverse complement.
Anti-Patterns
Do NOT use find when reverse-complement hits should be considered; use exist or scan the reverse complement explicitly.
Do NOT pass RNA sequences; use DNA letters A/C/G/T.
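Scanning the reverse complement explicitly could look like the minimal sketch below. Both helpers are hypothetical and not part of snapatac2: reverse_complement builds the string to pass to find and rejects non-A/C/G/T letters such as the RNA base U, and to_forward_start maps a hit back to forward-strand coordinates under the assumption that the reported position is the 0-based start of the match.

```python
# Hypothetical helpers for illustration; not part of the snapatac2 API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string.

    Raises ValueError if the sequence contains letters other than A/C/G/T
    (for example the RNA base U).
    """
    bad = set(seq.upper()) - set("ACGT")
    if bad:
        raise ValueError(f"non-DNA letters in sequence: {sorted(bad)}")
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward_start(rc_start: int, seq_len: int, motif_len: int) -> int:
    """Map a hit start on the reverse complement to forward-strand coordinates.

    Assumption: rc_start is the 0-based start of the match on the
    reverse-complemented sequence.
    """
    return seq_len - rc_start - motif_len
```

With these in place, a call such as scanner.find(reverse_complement(seq), pvalue=1e-5) would scan the opposite strand; the motif length needed by to_forward_start has to come from the motif itself.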
- type seq:
str
- param seq:
DNA sequence to scan.
- type pvalue:
float
- param pvalue:
P-value threshold for reporting motif occurrences. Default is 1e-5.
- type report_pvalue:
bool
- param report_pvalue:
Whether to report per-hit log10 p-values. Default is False.
- returns:
A list of tuples where each tuple contains the position of the motif occurrence, the natural-log likelihood-ratio score, and, when report_pvalue=True, the log10 p-value.
- rtype:
list[tuple[int, float, float | None]]
Notes
For long or information-rich motifs, returned hits will usually have reported p-values below the user-defined cutoff. For short or degenerate motifs, the best possible motif score may still have an estimated p-value larger than the requested cutoff. In this case, find may return exact or best-possible matches whose reported p-value is larger than pvalue. This behavior is intentional: the pvalue argument is used to derive a score cutoff, and the scanner keeps best-possible matches reachable even when they are not statistically significant under the motif score CDF. If strict p-value filtering is required, call find(..., report_pvalue=True) and filter returned hits by the reported log10 p-value.
Examples
>>> hits = scanner.find("ACGTACGT", pvalue=1e-5, report_pvalue=True)
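The strict filtering described in the Notes can be sketched as a small post-processing step. The helper filter_hits_by_pvalue is hypothetical (not part of snapatac2), and the hit values are made up purely to show the documented (position, score, log10 p-value) tuple shape.

```python
import math


def filter_hits_by_pvalue(hits, pvalue):
    """Keep only hits whose reported log10 p-value is at or below log10(pvalue).

    Hypothetical helper: hits without a reported log10 p-value are dropped.
    """
    cutoff = math.log10(pvalue)
    return [
        h for h in hits
        if len(h) > 2 and h[2] is not None and h[2] <= cutoff
    ]


# Made-up hits in the documented (position, score, log10 p-value) shape.
example_hits = [(3, 12.4, -6.2), (10, 7.9, -4.1), (21, 11.0, -5.3)]
strict = filter_hits_by_pvalue(example_hits, 1e-5)
```

Here the second hit is dropped because its log10 p-value (-4.1) is above log10(1e-5); in real use, the input would be the list returned by find(..., report_pvalue=True).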