STAR compatibility
rustar-aligner is a faithful port. The goal is byte-for-byte identical output to STAR for every read where the algorithm is deterministic, and provably equivalent output (same alignment set, different tie-break) for every read where it isn’t. This page is the long-form scoreboard.
The benchmark below uses 10,000 yeast RNA-seq reads (150 bp, ERR12389696), aligned with both tools using identical parameters and the same genome index.
Single-end summary
| Metric | rustar-aligner | STAR |
|---|---|---|
| Unique mapped | 82.6% | 82.6% |
| Multi-mapped | 7.4% | 7.4% |
| Total mapped | 90.0% | 90.0% |
| Position agreement (raw) | 96.5% | — |
| Position agreement (tie-adjusted) | 99.815% | — |
| Reads mapped only by STAR | 0 | — |
| Reads mapped only by rustar-aligner | 0 | — |
| CIGAR-only differences | 1 | — |
Tie-adjusted means: of the 313 raw disagreements, 299 are verified genuine ties — both tools find the same set of alignments, but pick different copies as primary because of differences in suffix-array iteration order or RNG-based tie-breaking. Excluding those ties, faithfulness is 8,611 / 8,627 = 99.815%.
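The tie rule and the percentage above can be sketched in a few lines. This is a minimal illustration, not the project's benchmark script: the `classify` and `faithfulness` helpers and the `(chromosome, position, CIGAR)` tuple layout are assumptions made for the example.

```python
from typing import FrozenSet, Tuple

# Hypothetical record layout: (chromosome, 1-based position, CIGAR).
Aln = Tuple[str, int, str]


def classify(primary_a: Aln, all_a: FrozenSet[Aln],
             primary_b: Aln, all_b: FrozenSet[Aln]) -> str:
    """Label one read as 'match', 'tie', or 'diff'."""
    if primary_a == primary_b:
        return "match"
    # Same alignment set but a different primary: only the tie-break differs.
    if all_a == all_b:
        return "tie"
    return "diff"


def faithfulness(matches: int, compared: int) -> float:
    """Percentage of compared reads whose primary alignment matches."""
    return 100.0 * matches / compared


# Figures quoted in the paragraph above.
print(f"{faithfulness(8611, 8627):.3f}%")  # 99.815%
```

Reads labelled `tie` are dropped from the denominator before the percentage is computed, which is what "tie-adjusted" refers to.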
Paired-end summary
| Metric | rustar-aligner | STAR |
|---|---|---|
| Both mates mapped | 8,390 | 8,390 |
| Half-mapped pairs | 0 | 0 |
| Unmapped pairs | 0 | 0 |
| PE faithfulness (tie-adjusted) | 99.883% | — |
| MAPQ inflations | 0 | — |
| MAPQ deflations | 0 | — |
| NH tag differences | 0 | — |
| Proper-pair flag differences | 0 | — |
Tie-adjusted for paired-end: 16,284 / 16,306 mate alignments exactly match STAR (same position, CIGAR, MAPQ, proper-pair flag, NH tag). 475 differences are excluded as tie-breaking only (same MAPQ + same NH, different repeat copy chosen).
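The per-mate comparison described above could look roughly like the following. It is a sketch under stated assumptions: the `Mate` record and `compare_mate` function are hypothetical names, and only the SAM proper-pair bit (`0x2`) comes from the SAM specification.

```python
from dataclasses import dataclass

PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned


@dataclass(frozen=True)
class Mate:
    """Hypothetical per-mate record with the fields compared in the text."""
    chrom: str
    pos: int
    cigar: str
    mapq: int
    flag: int
    nh: int


def compare_mate(a: Mate, b: Mate) -> str:
    """Return 'exact', 'tie', or 'diff' for one mate from each tool."""
    same_locus = (a.chrom, a.pos) == (b.chrom, b.pos)
    same_proper = (a.flag & PROPER_PAIR) == (b.flag & PROPER_PAIR)
    if (same_locus and a.cigar == b.cigar and a.mapq == b.mapq
            and same_proper and a.nh == b.nh):
        return "exact"
    # Same MAPQ and NH but a different repeat copy: tie-breaking only.
    if not same_locus and a.mapq == b.mapq and a.nh == b.nh:
        return "tie"
    return "diff"
```

Anything that is neither `exact` nor `tie` counts against faithfulness, including a mate at the same locus whose CIGAR or proper-pair bit differs.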
Index format
The genome index format is identical. After Phase G3 (the SA tie-breaking fix), the suffix array for the yeast benchmark genome is byte-for-byte identical between STAR and rustar-aligner: 10,862 entry differences fixed → 0 remaining.
This means an index built with one tool is loadable by the other.
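One way to check byte-for-byte identity yourself is to hash the index files from both tools. The sketch below assumes two index directories built from the same FASTA with the same parameters; `sha256_of` and `differing_files` are illustrative helpers, and the default file list covers only the three binary files named on this page.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large SA files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(dir_a: Path, dir_b: Path,
                    names=("Genome", "SA", "SAindex")):
    """Return the names whose contents differ between the two directories."""
    return [n for n in names if sha256_of(dir_a / n) != sha256_of(dir_b / n)]
```

An empty list means the listed files are byte-identical; any name returned points at the file to inspect.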
Where the differences come from
There are three categories of remaining difference, in order of size:
1. Tie-breaking (the bulk: ~3% of reads)
When two alignments have the same score, each tool picks a primary with a procedure that is deterministic but differs between the tools:
- STAR uses a Mersenne Twister RNG (`mt19937`) seeded by `--runRNGseed`.
- rustar-aligner uses Rust’s `StdRng` (ChaCha) seeded by `--runRNGseed`.
Both honour the seed and are reproducible across runs of the same tool, but the two generators produce different sequences from the same seed. So for ~3% of reads (all of them multi-mappers), primary vs. secondary flips. Total NH counts, AS scores, CIGARs, and the full alignment set are unaffected.
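The effect is easy to reproduce with any two generators. In the sketch below, Python's `random.Random` (which is MT19937) stands in for one tool and a small SplitMix64 class stands in for the other. Both are assumptions for illustration only: SplitMix64 is not ChaCha, and Python seeds MT19937 differently from C++, so neither sequence matches what STAR or rustar-aligner actually draws.

```python
import random

MASK64 = (1 << 64) - 1


class SplitMix64:
    """Stand-in for 'a different PRNG'; not ChaCha, not used by either tool."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)


def mt_draws(seed: int, n: int) -> list:
    rng = random.Random(seed)  # CPython's generator is MT19937
    return [rng.getrandbits(64) for _ in range(n)]


def splitmix_draws(seed: int, n: int) -> list:
    rng = SplitMix64(seed)
    return [rng.next_u64() for _ in range(n)]


seed = 777  # arbitrary seed for the demonstration
tied_candidates = 3  # e.g. three equally scoring repeat copies

# Each generator is reproducible on its own...
assert mt_draws(seed, 8) == mt_draws(seed, 8)
assert splitmix_draws(seed, 8) == splitmix_draws(seed, 8)

# ...but the primary picked for each tied read follows a different sequence.
print([d % tied_candidates for d in mt_draws(seed, 8)])
print([d % tied_candidates for d in splitmix_draws(seed, 8)])
```

Matching the seed therefore gives run-to-run reproducibility within one tool, not agreement between tools.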
A subset (~100 of 313 SE diffs) involves “diff-chr ties”: the read maps to several copies in a multi-copy region (e.g. rDNA), and the two tools pick different copies. Same alignment quality, different one chosen.
This is what the “tie-adjusted” numbers above account for.
2. CIGAR placement in homopolymer runs (1 read in 10,000)
`ERR12389696.13573895`: both tools align to `XV:218357` (MAPQ=255, AS=133), but rustar-aligner emits `100M1I45M4S` (insertion at read position 100) while STAR emits `108M1I37M4S` (insertion at 108). The 71-base seed is found at RC pos 29 (rustar-aligner) vs RC pos 37 (STAR) due to a different Lmapped chain path through a long homopolymer. Same diagonal, same score, different starting position → different insertion placement.
This is a seed-level tie. Real impact for downstream tools: effectively zero. To match STAR exactly we’d need to replicate STAR’s exact Lmapped chain, which is a high-effort, low-value fix.
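The two CIGAR strings can be checked for equivalence mechanically. The helper names below are illustrative; the operator semantics (which operations consume read or reference bases) follow the SAM specification.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")      # operations that consume read bases
REFERENCE_OPS = set("MDN=X")  # operations that consume reference bases


def parse(cigar: str):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def query_length(cigar: str) -> int:
    return sum(n for n, op in parse(cigar) if op in QUERY_OPS)


def reference_span(cigar: str) -> int:
    return sum(n for n, op in parse(cigar) if op in REFERENCE_OPS)


def first_insertion_offset(cigar: str):
    """Number of read bases consumed before the first insertion, or None."""
    offset = 0
    for n, op in parse(cigar):
        if op == "I":
            return offset
        if op in QUERY_OPS:
            offset += n
    return None


for cigar in ("100M1I45M4S", "108M1I37M4S"):
    print(cigar, query_length(cigar), reference_span(cigar),
          first_insertion_offset(cigar))
# 100M1I45M4S 150 145 100
# 108M1I37M4S 150 145 108
```

Both strings consume the full 150-base read and the same 145 reference bases; only the offset of the inserted base moves.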
3. PE-specific cases where rustar-aligner finds a better alignment
Four PE alignments have a higher AS in rustar-aligner than in STAR. These are not bugs; they are improvements:
- `ERR12389696.844151`: rustar-aligner finds `VIII:451791` with 0 mismatches; STAR finds `VII:1001391` with 6 mismatches.
- `ERR12389696.4972950`: rustar-aligner correctly emits a spliced mate 2; STAR’s combined-window approach fails to stitch the spliced mate at the better location and emits unspliced.
In both, STAR’s combined-window seeding fails to reach the higher-scoring alignment. We’ve decided not to artificially regress these.
Faithfulness over time
The faithfulness numbers have been moving upwards as more STAR algorithm details have been replicated:
| Phase | PE tie-adjusted faithfulness |
|---|---|
| Pre-Phase F1 | 99.755% |
| Phase F1 (`--runRNGseed`) | 99.755%* (RNG change reset baseline) |
| Phase G1 (junction-shift fix) | 99.865% |
| Phase G2 (recursion budget + overflow fix) | 99.883% |
* Phase F1 changed PE tie-breaking from SA-order to seeded StdRng, which shuffled which reads count as “tie-broken” without changing the underlying alignment quality.
Other compatibility checks
- Genome index files (`Genome`, `SA`, `SAindex`, `chrName.txt`, `chrStart.txt`, `chrNameLength.txt`, `sjdbList.fromGTF.out.tab`, `transcriptInfo.tab`, `exonInfo.tab`): identical or equivalent.
- `SJ.out.tab`: matching format and contents for the yeast benchmark.
- `Log.final.out`: format matches STAR; MultiQC parses it without modification.
- `Chimeric.out.junction`: 14-column STAR-compatible format. Tools like Arriba and STAR-Fusion read it directly.
- `ReadsPerGene.out.tab`: 4-column STAR-compatible format.
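A check of this kind on `SJ.out.tab` might look like the sketch below. It relies on STAR's documented 9-column layout (chromosome, intron start, intron end, strand, motif, annotated flag, unique reads, multi-mapped reads, max overhang); `parse_sj` and `diff_sj` are hypothetical helpers, not part of either tool.

```python
from typing import Dict, Iterable, Tuple

# Junction key: (chromosome, intron start, intron end, strand code).
Junction = Tuple[str, int, int, int]


def parse_sj(lines: Iterable[str]) -> Dict[Junction, Tuple[int, int]]:
    """Map each junction to its (unique, multi-mapped) read counts."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        key = (fields[0], int(fields[1]), int(fields[2]), int(fields[3]))
        table[key] = (int(fields[6]), int(fields[7]))
    return table


def diff_sj(a: Dict[Junction, Tuple[int, int]],
            b: Dict[Junction, Tuple[int, int]]):
    """Return junctions only in a, only in b, and shared with other counts."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed
```

Pass the open file handles from each tool's `SJ.out.tab` to `parse_sj`; three empty lists from `diff_sj` mean the junction sets and read counts agree.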
What this means in practice
For the overwhelming majority of bioinformatics workflows (read counting, differential expression, splice junction analysis, fusion calling, MultiQC reports), rustar-aligner’s output is interchangeable with STAR’s. The exceptions are:
- Workflows that depend on the specific copy a multi-mapper landed on (rare; in those cases, you should be using the full alignment set anyway, not just the primary).
- Reproducing exact byte-equality of SAM output across STAR and rustar-aligner runs (not achievable today; the RNG difference is the dominant source).
If you find a divergence not described here, please open an issue — the project goal is to keep this list as short as possible.