
Chimeric detection

A chimeric alignment is a read whose two halves map to different genomic locations — different chromosomes, the same chromosome with an unrealistically large gap, or the same chromosome but on opposite strands. These are the candidate evidence for gene fusions, large-scale structural variants, and circular RNA back-splicing.

rustar-aligner implements STAR’s chimeric detection pipeline with four tiers, all of which are enabled automatically when chimeric detection is turned on.

Chimeric detection is off by default (--chimSegmentMin 0). Enable it by setting a minimum chimeric segment length — STAR’s recommended starting value is 12:

rustar-aligner \
--genomeDir /path/to/genome_index \
--readFilesIn reads_1.fq.gz reads_2.fq.gz \
--readFilesCommand zcat \
--chimSegmentMin 12 \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix sample_

Higher values (e.g. 20) produce fewer, more confident calls; lower values (e.g. 10) are more sensitive but noisier.

The four tiers run in this order. Tiers 1, 2, and 1b are tried in turn until one of them finds a chimeric pair; Tier 3 then runs on reads where a pair was found:

  1. Tier 1 — transcript-pair search. Searches the read’s existing transcript pool for two segments that together cover most of the read but map to incompatible locations.
  2. Tier 2 — multi-cluster. When the seed pool produces multiple distinct alignment clusters, evaluates each pair as a candidate chimeric.
  3. Tier 1b — soft-clip re-mapping. Takes the soft-clipped tail of the primary alignment and re-seeds it against the genome. Recovers chimeric pairs where the original aligner only kept one half.
  4. Tier 3 — residual outer re-seeding. For reads where Tier 1/1b/2 has already found a chimeric pair, re-seeds the remaining uncovered regions of the read (before the donor / after the acceptor). Enables 3-way detection — gene fusions involving three loci, e.g. a complex rearrangement where two breakpoints are present in a single read.
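The ordering above can be sketched as plain control flow. This is only an illustration of the order described here, not rustar-aligner's actual code: `run_tiers`, the `pair_finders` list, and `reseed_residual` are hypothetical names, and each tier is passed in as a callable.

```python
from typing import Callable, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # placeholder for a (donor, acceptor) segment pair


def run_tiers(
    read: str,
    pair_finders: Sequence[Tuple[str, Callable[[str], Optional[Pair]]]],
    reseed_residual: Callable[[str, Pair], Optional[str]],
):
    """Try the pair-finding tiers in order, then attempt the residual re-seed.

    pair_finders is expected in the documented order: Tier 1, Tier 2, Tier 1b.
    All names here are hypothetical and exist only for this sketch.
    """
    for name, finder in pair_finders:
        pair = finder(read)
        if pair is not None:
            # Tier 3 only makes sense once a pair exists: it looks at the
            # parts of the read that the pair leaves uncovered.
            third = reseed_residual(read, pair)
            return name, pair, third
    return None  # no tier found a chimeric pair
```

Later pair-finding tiers are never called once an earlier one succeeds, which is the "stop at the first pair" behaviour described above.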

For paired-end data, additional inter-mate detection runs: if mate 1 maps confidently to one location and mate 2 to another that’s incompatible with a normal proper pair (different chromosomes, same strand, or >1 Mb gap), the pair is reported as chimeric.
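A minimal sketch of the three inter-mate criteria listed above, assuming each mate has already been reduced to a single confident hit. `MateHit`, `inter_mate_reason`, and the way the gap is measured (bases between the two alignments) are assumptions made for illustration; only the 1 Mb figure comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoff taken from the ">1 Mb gap" wording above; the constant name is hypothetical.
MAX_PROPER_PAIR_GAP = 1_000_000


@dataclass(frozen=True)
class MateHit:
    chrom: str
    start: int   # 1-based leftmost aligned position
    end: int     # 1-based rightmost aligned position
    strand: str  # "+" or "-"


def inter_mate_reason(
    m1: MateHit, m2: MateHit, max_gap: int = MAX_PROPER_PAIR_GAP
) -> Optional[str]:
    """Return why the pair looks chimeric, or None if it could be a proper pair."""
    if m1.chrom != m2.chrom:
        return "different_chromosomes"
    if m1.strand == m2.strand:
        # Mates of a proper pair align to opposite strands.
        return "same_strand"
    left, right = sorted((m1, m2), key=lambda m: m.start)
    gap = right.start - left.end - 1  # bases between the two alignments
    if gap > max_gap:
        return "gap_too_large"
    return None
```

The sketch leaves out the "maps confidently" requirement; a real check would also look at alignment score or multimapping before comparing the two mates.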

Set --chimOutType to control the output. Multiple values are allowed.

--chimOutType Junctions

Writes a <prefix>Chimeric.out.junction file with one row per chimeric junction. The 14-column format matches STAR’s, so tools that read STAR’s junction file, such as STAR-Fusion, can consume it directly.

--chimOutType WithinBAM

Embeds the chimeric segments as supplementary alignment records (FLAG 0x800) in the primary BAM, with SA tags linking the donor and acceptor halves. This is the format expected by tools that process chimeric BAM directly (e.g. for fusion calling on already-sorted BAMs).
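To show what these records look like to a downstream script, here is a small sketch that checks the supplementary flag on a SAM text line and splits its SA tag. The field layout of SA (`rname,pos,strand,CIGAR,mapQ,NM;`) follows the SAM optional-fields specification; the function names are hypothetical, and real pipelines would more likely use a BAM library than hand-parse text.

```python
SUPPLEMENTARY = 0x800  # SAM FLAG bit for supplementary alignments


def parse_sa_tag(value: str):
    """Split an SA:Z value into one dict per linked alignment."""
    records = []
    for entry in value.split(";"):
        if not entry:  # the tag ends with ';', which leaves an empty last piece
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append({
            "rname": rname,
            "pos": int(pos),      # 1-based leftmost position
            "strand": strand,
            "cigar": cigar,
            "mapq": int(mapq),
            "nm": int(nm),
        })
    return records


def supplementary_links(sam_line: str):
    """Return the SA entries of a supplementary record, or None for other records."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if not flag & SUPPLEMENTARY:
        return None
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("SA:Z:"):
            return parse_sa_tag(tag[len("SA:Z:"):])
    return []  # supplementary record without an SA tag
```

For a supplementary record, the returned entries describe the other segment(s) of the same read, which is how the donor and acceptor halves can be paired up again.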

--chimOutType Junctions WithinBAM

Writes both the junction file and the supplementary BAM records. Useful when downstream tools have different format requirements.

The most useful chimeric parameters:

  • --chimSegmentMin — minimum chimeric segment length (also enables/disables detection).
  • --chimScoreMin — minimum total chimeric alignment score. Default 0.
  • --chimScoreSeparation — minimum score gap between the chosen chimeric pair and the next-best alternative. Default 10.
  • --chimJunctionOverhangMin — minimum bases on each side of the chimeric junction. Default 20.
  • --chimMainSegmentMultNmax — main segment can multimap up to this many loci. Default 10.
  • --chimScoreJunctionNonGTAG — score penalty for non-canonical chimeric junctions. Default -1.

See the CLI parameters reference for the rest.
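As a rough illustration of how --chimScoreMin and --chimScoreSeparation interact, the sketch below applies both thresholds to the total scores of the candidate chimeric pairs for one read. `accept_chimeric` is a hypothetical helper, and treating both thresholds as inclusive is an assumption based on the word "minimum" above; the aligner's exact comparison may differ.

```python
from typing import Optional, Sequence


def accept_chimeric(
    scores: Sequence[int],
    score_min: int = 0,          # mirrors the --chimScoreMin default
    score_separation: int = 10,  # mirrors the --chimScoreSeparation default
) -> Optional[int]:
    """Return the best candidate's score if it passes both thresholds, else None."""
    if not scores:
        return None
    ranked = sorted(scores, reverse=True)
    best = ranked[0]
    if best < score_min:
        return None  # total chimeric score too low
    if len(ranked) > 1 and best - ranked[1] < score_separation:
        return None  # runner-up is too close, so the call is ambiguous
    return best
```

With the defaults, candidates scoring 100 and 85 yield a call (gap 15), while 100 and 95 do not (gap 5).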

The Chimeric.out.junction file has 14 tab-separated columns (STAR-compatible):

| # | Column | Meaning |
|---|--------|---------|
| 1 | chr_donorA | Donor (left segment) chromosome |
| 2 | brkpt_donorA | Donor breakpoint position |
| 3 | strand_donorA | Donor strand |
| 4 | chr_acceptorB | Acceptor (right segment) chromosome |
| 5 | brkpt_acceptorB | Acceptor breakpoint position |
| 6 | strand_acceptorB | Acceptor strand |
| 7 | junction_type | -1 (encompassing PE) / 0 (non-canonical) / 1 (GT/AG) / 2 (CT/AC) |
| 8 | repeat_left_lenA | Length of repeat to the left |
| 9 | repeat_right_lenB | Length of repeat to the right |
| 10 | read_name | Source read name |
| 11 | start_alnA | Donor start position on the read |
| 12 | cigar_alnA | Donor CIGAR |
| 13 | start_alnB | Acceptor start position on the read |
| 14 | cigar_alnB | Acceptor CIGAR |
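A minimal sketch of reading these 14 columns into typed records, using only the standard library. The `ChimericJunction` type and `read_junctions` function are hypothetical names, the field names are adapted from the column names above, and skipping blank lines, `#` comments, and a `chr_donorA` header row is an assumption about what the file may contain.

```python
from typing import Iterable, Iterator, NamedTuple


class ChimericJunction(NamedTuple):
    chr_donor: str
    brkpt_donor: int
    strand_donor: str
    chr_acceptor: str
    brkpt_acceptor: int
    strand_acceptor: str
    junction_type: int
    repeat_left_len: int
    repeat_right_len: int
    read_name: str
    start_aln_a: int
    cigar_aln_a: str
    start_aln_b: int
    cigar_aln_b: str


def read_junctions(lines: Iterable[str]) -> Iterator[ChimericJunction]:
    """Yield one ChimericJunction per data row of a Chimeric.out.junction file."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines, '#' comments, or a header row may appear; skip them.
        if not line or line.startswith("#") or line.startswith("chr_donorA"):
            continue
        f = line.split("\t")
        if len(f) < 14:
            raise ValueError(f"expected 14 columns, got {len(f)}: {line!r}")
        yield ChimericJunction(
            f[0], int(f[1]), f[2],
            f[3], int(f[4]), f[5],
            int(f[6]), int(f[7]), int(f[8]),
            f[9], int(f[10]), f[11], int(f[12]), f[13],
        )
```

An open file handle can be passed directly, for example `with open("sample_Chimeric.out.junction") as fh: rows = list(read_junctions(fh))`, where the file name assumes the `sample_` prefix used in the command above.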