Gene quantification
rustar-aligner can produce two flavours of quantification output during alignment, controlled by --quantMode. Both require a GTF file.
Gene-level counts (GeneCounts)
Produces a <prefix>ReadsPerGene.out.tab file with one row per gene and four columns: gene ID, then read counts for unstranded, forward-stranded, and reverse-stranded protocols. Output is identical to STAR’s, so any downstream tool that already consumes STAR ReadsPerGene.out.tab works unchanged (e.g. DESeq2, edgeR, MultiQC).
```shell
rustar-aligner \
  --genomeDir /path/to/genome_index \
  --readFilesIn reads_1.fq.gz reads_2.fq.gz \
  --readFilesCommand zcat \
  --sjdbGTFfile gencode.v45.gtf \
  --quantMode GeneCounts \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix sample_
```
Output format
```text
N_unmapped          <count>  <count>  <count>
N_multimapping      <count>  <count>  <count>
N_noFeature         <count>  <count>  <count>
N_ambiguous         <count>  <count>  <count>
ENSG00000000003.15  <count>  <count>  <count>
ENSG00000000005.6   <count>  <count>  <count>
...
```
The first four rows are summary categories (matching STAR / htseq-count semantics). Subsequent rows are per-gene counts. Pick the column that matches your library protocol:
| Column | Strand assumption |
|---|---|
| 2 | unstranded |
| 3 | forward / htseq-count -s yes |
| 4 | reverse / htseq-count -s reverse |
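If you don’t know your library’s strandedness, compare the three columns. In a stranded library, the stranded column that matches the protocol holds nearly all of the counts and the opposite one is close to empty; in an unstranded library, columns 3 and 4 are roughly equal.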
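The column comparison can be sketched with awk. This is a minimal illustration, not part of rustar-aligner: the function names are hypothetical, and the 5x ratio used to call a library stranded is an arbitrary threshold chosen for the example.

```shell
# Sum the per-gene counts in each strand column of a ReadsPerGene.out.tab file.
# Rows whose first field starts with "N_" are summary rows and are skipped.
strand_totals() {
    awk -F'\t' '
        $1 !~ /^N_/ { uns += $2; fwd += $3; rev += $4 }
        END { printf "unstranded=%d forward=%d reverse=%d\n", uns, fwd, rev }
    ' "$1"
}

# Guess the protocol from the forward/reverse totals.
# The 5x ratio is an illustrative assumption, not a rustar-aligner default.
guess_strandedness() {
    awk -F'\t' '
        $1 !~ /^N_/ { fwd += $3; rev += $4 }
        END {
            if (fwd > 5 * rev)      print "forward"
            else if (rev > 5 * fwd) print "reverse"
            else                    print "unstranded"
        }
    ' "$1"
}

# Usage (file name follows the --outFileNamePrefix sample_ example):
#   strand_totals sample_ReadsPerGene.out.tab
#   guess_strandedness sample_ReadsPerGene.out.tab
```

Treat the result as a hint and confirm it against the library preparation protocol where possible; lowly expressed samples can give ambiguous ratios.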
Transcriptome-coordinate SAM (TranscriptomeSAM)
Produces a <prefix>Aligned.toTranscriptome.out.bam file with reads mapped to transcriptome coordinates (one record per matching transcript) instead of genome coordinates. This is the format consumed by RSEM and similar transcript-level quantifiers.
```shell
rustar-aligner \
  --genomeDir /path/to/genome_index \
  --readFilesIn reads_1.fq.gz reads_2.fq.gz \
  --readFilesCommand zcat \
  --sjdbGTFfile gencode.v45.gtf \
  --quantMode TranscriptomeSAM \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix sample_
```
Output filtering variants
--quantTranscriptomeSAMoutput controls which records are allowed in the transcriptome BAM:
| Value | Meaning |
|---|---|
| BanSingleEnd_BanIndels_ExtendSoftclip | Default. RSEM-compatible: drop unpaired records, drop reads with indels, extend soft-clips into matches. |
| BanSingleEnd_ExtendSoftclip | Keep indels, still extend soft-clips. |
| BanSingleEnd | Keep indels and soft-clips as-is. |
Pick the variant that matches your downstream tool’s expectations. For RSEM, keep the default, BanSingleEnd_BanIndels_ExtendSoftclip.
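As a sketch of how the option fits into a run, the commands below set the default value explicitly and then hand the transcriptome BAM to RSEM. The paths are placeholders, and /path/to/rsem_ref is a hypothetical RSEM reference assumed to have been built beforehand from the same annotation; only the rustar-aligner flags shown on this page are used.

```shell
# Align and emit an RSEM-compatible transcriptome BAM.
# The --quantTranscriptomeSAMoutput value is the default, spelled out for clarity;
# swap in BanSingleEnd_ExtendSoftclip or BanSingleEnd for tools that accept indels.
rustar-aligner \
  --genomeDir /path/to/genome_index \
  --readFilesIn reads_1.fq.gz reads_2.fq.gz \
  --readFilesCommand zcat \
  --sjdbGTFfile gencode.v45.gtf \
  --quantMode TranscriptomeSAM \
  --quantTranscriptomeSAMoutput BanSingleEnd_BanIndels_ExtendSoftclip \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix sample_

# Quantify with RSEM from the existing alignments.
# /path/to/rsem_ref and the output name sample_rsem are hypothetical.
rsem-calculate-expression \
  --alignments \
  --paired-end \
  sample_Aligned.toTranscriptome.out.bam \
  /path/to/rsem_ref \
  sample_rsem
```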
Combining both
You can request both modes in the same run:
```shell
--quantMode GeneCounts TranscriptomeSAM
```
This emits ReadsPerGene.out.tab and Aligned.toTranscriptome.out.bam in addition to the normal alignment output.
Index-time vs alignment-time
For best speed, supply --sjdbGTFfile when building the index (--runMode genomeGenerate) so that the transcript-level data structures are persisted into the genome directory. At alignment time you then only need --quantMode TranscriptomeSAM (or GeneCounts); rustar-aligner reuses the persisted annotations.
If you supply --sjdbGTFfile only at alignment time, transcript info is rebuilt on the fly each run. This works but adds startup cost.
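A two-step workflow might look like the following sketch. The --genomeFastaFiles flag and the genome.fa path are assumptions carried over from STAR's genomeGenerate interface and are not confirmed on this page; check the tool's help output before relying on them.

```shell
# Step 1 (once per genome/annotation): build the index with the GTF included.
# --genomeFastaFiles and genome.fa are assumed from STAR's interface.
rustar-aligner \
  --runMode genomeGenerate \
  --genomeDir /path/to/genome_index \
  --genomeFastaFiles genome.fa \
  --sjdbGTFfile gencode.v45.gtf

# Step 2 (per sample): --sjdbGTFfile is omitted; the persisted annotations are reused.
rustar-aligner \
  --genomeDir /path/to/genome_index \
  --readFilesIn reads_1.fq.gz reads_2.fq.gz \
  --readFilesCommand zcat \
  --quantMode GeneCounts \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix sample_
```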
Strandedness summary
For --quantMode GeneCounts, all three strand columns are written regardless of library protocol; pick the right one downstream.
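Picking a column downstream can be as simple as the awk sketch below, which writes a two-column gene/count table for one strand column and drops the N_ summary rows. The function name extract_counts is hypothetical, and the file is assumed to be tab-separated as in STAR's output.

```shell
# Print "gene_id<TAB>count" for one strand column of ReadsPerGene.out.tab.
#   $1 = path to the ReadsPerGene.out.tab file
#   $2 = column number: 2 (unstranded), 3 (forward) or 4 (reverse)
extract_counts() {
    case "$2" in
        2|3|4) ;;
        *) echo "column must be 2, 3 or 4" >&2; return 1 ;;
    esac
    awk -F'\t' -v col="$2" '
        BEGIN { OFS = "\t" }
        $1 !~ /^N_/ { print $1, $col }   # skip the four summary rows
    ' "$1"
}

# Usage, for a reverse-stranded library:
#   extract_counts sample_ReadsPerGene.out.tab 4 > sample_counts.tsv
```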
For --quantMode TranscriptomeSAM, the orientation is inferred from the transcript annotation in the GTF; no explicit strand parameter is needed.