GeMoRNA
Latest revision as of 16:34, 8 November 2024
GeMoRNA reconstructs genes and transcript models from mapped RNA-seq reads (in coordinate-sorted BAM format) and reports these in GFF format.
It is intended as a companion for the homology-based gene prediction program GeMoMa.
In a typical workflow, predictions of transcript models may be obtained from GeMoRNA for a collection of BAM files individually and subsequently merged using the GeMoMa Annotation Filter (GAF). Optionally, homology-based gene prediction may be performed using GeMoMa and the resulting GFF files may be merged using the Merge tool of GeMoRNA.
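The first step of this workflow, one GeMoRNA run per BAM file, could be scripted roughly as follows. This is a minimal dry-run sketch that only prints the commands; the file names, the JAR location and the per-sample `outdir` naming are assumptions made for illustration, and the subsequent GAF merge is omitted because its parameters are not listed on this page.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one GeMoRNA call per BAM file.
JAR="GeMoRNA-1.0.jar"   # assumed to sit in the working directory
GENOME="genome.fa"      # hypothetical genome FASTA

gemorna_cmd() {
  # $1: coordinate-sorted BAM file
  local bam="$1"
  local name
  name="$(basename "$bam" .bam)"
  # One output directory per sample is our choice, not a GeMoRNA requirement.
  echo "java -jar $JAR gemorna g=$GENOME m=$bam outdir=gemorna_$name"
}

for bam in liver.bam root.bam; do   # hypothetical sample names
  gemorna_cmd "$bam"
done
```

Piping the printed lines to `bash` (or dropping the `echo`) would execute the runs; keeping them as text first makes it easy to review the parameters.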
Command line tool
GeMoRNA is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
GeMoRNA and auxiliary tools are packaged in one runnable JAR (http://www.jstacs.de/downloads/GeMoRNA-1.0.jar) that may be run from the command line with
java -jar GeMoRNA-1.0.jar
which lists the available tools and usage information:
Available tools:
	gemorna - GeMoRNA
	predictCDS - Predict CDS from GFF
	GAF - GeMoMa Annotation Filter
	Analyzer - Analyzer
	merge - Merge
Syntax: java -jar GeMoRNA-1.0.jar <toolname> [<parameter=value> ...]
Further info about the tools is given with
	java -jar GeMoRNA-1.0.jar <toolname> info
For tests of individual tools:
	java -jar GeMoRNA-1.0.jar <toolname> test [<verbose>]
Tool parameters are listed with
	java -jar GeMoRNA-1.0.jar <toolname>
You get a list of the tool parameters by calling GeMoRNA-1.0.jar with the corresponding tool name, e.g.,
java -jar GeMoRNA-1.0.jar gemorna
The meaning of the individual tool parameters is described below. For convenience, we also include the GeMoMa tools Analyzer and GAF.
Source code
The source code of GeMoRNA is available from the Jstacs GitHub repository (https://github.com/Jstacs/Jstacs/tree/master/projects/gemorna).
GeMoRNA
Prediction of transcript models using GeMoRNA.
GeMoRNA may be called with
java -jar GeMoRNA-1.0.jar gemorna
and has the following parameters
name | comment | type |
g | Genome (Genome sequence as FastA, type = fa,fna,fasta) | FILE |
m | Mapped reads (Mapped Reads in BAM format, coordinate sorted, type = bam) | FILE |
s | Stranded (Library strandedness, range={FR_UNSTRANDED, FR_FIRST_STRAND, FR_SECOND_STRAND}, default = FR_UNSTRANDED) | STRING |
l | Longest intron length (Length of the longest intron reported, default = 100000) | INT |
sil | Shortest intron length (Length of the shortest intron considered, default = 10) | INT |
lr | Long reads (Long-read mode, default = false) | BOOLEAN |
mnor | Minimum number of reads (Minimum number of reads required for an edge in the read graph, default = 1.0) | DOUBLE |
mfor | Minimum fraction of reads (Minimum fraction of reads relative to adjacent exons that must support an intron in the enumeration, default = 0.01) | DOUBLE |
mnoir | Minimum number of intron reads (Minimum number of reads required for an intron, default = 1.0) | DOUBLE |
mfoir | Minimum fraction of intron reads (Minimum fraction of reads relative to adjacent exons for an intron to be considered, default = 0.01) | DOUBLE |
p | Percent explained (Percent of abundance that must be explained by transcript models after quantification, default = 0.9) | DOUBLE |
mrpg | Minimum reads per gene (Minimum abundance required for a gene to be reported, default = 40.0) | DOUBLE |
mrpt | Minimum reads per transcript (Minimum abundance required for a transcript to be reported, default = 20.0) | DOUBLE |
pa | Percent abundance (Minimum relative abundance required for a transcript to be reported, default = 0.05) | DOUBLE |
sf | Successive fraction (Factor of the drop in abundance between successive transcript models, default = 20.0) | DOUBLE |
mrl | Maximum region length (Maximum length of a region considered before it is split, default = 750000) | INT |
mfgl | Maximum filled gap length (Maximum length of a gap filled by dummy reads, default = 50) | INT |
q | Quality filter (Minimum mapping quality required for a read to be considered, default = 40) | INT |
mpl | Minimum protein length (Minimum length of protein in AA, default = 70) | INT |
outdir | The output directory, defaults to the current working directory (.) | STRING |
threads | The number of threads used for the tool, defaults to 1 | INT |
Example:
java -jar GeMoRNA-1.0.jar gemorna g=<Genome> m=<Mapped_reads>
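Because the `m` parameter expects a coordinate-sorted BAM file, it can help to check the sort order before a run. The sketch below assumes that `samtools` is available for reading the header and, if needed, sorting; the helper name `is_coordinate_sorted` and all file names are hypothetical. The final line only prints a GeMoRNA call with a few optional parameters from the table set explicitly; the value of `s` must match the library protocol actually used.

```shell
#!/usr/bin/env bash
# Succeeds if the SAM header text read from stdin declares coordinate sorting
# (an @HD line carrying the tag SO:coordinate).
is_coordinate_sorted() {
  grep -q '^@HD.*SO:coordinate'
}

# Typical use (requires samtools; not executed in this sketch):
#   samtools view -H reads.bam | is_coordinate_sorted \
#     || samtools sort -o reads.sorted.bam reads.bam

# Dry run: print a GeMoRNA call for a stranded library using 4 threads.
JAR="GeMoRNA-1.0.jar"   # assumed to sit in the working directory
echo "java -jar $JAR gemorna g=genome.fa m=reads.sorted.bam s=FR_FIRST_STRAND threads=4 outdir=gemorna_out"
```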
Predict CDS from GFF
Prediction of CDSs using the longest-ORF heuristic based on an existing GFF or GTF file.
Predict CDS from GFF may be called with
java -jar GeMoRNA-1.0.jar predictCDS
and has the following parameters
name | comment | type |
g | Genome (Genome sequence as FastA, type = fa,fna,fasta) | FILE |
p | predicted annotation ("GFF or GTF file containing the predicted annotation", type = gff,gff3,gff.gz,gff3.gz,gtf,gtf.gz) | FILE |
m | Minimum protein length (Minimum length of protein in AA, default = 70) | INT |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar GeMoRNA-1.0.jar predictCDS g=<Genome> p=<predicted_annotation>
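As a slightly fuller illustration, the following dry-run sketch prints a `predictCDS` call with the minimum protein length and the output directory set explicitly. The function name `predict_cds_cmd`, the input file names and the directory `cds_out` are hypothetical; the gzipped GTF input relies on `gtf.gz` being among the listed file types.

```shell
#!/usr/bin/env bash
JAR="GeMoRNA-1.0.jar"   # assumed to sit in the working directory

predict_cds_cmd() {
  # $1: genome FASTA
  # $2: GFF or GTF annotation (optionally gzipped)
  # $3: minimum protein length in AA (falls back to the documented default of 70)
  echo "java -jar $JAR predictCDS g=$1 p=$2 m=${3:-70} outdir=cds_out"
}

# Hypothetical input files; require proteins of at least 100 AA.
predict_cds_cmd genome.fa transcripts.gtf.gz 100
```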
Merge
Merging GeMoRNA and GeMoMa predictions.
Merge may be called with
java -jar GeMoRNA-1.0.jar merge
and has the following parameters
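name | comment | type |
g | GeMoMa (GeMoMa predictions, type = gff,gff3) | FILE |
GeMoRNA | GeMoRNA (GeMoRNA predictions, type = gff,gff3) | FILE |
m | Mode (range={intersect, union, intermediate, annotate}, default = intersect) | STRING |
No parameters for selection "intersect"
No parameters for selection "union"
Parameters for selection "intermediate":
GeMoMa-strict | GeMoMa-strict (GeMoMa predictions with strict settings, type = gff,gff3) | FILE |
GeMoRNA-strict | GeMoRNA-strict (GeMoRNA predictions with strict settings, type = gff,gff3) | FILE |
Parameters for selection "annotate":
GeMoMa-strict | GeMoMa-strict (GeMoMa predictions with strict settings, type = gff,gff3) | FILE |
GeMoRNA-strict | GeMoRNA-strict (GeMoRNA predictions with strict settings, type = gff,gff3) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |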
Example:
java -jar GeMoRNA-1.0.jar merge g=<GeMoMa> GeMoRNA=<GeMoRNA>
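The modes "intermediate" and "annotate" take two further files, which the tool lists as `GeMoMa-strict` and `GeMoRNA-strict`. The dry-run sketch below assembles a `merge` call depending on the mode. It assumes that both strict files are mandatory for these two modes (no default is listed for them); the function name `merge_cmd` and all file names are hypothetical.

```shell
#!/usr/bin/env bash
JAR="GeMoRNA-1.0.jar"   # assumed to sit in the working directory

merge_cmd() {
  # $1: mode, $2: GeMoMa GFF, $3: GeMoRNA GFF,
  # $4/$5: strict GeMoMa/GeMoRNA GFFs (only for intermediate and annotate)
  local mode="$1" gemoma="$2" gemorna="$3"
  local gemoma_strict="${4:-}" gemorna_strict="${5:-}"
  local cmd="java -jar $JAR merge g=$gemoma GeMoRNA=$gemorna m=$mode"
  case "$mode" in
    intersect|union)
      ;;  # no additional parameters
    intermediate|annotate)
      if [ -z "$gemoma_strict" ] || [ -z "$gemorna_strict" ]; then
        echo "mode '$mode' needs GeMoMa-strict and GeMoRNA-strict files" >&2
        return 1
      fi
      cmd="$cmd GeMoMa-strict=$gemoma_strict GeMoRNA-strict=$gemorna_strict"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
  echo "$cmd"
}

# Print the call for an "intermediate" merge of hypothetical prediction files.
merge_cmd intermediate gemoma.gff gemorna.gff gemoma_strict.gff gemorna_strict.gff
```

Checking the mode before building the command catches a missing strict file early instead of leaving it to the tool's own parameter validation.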