GeMoRNA: Difference between revisions



  java -jar GeMoRNA-1.0.jar predictCDS g=<Genome> p=<predicted_annotation>
=== Analyzer ===
This tool compares a true annotation with a predicted annotation, as is frequently done in benchmark studies. It can also return a detailed table comparing the true and the predicted annotation, which may help to identify systematic errors or biases in the predictions and thus to detect weaknesses of the prediction algorithm.
True and predicted transcripts are evaluated based on the nucleotide F1 measure. For each predicted transcript, the true transcript with the highest nucleotide F1 measure is listed. A negative value in an F1 measure column indicates that a predicted transcript matches the true transcript with an F1 value equal to the absolute value of this entry, but that another true transcript matches this predicted transcript with an even better F1. True and predicted transcripts that do not overlap any transcript from the predicted and true annotation, respectively, are also listed. Besides the attributes of the true and the predicted annotation, the table contains some additional columns that make it easy to filter interesting examples and to compute statistics.
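The nucleotide F1 measure and the sign convention described above can be illustrated with a minimal Python sketch. It is one reading of the description, not the Analyzer's actual implementation. The function names, the representation of a transcript as a list of (start, end) features, and the use of 1-based inclusive coordinates are assumptions made for illustration.

```python
def covered_positions(features):
    """Return the set of positions covered by (start, end) features, ends inclusive."""
    positions = set()
    for start, end in features:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_f1(true_features, predicted_features):
    """Harmonic mean of nucleotide-level precision and sensitivity."""
    true_pos = covered_positions(true_features)
    pred_pos = covered_positions(predicted_features)
    shared = len(true_pos & pred_pos)
    if shared == 0:
        return 0.0
    precision = shared / len(pred_pos)
    sensitivity = shared / len(true_pos)
    return 2 * precision * sensitivity / (precision + sensitivity)


def signed_f1_per_true(true_tx, pred_tx):
    """Map each true transcript ID to a signed F1 (None if nothing overlaps).

    The value is negative when the best-matching prediction itself
    matches a different true transcript even better.
    """
    # Best true transcript for every prediction.
    best_true = {}
    for p, p_feat in pred_tx.items():
        scored = [(nucleotide_f1(t_feat, p_feat), t) for t, t_feat in true_tx.items()]
        f1, t = max(scored, default=(0.0, None))
        best_true[p] = t if f1 > 0 else None

    result = {}
    for t, t_feat in true_tx.items():
        scored = [(nucleotide_f1(t_feat, p_feat), p) for p, p_feat in pred_tx.items()]
        f1, p = max(scored, default=(0.0, None))
        if f1 == 0.0:
            result[t] = None          # no overlapping prediction
        elif best_true[p] == t:
            result[t] = f1            # mutual best match
        else:
            result[t] = -f1           # prediction prefers another true transcript
    return result


# Hypothetical toy data: P1 matches T1 perfectly and T2 only partially.
true_tx = {"T1": [(1, 100)], "T2": [(1, 50)]}
pred_tx = {"P1": [(1, 100)]}
result = signed_f1_per_true(true_tx, pred_tx)
```

In this toy example, T1 receives a positive value because P1 matches it best, while T2 receives a negative value because P1 overlaps it but matches T1 better.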
The evaluation can be based on CDS (default) or exon features. The tool also reports sensitivity and precision for the categories gene and transcript.
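As a reminder of how sensitivity and precision relate to each other, the following Python sketch computes both from sets of identifiers. The text does not state which predictions the Analyzer counts as correct, so the list of correct pairs is a placeholder that the caller must supply (for example, perfectly matching transcripts); the function and variable names are hypothetical.

```python
def sensitivity_and_precision(true_ids, predicted_ids, correct_pairs):
    """Compute sensitivity and precision for one category (gene or transcript).

    correct_pairs holds (true_id, predicted_id) pairs judged correct;
    the criterion for "correct" is an assumption left to the caller.
    """
    recovered_true = {t for t, _ in correct_pairs}
    correct_predictions = {p for _, p in correct_pairs}
    sensitivity = len(recovered_true) / len(true_ids) if true_ids else float("nan")
    precision = len(correct_predictions) / len(predicted_ids) if predicted_ids else float("nan")
    return sensitivity, precision


# Hypothetical toy data: 4 true and 5 predicted transcripts, 3 correct matches.
true_ids = {"T1", "T2", "T3", "T4"}
predicted_ids = {"P1", "P2", "P3", "P4", "P5"}
correct_pairs = [("T1", "P1"), ("T2", "P2"), ("T3", "P4")]
sn, pr = sensitivity_and_precision(true_ids, predicted_ids, correct_pairs)
```

Here three of four true transcripts are recovered and three of five predictions are correct.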
For more information please visit http://www.jstacs.de/index.php/GeMoMa
If you have any questions, comments or bug reports, please check the FAQs on our homepage or our GitHub page https://github.com/Jstacs/Jstacs/labels/GeMoMa, or contact jens.keilwagen@julius-kuehn.de
''Analyzer'' may be called with
  java -jar GeMoRNA-1.0.jar Analyzer
and has the following parameters
<table border=0 cellpadding=10 align="center" width="100%">
<tr>
<td>name</td>
<td>comment</td>
<td>type</td>
</tr>
<tr><td colspan=3><hr></td></tr>
<tr style="vertical-align:top">
<td><font color="green">t</font></td>
<td>truth (the true annotation, type = gff,gff3,gtf,gff.gz,gff3.gz,gtf.gz)</td>
<td style="width:100px;">FILE</td>
</tr>
<tr><td colspan=3>The following parameter(s) can be used multiple times:</td></tr>
<tr><td></td><td colspan=2><table border=0 cellpadding=0 align="center" width="100%">
<tr style="vertical-align:top">
<td><font color="green">n</font></td>
<td>name (can be used to distinguish different predictions, OPTIONAL)</td>
<td style="width:100px;">STRING</td>
</tr>
<tr style="vertical-align:top">
<td><font color="green">p</font></td>
<td>predicted annotation (GFF/GTF file containing the predicted annotation, type = gff,gff3,gtf,gff.gz,gff3.gz,gtf.gz)</td>
<td style="width:100px;">FILE</td>
</tr>
</table>
</td></tr>
<tr style="vertical-align:top">
<td><font color="green">c</font></td>
<td>CDS (if true CDS features are used otherwise exon features, default = true)</td>
<td style="width:100px;">BOOLEAN</td>
</tr>
<tr style="vertical-align:top">
<td><font color="green">o</font></td>
<td>only introns (if true only intron borders (=splice sites) are evaluated, default = false)</td>
<td style="width:100px;">BOOLEAN</td>
</tr>
<tr style="vertical-align:top">
<td><font color="green">w</font></td>
<td>write (write detailed table comparing the true and the predicted annotation, range={NO, YES}, default = NO)</td>
<td style="width:100px;">STRING</td></tr>
<tr><td></td><td colspan=2><table border=0 cellpadding=0 align="center" width="100%">
<tr><td colspan=3><b>No parameters for selection &quot;NO&quot;</b></td></tr>
<tr><td colspan=3><b>Parameters for selection &quot;YES&quot;:</b></td></tr>
<tr style="vertical-align:top">
<td><font color="green">ca</font></td>
<td>common attributes (Only GFF attributes of mRNAs that can be found in the given portion of all mRNAs are included in the result table. Attributes and their portions are handled independently for truth and prediction. This parameter allows choosing between a more informative and a more compact table, valid range = [0.0, 1.0], default = 0.5)</td>
<td style="width:100px;">DOUBLE</td>
</tr>
</table></td></tr>
<tr style="vertical-align:top">
<td><font color="green">r</font></td>
<td>reliable (additionally evaluate sensitivity for reliable transcripts, range={NO, YES}, default = NO)</td>
<td style="width:100px;">STRING</td></tr>
<tr><td></td><td colspan=2><table border=0 cellpadding=0 align="center" width="100%">
<tr><td colspan=3><b>No parameters for selection &quot;NO&quot;</b></td></tr>
<tr><td colspan=3><b>Parameters for selection &quot;YES&quot;:</b></td></tr>
<tr style="vertical-align:top">
<td><font color="green">f</font></td>
<td>filter (A filter for deciding which transcripts from the truth are reliable. The filter is applied to the GFF attributes of the truth. You probably need to run AnnotationEvidence on the truth GFF. The default filter decides based on the completeness of the prediction (start=='M' and stop=='*'), the absence of premature stop codons (nps==0), RNA-seq coverage (tpc==1) and intron evidence (isNaN(tie) or tie==1), default = start=='M' and stop=='*' and nps==0 and (tpc==1 and (isNaN(tie) or tie==1)), OPTIONAL)</td>
<td style="width:100px;">STRING</td>
</tr>
</table></td></tr>
<tr style="vertical-align:top">
<td><font color="green">outdir</font></td>
<td>The output directory, defaults to the current working directory (.)</td>
<td>STRING</td>
</tr>
</table>
'''Example:'''
  java -jar GeMoRNA-1.0.jar Analyzer t=&lt;truth&gt; p=&lt;predicted_annotation&gt;
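
To make the default reliability filter of parameter f easier to read, the following Python sketch restates it as a predicate over the GFF attributes of a single true transcript. It only mirrors the logic of the default expression and is not the filter engine used by the tool. The attribute dictionary, the helper names, and the treatment of missing or non-numeric values as NaN are assumptions.

```python
import math


def to_number(value):
    """Convert an attribute value to float; missing or non-numeric values become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def is_reliable(attributes):
    """Mirror of the default filter:
    start=='M' and stop=='*' and nps==0 and (tpc==1 and (isNaN(tie) or tie==1))
    """
    tie = to_number(attributes.get("tie"))
    return (
        attributes.get("start") == "M"                 # complete at the start
        and attributes.get("stop") == "*"              # complete at the end
        and to_number(attributes.get("nps")) == 0      # no premature stop codons
        and to_number(attributes.get("tpc")) == 1      # full RNA-seq coverage
        and (math.isnan(tie) or tie == 1)              # introns absent or fully supported
    )


# Hypothetical attribute sets for two transcripts of the truth.
single_exon = {"start": "M", "stop": "*", "nps": "0", "tpc": "1", "tie": "NA"}
low_coverage = {"start": "M", "stop": "*", "nps": "0", "tpc": "0.8", "tie": "1"}
```

With these toy attributes, the single-exon transcript passes the filter, whereas the transcript with incomplete coverage does not.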





Revision as of 16:19, 8 November 2024

Tools

GeMoRNA

GeMoRNA may be called with

java -jar GeMoRNA-1.0.jar gemorna

and has the following parameters

name comment type

g Genome (Genome sequence as FastA, type = fa,fna,fasta) FILE
m Mapped reads (Mapped Reads in BAM format, coordinate sorted, type = bam) FILE
s Stranded (Library strandedness, range={FR_UNSTRANDED, FR_FIRST_STRAND, FR_SECOND_STRAND}, default = FR_UNSTRANDED) STRING
l Longest intron length (Length of the longest intron reported, default = 100000) INT
sil Shortest intron length (Length of the shortest intron considered, default = 10) INT
lr Long reads (Long-read mode, default = false) BOOLEAN
mnor Minimum number of reads (Minimum number of reads required for an edge in the read graph, default = 1.0) DOUBLE
mfor Minimum fraction of reads (Minimum fraction of reads relative to adjacent exons that must support an intron in the enumeration, default = 0.01) DOUBLE
mnoir Minimum number of intron reads (Minimum number of reads required for an intron, default = 1.0) DOUBLE
mfoir Minimum fraction of intron reads (Minimum fraction of reads relative to adjacent exons for an intron to be considered, default = 0.01) DOUBLE
p Percent explained (Percent of abundance that must be explained by transcript models after quantification, default = 0.9) DOUBLE
mrpg Minimum reads per gene (Minimum abundance required for a gene to be reported, default = 40.0) DOUBLE
mrpt Minimum reads per transcript (Minimum abundance required for a transcript to be reported, default = 20.0) DOUBLE
pa Percent abundance (Minimum relative abundance required for a transcript to be reported, default = 0.05) DOUBLE
sf Successive fraction (Factor of the drop in abundance between successive transcript models, default = 20.0) DOUBLE
mrl Maximum region length (Maximum length of a region considered before it is split, default = 750000) INT
mfgl Maximum filled gap length (Maximum length of a gap filled by dummy reads, default = 50) INT
q Quality filter (Minimum mapping quality required for a read to be considered, default = 40) INT
mpl Minimum protein length (Minimum length of protein in AA, default = 70) INT
outdir The output directory, defaults to the current working directory (.) STRING
threads The number of threads used for the tool, defaults to 1 INT

Example:

java -jar GeMoRNA-1.0.jar gemorna g=<Genome> m=<Mapped_reads>
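
When the tools are run from a script, the key=value command line can be assembled programmatically. The sketch below is one possible way to do this in Python; the helper name build_command and the input file names genome.fa and reads.bam are hypothetical, and only parameter names listed in the tables on this page are used. List values are expanded into repeated key=value arguments, which suits parameters that can be given multiple times.

```python
def build_command(tool, params, jar="GeMoRNA-1.0.jar"):
    """Build the argument list for one tool of the jar from a parameter dict."""
    command = ["java", "-jar", jar, tool]
    for key, value in params.items():
        # A list or tuple is expanded into repeated key=value arguments.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"   # BOOLEAN parameters
            command.append(f"{key}={item}")
    return command


# Hypothetical input files; all other parameters keep their defaults.
cmd = build_command(
    "gemorna",
    {"g": "genome.fa", "m": "reads.bam", "s": "FR_FIRST_STRAND", "threads": 4},
)

# To actually run the tool (requires Java and the jar file):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.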


Predict CDS from GFF

Predict CDS from GFF may be called with

java -jar GeMoRNA-1.0.jar predictCDS

and has the following parameters

name comment type

g Genome (Genome sequence as FastA, type = fa,fna,fasta) FILE
p predicted annotation ("GFF or GTF file containing the predicted annotation", type = gff,gff3,gff.gz,gff3.gz,gtf,gtf.gz) FILE
m Minimum protein length (Minimum length of protein in AA, default = 70) INT
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar GeMoRNA-1.0.jar predictCDS g=<Genome> p=<predicted_annotation>


Merge

Merge may be called with

java -jar GeMoRNA-1.0.jar merge

and has the following parameters

name comment type

g GeMoMa (GeMoMa predictions, type = gff,gff3) FILE
GeMoRNA GeMoRNA (GeMoRNA predictions, type = gff,gff3) FILE
m Mode (range={intersect, union, intermediate, annotate}, default = intersect) STRING
    No parameters for selection "intersect"
    No parameters for selection "union"
    Parameters for selection "intermediate":
        GeMoMa-strict GeMoMa-strict (GeMoMa predictions with strict settings, type = gff,gff3) FILE
        GeMoRNA-strict GeMoRNA-strict (GeMoRNA predictions with strict settings, type = gff,gff3) FILE
    Parameters for selection "annotate":
        GeMoMa-strict GeMoMa-strict (GeMoMa predictions with strict settings, type = gff,gff3) FILE
        GeMoRNA-strict GeMoRNA-strict (GeMoRNA predictions with strict settings, type = gff,gff3) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar GeMoRNA-1.0.jar merge g=<GeMoMa> GeMoRNA=<GeMoRNA>