DerTALEv2
Latest revision as of 22:30, 24 May 2024
DerTALEv2 filters predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box.
As input, DerTALEv2 requires a list of target box predictions as generated by the PrediTALE tool, which is included in the DerTALEv2 JAR file.
For determining differentially expressed regions, DerTALEv2 also needs mapped RNA-seq data from Xanthomonas infection (treatment) and from a control, both in BAM format. BAM is the standard output format of most mappers and may be generated from the SAM format using samtools. For each BAM file, DerTALEv2 also needs an index file with the same base name as the BAM file plus the additional extension .bai (as generated by samtools).
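Since a missing index file is easy to overlook, the requirement above can be checked before starting a run. The following is a minimal sketch and not part of DerTALEv2: the helper name check_bam_index and the file names in the comments are made up for illustration.

```shell
# Index files are typically created with samtools, for example:
#   samtools sort -o sample.bam sample.sam
#   samtools index sample.bam      # writes sample.bam.bai
# (sample.sam and sample.bam are placeholder names.)

# check_bam_index BAM...: hypothetical helper, not part of DerTALEv2.
# Returns 0 if every BAM file has a companion <file>.bai, 1 otherwise.
check_bam_index() {
    missing=0
    for bam in "$@"; do
        if [ ! -f "${bam}.bai" ]; then
            echo "missing index: ${bam}.bai" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be used as check_bam_index treatment1.bam control1.bam before calling the tool.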
Further parameters that can be specified include the number of predictions in the list that are considered (counting from the top), the width of the region in which differential expression is considered, the width of the window that needs to be differentially expressed, and a threshold on the log (base 2) differential abundance (e.g., 1 for a two-fold induction).
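Because the threshold is given on a log (base 2) scale, a threshold of T corresponds to a fold change of 2 to the power of T. The small helper below only illustrates this arithmetic; the name fold_change is hypothetical and the function is not part of DerTALEv2.

```shell
# fold_change T: fold change that corresponds to a log2 threshold T (2^T).
# Hypothetical helper for illustration; not part of DerTALEv2.
fold_change() {
    awk -v t="$1" 'BEGIN { printf "%g\n", 2 ^ t }'
}

fold_change 1   # prints 2 (two-fold induction)
fold_change 2   # prints 4 (four-fold induction)
```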
Command line tool
DerTALEv2 is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
DerTALEv2 and PrediTALE are packaged in one runnable JAR (http://www.jstacs.de/downloads/DerTALEv2.jar) that may be run from the command line with
java -jar DerTALEv2.jar
which lists the available tools and usage information:
Available tools:
    dertalev2 - DerTALEv2
    preditale - PrediTALE
Syntax: java -jar DerTALEv2.jar <toolname> [<parameter=value> ...]
Further info about the tools is given with
    java -jar DerTALEv2.jar <toolname> info
For tests of individual tools:
    java -jar DerTALEv2.jar <toolname> test [<verbose>]
Tool parameters are listed with
    java -jar DerTALEv2.jar <toolname>
You get a list of the tool parameters by calling DerTALEv2.jar with the corresponding tool name, e.g.,
java -jar DerTALEv2.jar dertalev2
The meaning of the individual tool parameters is described below.
Tool parameters
DerTALEv2
DerTALEv2 filters predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box.
If you experience problems using DerTALEv2, please contact us (grau@informatik.uni-halle.de).
DerTALEv2 may be called with
java -jar DerTALEv2.jar dertalev2
and has the following parameters
name | comment | type
p | Predictions (Predictions output file, type = tsv,tabular) | FILE
The following parameter(s) can be used multiple times:
    t | Treatment BAM (BAM file of mapped reads from the treatment experiment. The BAM file must have an index with additional extension .bai, type = bam) | FILE
The following parameter(s) can be used multiple times:
    c | Control BAM (BAM file of mapped reads from the control experiment. The BAM file must have an index with additional extension .bai, type = bam) | FILE
n | Number of predictions (Number of (top) predictions considered, default = 100) | INT
r | Region width (Number of bases around the predicted site, default = 500) | INT
Threshold | Threshold (Threshold on the log (base 2) differential abundance, default = 1.0) | DOUBLE
s | Stranded (Defines whether the reads are stranded. In case of FR_FIRST_STRAND, the first read of a read pair, or the only read in case of single-end data, is assumed to be located on the forward strand of the cDNA, i.e., reverse to the mRNA orientation. If you are using Illumina TruSeq, you should use FR_FIRST_STRAND, range={FR_UNSTRANDED, FR_FIRST_STRAND, FR_SECOND_STRAND}, default = FR_UNSTRANDED) | STRING
cc | Coverage cutoff (Minimum number of reads used as coverage cutoff, default = 10) | INT
rev | Region elongation value (Number of bases by which a region is elongated if coverage is above half of the coverage cutoff at the start/end of the region, default = 100) | INT
m | Minimum length of candidate region (default = 100) | INT
mcotcr | Minimum coverage of the candidate region (default = 50) | INT
outdir | The output directory, defaults to the current working directory (.) | STRING
Example:
java -jar DerTALEv2.jar dertalev2 p=<Predictions> t=<Treatment_BAM1> t=<Treatment_BAM2> c=<Control_BAM1> c=<Control_BAM2>
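A call with replicates and some optional parameters might be assembled as sketched below. All file names, the output directory name, and the chosen parameter values are placeholders for illustration; only the parameter names and the parameter=value syntax are taken from this page. The sketch prints the command instead of running it, so it can be inspected first.

```shell
# Placeholder file names; replace with your own paths (no spaces assumed).
predictions="predictions.tsv"
treatment_bams="infected_rep1.bam infected_rep2.bam"
control_bams="mock_rep1.bam mock_rep2.bam"

# Assemble the call; t= and c= are repeated once per BAM file.
build_dertale_cmd() {
    cmd="java -jar DerTALEv2.jar dertalev2 p=$predictions"
    for bam in $treatment_bams; do cmd="$cmd t=$bam"; done
    for bam in $control_bams; do cmd="$cmd c=$bam"; done
    # Optional parameters with illustrative values.
    cmd="$cmd n=50 r=500 Threshold=1.0 s=FR_FIRST_STRAND outdir=dertale_out"
    echo "$cmd"
}

# Dry run: only prints the command line instead of executing it.
build_dertale_cmd
```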
PrediTALE
PrediTALE predicts TALE target boxes using a novel model learned from quantitative data based on the RVD sequence of a TALE.
As input, PrediTALE requires a set of sequences that are scanned for putative TALE target boxes. These sequences could be promoters of genes but also complete genomic sequences (FastA format). For computing p-values, PrediTALE additionally needs a background set of sequences, which is by default generated as a sub-sample of the original input data.
The prediction threshold may be defined either by means of a p-value or by an approximate number of expected sites. The latter is also converted to a p-value internally, so the defined number of expected sites is, in general, not met exactly.
TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format also output by the TALE Analysis tool of AnnoTALE.
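As a rough illustration of this format, the sketch below writes a small FastA file with two made-up TALEs and counts the RVDs per entry by splitting on dashes. The TALE names, the RVD sequences, the file name tales_example.fa, and the helper count_rvds are all hypothetical and do not describe real TALEs or any part of PrediTALE.

```shell
# Write an example input file; names and RVD sequences are invented.
cat > tales_example.fa <<'EOF'
>ExampleTALE1
NI-HD-NG-NN-HD-NG-NI-NN
>ExampleTALE2
HD-NG-N*-NS
EOF

# count_rvds FILE: print each TALE name and its number of RVDs (tab-separated).
# Assumes one sequence line per FastA entry.
count_rvds() {
    awk '/^>/ { name = substr($0, 2); next }
         NF   { n = split($0, rvds, "-"); print name "\t" n }' "$1"
}

count_rvds tales_example.fa
```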
Finally, it can be specified whether both strands or only one of the strands are scanned. In the former case, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 in case of genome-wide predictions.
If you experience problems using PrediTALE, please contact us (grau@informatik.uni-halle.de).
PrediTALE may be called with
java -jar DerTALEv2.jar preditale
and has the following parameters
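name | comment | type
s | Sequences (The sequences (e.g., a genome) to scan for binding sites, type = fa,fas,fasta) | FILE
b | Background sample (The sequences for determining the prediction threshold. Either a sub-sample of the input sequences or a dedicated background data set, range={sub-sample, background sequences}, default = sub-sample) | STRING
    No parameters for selection "sub-sample"
    Parameters for selection "background sequences":
    bs | Background sequences (The sequences (e.g., a genome) for determining the prediction threshold, type = fa,fas,fasta) | FILE
t | Threshold specification (The way of defining the prediction threshold. Either by explicitly defining a significance level or by specifying the number of expected sites, range={significance level, number of sites}, default = significance level) | STRING
    Parameters for selection "significance level":
    sl | Significance level (The significance level for determining the prediction threshold, valid range = [0.0, 0.01], default = 1.0E-4) | DOUBLE
    Parameters for selection "number of sites":
    n | Number of sites (The number of expected binding sites for determining the prediction threshold, valid range = [1, 1000000], default = 10000) | INT
TALEs | TALEs (The RVD sequences of the TALEs, separated by dashes, in FastA format, type = fasta,fas,fa) | FILE
Strand | Strand (Predict target sites on both strands, or only on the forward or the reverse strand, range={both strands, forward strand, reverse strand}, default = both strands) | STRING
    Parameters for selection "both strands":
    r | Reverse penalty (Penalty for predictions on the reverse strand, valid range = [0.0, 1.7976931348623157E308], default = 0.01) | DOUBLE
    No parameters for selection "forward strand"
    No parameters for selection "reverse strand"
outdir | The output directory, defaults to the current working directory (.) | STRING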
Example:
java -jar DerTALEv2.jar preditale s=<Sequences> TALEs=<TALEs>
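For a genome-wide scan, where the reverse-strand penalty should usually be 0, the arguments might look as sketched below. This assumes that selection values containing spaces (such as both strands) are passed as a single quoted parameter=value argument, which is an inference from the syntax shown on this page rather than confirmed behaviour. The helper name preditale_args and all file and directory names are hypothetical. The sketch prints one argument per line to make the quoting visible and does not run the tool.

```shell
# preditale_args GENOME TALES OUTDIR: print the arguments of a genome-wide
# PrediTALE call, one per line. Hypothetical helper; it does not run the tool.
preditale_args() {
    printf '%s\n' java -jar DerTALEv2.jar preditale \
        "s=$1" "TALEs=$2" \
        "Strand=both strands" "r=0" \
        "outdir=$3"
}

# Placeholder file names; replace with your own.
preditale_args genome.fa tales.fa preditale_out
```

Quoting "Strand=both strands" keeps the value together as one argument, so the shell does not split it at the space.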