DerTALEv2: Difference between revisions

Latest revision as of 22:30, 24 May 2024

DerTALEv2 filters predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box.

As input, DerTALEv2 requires a list of target box predictions as generated by the PrediTALE tool, which is included in the DerTALEv2 JAR file.

For determining differentially expressed regions, DerTALEv2 also needs RNA-seq reads from Xanthomonas-infected samples (treatment) and from control samples, mapped and stored in BAM format. BAM is the standard output format of most read mappers and may also be generated from SAM files using samtools. For each BAM file, DerTALEv2 additionally needs an index file named like the BAM file with the additional extension .bai (e.g., treatment.bam.bai), as generated by samtools.
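Before starting a run, it can be worth confirming that every BAM file has its index. The following minimal POSIX shell sketch assumes the index is stored as <file>.bam.bai, which is what samtools index writes by default. The function name check_bam_index is hypothetical and not part of DerTALEv2.

```shell
#!/bin/sh
# Sketch: verify that each BAM file given as an argument exists and has an
# index named <file>.bai next to it (e.g. treatment.bam -> treatment.bam.bai).
# A missing index can be created from a SAM file with, for example:
#   samtools sort -o treatment.bam treatment.sam && samtools index treatment.bam
check_bam_index() {
  rc=0
  for bam in "$@"; do
    if [ ! -f "$bam" ]; then
      echo "missing BAM file: $bam" >&2
      rc=1
    elif [ ! -f "$bam.bai" ]; then
      echo "missing index: $bam.bai (run: samtools index $bam)" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example, check_bam_index treatment_rep1.bam control_rep1.bam (placeholder names) returns a non-zero status and names the affected files if any BAM file or index is absent.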

Further parameters that can be specified include the number of predictions in the list that are considered (counting from top), the width of the region in which differential expression is considered, the width of the window that needs to be differentially expressed and a threshold on the log (base 2) differential abundance (e.g., 1 for a two-fold induction).
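The threshold is easiest to read as a fold change. A log2 differential abundance of 1 corresponds to a two-fold induction, and a value of 2 corresponds to a four-fold induction. The sketch below only illustrates this arithmetic and does not reproduce DerTALEv2's internal computation. It assumes positive, already normalized abundance values without pseudocounts, and the helper names log2_ratio and passes_threshold are hypothetical.

```shell
#!/bin/sh
# Sketch: log2(treatment / control), printed with three decimals.
# Assumes both values are positive and already normalized.
log2_ratio() {
  awk -v t="$1" -v c="$2" 'BEGIN { printf "%.3f\n", log(t / c) / log(2) }'
}

# Sketch: exit status 0 if log2(treatment / control) >= threshold, 1 otherwise.
passes_threshold() {
  awk -v t="$1" -v c="$2" -v thr="$3" 'BEGIN { exit !(log(t / c) / log(2) >= thr) }'
}
```

For instance, log2_ratio 200 100 prints 1.000. A 1.5-fold increase (log2 of about 0.585) would not pass the default threshold of 1.0.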

Command line tool

DerTALEv2 is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

DerTALEv2 and PrediTALE are packaged in one runnable JAR (http://www.jstacs.de/downloads/DerTALEv2.jar) that may be run from the command line with

java -jar DerTALEv2.jar

which lists the available tools and usage information:

Available tools:

	dertalev2 - DerTALEv2
	preditale - PrediTALE

Syntax: java -jar DerTALEv2.jar <toolname> [<parameter=value> ...]

Further info about the tools is given with
	java -jar DerTALEv2.jar <toolname> info

For tests of individual tools:
	java -jar DerTALEv2.jar <toolname> test [<verbose>]

Tool parameters are listed with
	java -jar DerTALEv2.jar <toolname>

You get a list of the tool parameters by calling DerTALEv2.jar with the corresponding tool name, e.g.,

 java -jar DerTALEv2.jar dertalev2

The meaning of the individual tool parameters is described below.

Tool parameters

DerTALEv2

DerTALEv2 filters predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box.

If you experience problems using DerTALEv2, please contact us (grau@informatik.uni-halle.de).

DerTALEv2 may be called with

java -jar DerTALEv2.jar dertalev2

and has the following parameters

name	comment	type
p	Predictions (Predictions output file, type = tsv,tabular)	FILE
The following parameter(s) can be used multiple times:
t	Treatment BAM (BAM file of mapped reads from the treatment experiment. The BAM file must have an index with the additional extension .bai, type = bam)	FILE
The following parameter(s) can be used multiple times:
c	Control BAM (BAM file of mapped reads from the control experiment. The BAM file must have an index with the additional extension .bai, type = bam)	FILE
n	Number of predictions (Number of (top) predictions considered, default = 100)	INT
r	Region width (Number of bases around the predicted site, default = 500)	INT
Threshold	Threshold (Threshold on the log2 differential abundance, default = 1.0)	DOUBLE
s	Stranded (Defines whether the reads are stranded. In case of FR_FIRST_STRAND, the first read of a read pair, or the only read in case of single-end data, is assumed to be located on the forward strand of the cDNA, i.e., reverse to the mRNA orientation. If you are using Illumina TruSeq, you should use FR_FIRST_STRAND, range={FR_UNSTRANDED, FR_FIRST_STRAND, FR_SECOND_STRAND}, default = FR_UNSTRANDED)	STRING
cc	Coverage cutoff (Minimum number of reads used as coverage cutoff, default = 10)	INT
rev	Region elongation value (Number of bases by which a region is elongated if the coverage at its start/end is above half of the coverage cutoff, default = 100)	INT
m	Minimum length of candidate region (Minimum length of a candidate region, default = 100)	INT
mcotcr	Minimum coverage of the candidate region (Minimum coverage of the candidate region, default = 50)	INT
outdir	The output directory, defaults to the current working directory (.)	STRING
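The s parameter decides how the transcript strand is derived from read alignments. The hypothetical helper transcript_strand below prints the inferred mRNA strand. It follows the description of FR_FIRST_STRAND above, where read 1 (or the single read of single-end data) is reverse to the mRNA. It also assumes that FR_SECOND_STRAND denotes the opposite orientation. This is an illustration, not DerTALEv2 code.

```shell
#!/bin/sh
# Sketch: infer the transcript (mRNA) strand from one read alignment.
#   $1 = library type: FR_UNSTRANDED, FR_FIRST_STRAND or FR_SECOND_STRAND
#   $2 = read number within the pair: 1 or 2 (use 1 for single-end reads)
#   $3 = genomic strand the read aligned to: + or -
# Prints +, - or . (strand unknown for unstranded data).
transcript_strand() {
  case "$1" in
    FR_UNSTRANDED) printf '%s\n' "."; return 0 ;;
    FR_FIRST_STRAND) flip=yes ;;   # read 1 is reverse to the mRNA
    FR_SECOND_STRAND) flip=no ;;   # read 1 has the mRNA orientation (assumed)
    *) echo "unknown library type: $1" >&2; return 1 ;;
  esac
  # Read 2 of a pair has the opposite orientation to read 1.
  if [ "$2" = 2 ]; then
    if [ "$flip" = yes ]; then flip=no; else flip=yes; fi
  fi
  if [ "$flip" = yes ]; then
    if [ "$3" = "+" ]; then printf '%s\n' "-"; else printf '%s\n' "+"; fi
  else
    printf '%s\n' "$3"
  fi
}
```

With Illumina TruSeq data (s=FR_FIRST_STRAND), read 1 mapping to the + strand therefore indicates a transcript on the - strand.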

Example:

java -jar DerTALEv2.jar dertalev2 p=<Predictions> t=<Treatment_BAM1> t=<Treatment_BAM2> c=<Control_BAM1> c=<Control_BAM2>
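With several replicates, the t= and c= arguments are simply repeated, as in the example above. A small wrapper can assemble such a call, sketched here under the assumption of file names without spaces. It uses s=FR_FIRST_STRAND for stranded TruSeq libraries and the documented defaults for n, r and Threshold. The function name run_dertalev2 and the DRY_RUN switch are hypothetical. The parameter names p, t, c, n, r, Threshold and s are taken from the table above.

```shell
#!/bin/sh
# Sketch: run (or, with DRY_RUN=1, only print) a dertalev2 call.
#   $1 = PrediTALE predictions file
#   $2 = space-separated treatment BAM files (names without spaces)
#   $3 = space-separated control BAM files (names without spaces)
run_dertalev2() {
  preds=$1 treatments=$2 controls=$3
  set -- java -jar DerTALEv2.jar dertalev2 "p=$preds"
  for bam in $treatments; do set -- "$@" "t=$bam"; done   # one t= per replicate
  for bam in $controls; do set -- "$@" "c=$bam"; done     # one c= per replicate
  set -- "$@" n=100 r=500 Threshold=1.0 s=FR_FIRST_STRAND
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 run_dertalev2 preds.tsv "t1.bam t2.bam" "c1.bam c2.bam" prints the assembled command without starting Java.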

PrediTALE

PrediTALE predicts TALE target boxes from the RVD sequence of a TALE, using a novel model learned from quantitative data.

As input, PrediTALE requires a set of sequences (FastA format) that are scanned for putative TALE target boxes. These sequences could be promoters of genes but also complete genomic sequences. For computing p-values, PrediTALE additionally needs a background set of sequences, which by default is generated as a sub-sample of the input sequences.

The prediction threshold may be defined either by means of a p-value (significance level) or by an approximate number of expected sites. The latter is converted to a p-value internally, so the specified number of expected sites is, in general, not met exactly.

TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format that is output by the TALE Analysis tool of AnnoTALE.

Finally, it can be specified whether both strands or only one of the strands are scanned. When both strands are scanned, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 for genome-wide predictions.
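To illustrate the expected TALE input, the sketch below writes a small FastA file with dash-separated RVDs and checks that every sequence line has this shape. The TALE names and RVD sequences are made-up placeholders. The check assumes that each RVD is written as one or two upper-case letters or * (e.g., N* for an RVD lacking amino acid 13). The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: write an example TALE file (placeholder names and RVDs) to $1.
write_example_tales() {
  cat > "$1" <<'EOF'
>TALE_A
NI-HD-NG-NN-HD-NG-NI-NI
>TALE_B
HD-N*-NG-NI-NN-HD
EOF
}

# Sketch: succeed if every non-header, non-empty line of $1 consists of
# 1-2 character RVDs (A-Z or *) separated by single dashes; offending
# lines are printed with file name and line number.
check_rvd_fasta() {
  awk '
    /^>/ || /^$/ { next }
    !/^[A-Z*][A-Z*]?(-[A-Z*][A-Z*]?)*$/ { print FILENAME ":" NR ": " $0; bad = 1 }
    END { exit bad ? 1 : 0 }
  ' "$1"
}
```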

If you experience problems using PrediTALE, please contact us (grau@informatik.uni-halle.de).

PrediTALE may be called with

java -jar DerTALEv2.jar preditale

and has the following parameters

name	comment	type
s	Sequences (The sequences (e.g., a genome) to scan for binding sites, type = fa,fas,fasta)	FILE
b	Background sample (The sequences for determining the prediction threshold. Either a sub-sample of the input sequences or a dedicated background data set, range={sub-sample, background sequences}, default = sub-sample)	STRING
No parameters for selection "sub-sample"
Parameters for selection "background sequences":
bs	Background sequences (The sequences (e.g., a genome) for determining the prediction threshold, type = fa,fas,fasta)	FILE
t	Threshold specification (The way of defining the prediction threshold. Either by explicitly defining a significance level or by specifying the number of expected sites, range={significance level, number of sites}, default = significance level)	STRING
Parameters for selection "significance level":
sl	Significance level (The significance level for determining the prediction threshold, valid range = [0.0, 0.01], default = 1.0E-4)	DOUBLE
Parameters for selection "number of sites":
n	Number of sites (The number of expected binding sites for determining the prediction threshold, valid range = [1, 1000000], default = 10000)	INT
TALEs	TALEs (The RVD sequences of the TALEs, separated by dashes, in FastA format, type = fasta,fas,fa)	FILE
Strand	Strand (Predict target sites on both strands, or on the forward or reverse strand only, range={both strands, forward strand, reverse strand}, default = both strands)	STRING
Parameters for selection "both strands":
r	Reverse penalty (Penalty for predictions on the reverse strand, valid range = [0.0, 1.7976931348623157E308], default = 0.01)	DOUBLE
No parameters for selection "forward strand"
No parameters for selection "reverse strand"
outdir	The output directory, defaults to the current working directory (.)	STRING
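To see how the two threshold specifications (sl and n) relate, a rough back-of-the-envelope estimate can help. Scanning L positions per strand at significance level sl yields on the order of sl × L × (number of strands) hits by chance. This is only a generic estimate, not PrediTALE's internal conversion, which is determined from the background sample. The helper names below are hypothetical.

```shell
#!/bin/sh
# Rough estimate only: expected number of chance hits when scanning
# $2 positions per strand on $3 strands (1 or 2) at significance level $1.
expected_hits() {
  awk -v sl="$1" -v len="$2" -v strands="$3" 'BEGIN { printf "%.0f\n", sl * len * strands }'
}

# Inverse rough estimate: significance level giving about $1 chance hits
# for $2 positions per strand on $3 strands.
approx_significance() {
  awk -v n="$1" -v len="$2" -v strands="$3" 'BEGIN { printf "%g\n", n / (len * strands) }'
}
```

For a hypothetical 100 Mb sequence scanned on both strands, expected_hits 1e-4 100000000 2 gives 20000. Conversely, approx_significance 10000 100000000 2 gives 5e-05 for the default of 10000 sites.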

Example:

java -jar DerTALEv2.jar preditale s=<Sequences> TALEs=<TALEs>
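For genome-wide predictions, the description above recommends setting the reverse-strand penalty to 0. A sketch of such a call is shown below. It assumes that the nested parameter r of the default Strand selection (both strands) can be given directly on the command line. The function name run_preditale_genome, the DRY_RUN switch and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: genome-wide PrediTALE scan on both strands without reverse-strand
# penalty (assumes r can be passed directly for the default Strand selection).
#   $1 = genome FastA, $2 = TALE RVD FastA, $3 = output directory
# With DRY_RUN=1 the command is only printed.
run_preditale_genome() {
  set -- java -jar DerTALEv2.jar preditale "s=$1" "TALEs=$2" "r=0" "outdir=$3"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```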

