DerTALEv2

Revision as of 00:29, 25 May 2024

DerTALEv2 filters predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box.

As input, DerTALEv2 requires a list of target box predictions as generated by the PrediTALE tool, which is included in the DerTALEv2 JAR file.

For determining differentially expressed regions, DerTALEv2 also needs mapped RNA-seq reads from Xanthomonas-infected (treatment) and control samples in BAM format, which is the standard output format of most mappers and may be generated from SAM format using samtools. For each BAM file, DerTALEv2 also needs an index file with the same name as the BAM file plus the additional extension .bai (as generated by samtools).
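The BAM preparation described above can be sketched as follows. This is a minimal sketch, assuming samtools is installed and on the PATH. The function names and file names are hypothetical and not part of DerTALEv2.

```shell
#!/bin/sh
# Sketch: prepare a coordinate-sorted, indexed BAM file from a SAM file.
# Function names and file names are placeholders chosen for illustration.

# Returns 0 if FILE.bai exists next to FILE (the index naming described above).
has_bam_index() {
    [ -f "$1.bai" ]
}

# Converts a SAM file to a sorted BAM file and indexes it.
# Requires samtools; "samtools index x.bam" writes x.bam.bai by default.
sam_to_indexed_bam() {
    sam="$1"
    bam="${sam%.sam}.bam"
    samtools sort -o "$bam" "$sam" && samtools index "$bam"
}

# Usage (not run here):
#   sam_to_indexed_bam treatment.sam
#   has_bam_index treatment.bam || echo "missing index for treatment.bam" >&2
```

Checking for the index before starting a run is cheap and catches the case where a BAM file was copied without its .bai file.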

Further parameters that can be specified include the number of predictions in the list that are considered (counting from top), the width of the region in which differential expression is considered, the width of the window that needs to be differentially expressed and a threshold on the log (base 2) differential abundance (e.g., 1 for a two-fold induction).
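Because the threshold is given on a log (base 2) scale, a threshold t corresponds to a fold change of 2^t. The helper below is only an illustration of this arithmetic. The function name is hypothetical and the snippet says nothing about how DerTALEv2 computes differential abundance internally.

```shell
#!/bin/sh
# Sketch: convert a log2 threshold into the corresponding fold change (2^t).
# fold_from_log2 is a hypothetical helper, not part of DerTALEv2.
fold_from_log2() {
    awk -v t="$1" 'BEGIN { printf "%g\n", 2 ^ t }'
}

# Usage:
#   fold_from_log2 1    # two-fold induction, as in the example above
#   fold_from_log2 2    # four-fold induction
```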

Command line tool

DerTALEv2 is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

DerTALEv2 and PrediTALE are packaged in one runnable JAR that may be run from the command line with

java -jar DerTALEv2.jar

which lists the available tools and usage information:

Available tools:

	dertalev2 - DerTALEv2
	preditale - PrediTALE

Syntax: java -jar DerTALEv2.jar <toolname> [<parameter=value> ...]

Further info about the tools is given with
	java -jar DerTALEv2.jar <toolname> info

For tests of individual tools:
	java -jar DerTALEv2.jar <toolname> test [<verbose>]

Tool parameters are listed with
	java -jar DerTALEv2.jar <toolname>

You get a list of the tool parameters by calling DerTALEv2.jar with the corresponding tool name, e.g.,

 java -jar DerTALEv2.jar dertalev2

The meaning of the individual tool parameters is described below.

Tool parameters

DerTALEv2

DerTALEv2 filters predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box.

If you experience problems using DerTALEv2, please contact us.


DerTALEv2 may be called with

java -jar DerTALEv2.jar dertalev2

and has the following parameters

name comment type

p Predictions (Predictions output file, type = tsv,tabular) FILE
The following parameter(s) can be used multiple times:
t Treatment BAM (BAM file of mapped reads from treatment experiment. BAM file must have an index with additional extension .bai., type = bam) FILE
The following parameter(s) can be used multiple times:
c Control BAM (BAM file of mapped reads from control experiment. BAM file must have an index with additional extension .bai., type = bam) FILE
n Number of predictions (Number of (top) predictions considered, default = 100) INT
r Region width (Number of bases around the predicted site, default = 500) INT
Threshold Threshold (Threshold on the log differential abundance, default = 1.0) DOUBLE
s Stranded (Defines whether the reads are stranded. In case of FR_FIRST_STRAND, the first read of a read pair or the only read in case of single-end data is assumed to be located on the forward strand of the cDNA, i.e., reverse to the mRNA orientation. If you are using Illumina TruSeq you should use FR_FIRST_STRAND., range={FR_UNSTRANDED, FR_FIRST_STRAND, FR_SECOND_STRAND}, default = FR_UNSTRANDED) STRING
cc Coverage cutoff (Minimum number of reads as coverage cutoff., default = 10) INT
rev Region elongation value (Number of bases a region is elongated if coverage is above half of the coverage cutoff at start/end of region., default = 100) INT
m Minimum length of candidate region (Minimum length of candidate region., default = 100) INT
mcotcr Minimum coverage of the Candidate Region (Minimum coverage of the Candidate Region., default = 50) INT
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar DerTALEv2.jar dertalev2 p=<Predictions> t=<Treatment_BAM> c=<Control_BAM>
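A fuller call with replicates might look like the sketch below. It assumes that parameters marked as usable multiple times are passed by repeating the key=value pair, which is an assumption rather than something stated on this page. All file names and the output directory are placeholders, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print a DerTALEv2 call for two treatment and two control replicates.
# Assumption: repeatable parameters (t, c) are given by repeating key=value.
# File names and outdir are placeholders.
dertale_example_cmd() {
    echo java -jar DerTALEv2.jar dertalev2 \
        p=predictions.tsv \
        t=treatment_rep1.bam t=treatment_rep2.bam \
        c=control_rep1.bam c=control_rep2.bam \
        n=50 r=1000 Threshold=2 s=FR_FIRST_STRAND \
        outdir=dertale_out
}

# Usage: inspect the command first, then run it by hand.
#   dertale_example_cmd
```

Here n=50 restricts the analysis to the top 50 predictions, r=1000 widens the region around each predicted site, and Threshold=2 requires a four-fold induction on the log (base 2) scale.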


PrediTALE

PrediTALE predicts TALE target boxes using a novel model learned from quantitative data based on the RVD sequence of a TALE.

As input, PrediTALE requires a set of sequences that are scanned for putative TALE target boxes. These sequences could be promoters of genes but also complete genomic sequences (FastA format). For computing p-values, PrediTALE additionally needs a background set of sequences, which is by default generated as a sub-sample of the original input data. The prediction threshold may be defined either by means of a p-value or an approximate number of expected sites. The latter is also converted to a p-value internally, so the defined number of expected sites is, in general, not met exactly. TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format that is output by the TALE Analysis tool of AnnoTALE. Finally, it can be specified whether both strands or only one of the strands are scanned; in the former case, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 in case of genome-wide predictions.
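To illustrate the TALE input format described above, the sketch below writes a small FastA file with one dash-separated RVD sequence and checks its shape. The TALE name and RVD sequence are made up for illustration and do not correspond to a real TALE. The check assumes two-character RVDs (letters or *), which is a simplification.

```shell
#!/bin/sh
# Sketch: write and sanity-check a TALE FastA file with dash-separated RVDs.
# The entry below is hypothetical and only shows the expected layout.
write_example_tales() {
    cat > "$1" <<'EOF'
>ExampleTALE1 (hypothetical)
NI-HD-NG-NN-HD-NI-NG-NS-HD-NG-NN-NI
EOF
}

# Returns 0 if every non-header line consists of two-character RVDs
# separated by dashes (assumption: RVDs are written as two characters).
check_rvd_format() {
    ! grep -v '^>' "$1" | grep -Evq '^[A-Z*][A-Z*](-[A-Z*][A-Z*])*$'
}

# Usage:
#   write_example_tales tales.fa
#   check_rvd_format tales.fa && echo "format looks fine"
```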


If you experience problems using PrediTALE, please contact us.


PrediTALE may be called with

java -jar DerTALEv2.jar preditale

and has the following parameters

name comment type

s Sequences (The sequences (e.g., a genome) to scan for binding sites, type = fa,fas,fasta) FILE
b Background sample (The sequences for determining the prediction threshold. Either a sub-sample of the input sequences or a dedicated background data set., range={sub-sample, background sequences}, default = sub-sample) STRING
No parameters for selection "sub-sample"
Parameters for selection "background sequences":
bs Background sequences (The sequences (e.g., a genome) for determining the prediction threshold, type = fa,fas,fasta) FILE
t Threshold specification (The way of defining the prediction threshold. Either by explicitly defining a significance level or by specifying the number of expected sites, range={significance level, number of sites}, default = significance level) STRING
Parameters for selection "significance level":
sl Significance level (The significance level for determining the prediction threshold, valid range = [0.0, 0.01], default = 1.0E-4) DOUBLE
Parameters for selection "number of sites":
n Number of sites (The number of expected binding sites for determining the prediction threshold, valid range = [1, 1000000], default = 10000) INT
TALEs TALEs (The RVD sequences of the TALE, separated by dashes, in FastA format, type = fasta,fas,fa) FILE
Strand Strand (Predict target sites on both strands, or the forward or reverse strand, range={both strands, forward strand, reverse strand}, default = both strands) STRING
Parameters for selection "both strands":
r Reverse penalty (Penalty for predictions on the reverse strand, valid range = [0.0, 1.7976931348623157E308], default = 0.01) DOUBLE
No parameters for selection "forward strand"
No parameters for selection "reverse strand"
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar DerTALEv2.jar preditale s=<Sequences> TALEs=<TALEs>
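For a genome-wide scan, a call along the following lines may be a reasonable starting point. It keeps the default selections (significance level, both strands), tightens the significance level, and sets the reverse penalty to 0 as suggested in the description above. The sketch assumes that the sub-parameters sl and r of the default selections can be given directly. File names and the output directory are placeholders, and the function only prints the command.

```shell
#!/bin/sh
# Sketch: print a PrediTALE call for a genome-wide scan.
# Assumption: sl and r belong to the default selections and can be set directly.
# genome.fa, tales.fa and preditale_out are placeholders.
preditale_genome_cmd() {
    echo java -jar DerTALEv2.jar preditale \
        s=genome.fa TALEs=tales.fa \
        sl=1.0E-5 r=0 outdir=preditale_out
}

# Usage: inspect the command first, then run it by hand.
#   preditale_genome_cmd
```

The predictions file written to the output directory can then be passed to DerTALEv2 via its p parameter.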