PrediTALE

From Jstacs
Revision as of 11:18, 16 January 2019 by Grau

PrediTALE predicts TALE target boxes from the RVD sequence of a TALE, using a novel model learned from quantitative data. A pre-print describing the method behind PrediTALE and comparing its performance to other tools for TALE target prediction is available from bioRxiv (doi:). In addition to PrediTALE, we also provide DerTALE, a tool for filtering genome-wide target site predictions by mapped RNA-seq data after Xanthomonas infection. Both tools are described in more detail below.

PrediTALE and DerTALE are available as a command-line application and have also been integrated into AnnoTALE, which provides a graphical user interface.


Download

PrediTALE is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

PrediTALE and DerTALE are packaged in one runnable JAR that may be run from the command line with

java -jar PrediTALE.jar

which lists the available tools and usage information:

Available tools:

	preditale - PrediTALE
	dertale - DerTALE

Syntax: java -jar PrediTALE.jar <toolname> [<parameter=value> ...]

Further info about the tools is given with
	java -jar PrediTALE.jar <toolname> info

Tool parameters are listed with
	java -jar PrediTALE.jar <toolname>
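If you want to call the tools from a script, the following minimal Python sketch assembles the command syntax documented above. It relies only on the `java -jar PrediTALE.jar <toolname> [<parameter=value> ...]` and `<toolname> info` forms; the helper names `build_command`, `info_command` and `run` are hypothetical, and no parameter names are assumed, since the valid ones are listed by the tool itself.

```python
import subprocess

JAR = "PrediTALE.jar"  # path to the downloaded JAR; adjust as needed


def build_command(toolname, params=None, jar=JAR):
    """Build: java -jar PrediTALE.jar <toolname> [<parameter=value> ...]

    `params` maps parameter names to values. Valid names are NOT assumed
    here; list them with `java -jar PrediTALE.jar <toolname>`.
    """
    cmd = ["java", "-jar", jar, toolname]
    for name, value in (params or {}).items():
        cmd.append(f"{name}={value}")
    return cmd


def info_command(toolname, jar=JAR):
    """Build: java -jar PrediTALE.jar <toolname> info"""
    return ["java", "-jar", jar, toolname, "info"]


def run(cmd):
    """Run the assembled command (requires Java on the PATH); return the exit code."""
    return subprocess.run(cmd).returncode
```

For example, `run(info_command("preditale"))` would print further information about PrediTALE. Passing the command as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.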


Source code

PrediTALE

As input, PrediTALE requires a set of sequences in FastA format that are scanned for putative TALE target boxes. These may be promoter sequences of genes or complete genomic sequences. To compute p-values, PrediTALE additionally needs a background set of sequences, which by default is generated as a sub-sample of the input data.

The prediction threshold may be defined either as a p-value or as an approximate number of expected sites. In the latter case, the number of sites is converted to a p-value internally, so in general the number of predicted sites will not match the specified number exactly.

TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format that is output by the *TALE Analysis* tool of AnnoTALE.

Finally, users may choose to scan both strands or only one strand. When both strands are scanned, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to ``0`` for genome-wide predictions.
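To make the input conventions and the thresholding described above more concrete, here is a minimal Python sketch. It is not PrediTALE's implementation, and all function names are hypothetical. It assumes that the model score is supplied by a placeholder `score_fn`, that an empirical p-value is the pseudocount-corrected fraction of background scores at least as high as a given score, that an expected number of sites E over n scanned positions corresponds to p = E / n (each position being a chance hit with probability p), and that the reverse-strand penalty acts as a constant subtracted from the score.

```python
import bisect
import random


def parse_rvd_fasta(text):
    """Parse TALE RVD sequences from FastA text, e.g. '>TalXY\\nNI-HD-NG-NN'.

    Returns {header: [RVD, ...]}. Wrapped sequence lines are concatenated
    as-is, so a line break is assumed to fall after a dash or within an RVD.
    """
    tales, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                tales[name] = [r for r in "".join(chunks).split("-") if r]
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        tales[name] = [r for r in "".join(chunks).split("-") if r]
    return tales


def subsample_background(sequences, k, seed=0):
    """Hypothetical stand-in for the default background: a random sub-sample."""
    rng = random.Random(seed)
    return rng.sample(sequences, min(k, len(sequences)))


def empirical_p_value(score, background_sorted):
    """Fraction of background scores >= score, with a +1 pseudocount.

    `background_sorted` must be sorted in ascending order.
    """
    n = len(background_sorted)
    n_ge = n - bisect.bisect_left(background_sorted, score)
    return (n_ge + 1) / (n + 1)


def p_value_for_expected_sites(expected_sites, n_positions):
    """Assumed conversion: about p * n_positions chance hits, so p = E / n.

    The number of hits actually predicted will generally deviate from E.
    """
    return min(1.0, expected_sites / n_positions)


COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (ACGTN, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_scores(seq, score_fn, reverse_penalty=0.0):
    """Score both strands; `score_fn` is a placeholder for the model score.

    The penalty is modeled as a constant subtracted on the reverse strand;
    0 corresponds to the setting recommended for genome-wide scans.
    """
    forward = score_fn(seq)
    reverse = score_fn(reverse_complement(seq)) - reverse_penalty
    return forward, reverse
```

Under this sketch, scanning both strands doubles the number of positions that enter `p_value_for_expected_sites`, which is one reason why a requested number of expected sites can only be met approximately.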