TALENoffer
by Jan Grau, Jens Boch, and Stefan Posch.
TALENoffer is a tool for genome-wide prediction of TAL effector nuclease (TALEN) off-target sites. TALENoffer is based on the same statistical model as TALgetter and features a substantially improved runtime, which allows for scanning complete genomes for TALEN off-target sites within a few minutes.
We provide TALENoffer as a public web-server, as a web-application that can be installed in a local Galaxy server, and as a command line program.
Paper
If you use TALENoffer, please cite
J. Grau, J. Boch, and S. Posch. TALENoffer: genome-wide TALEN off-target prediction. Bioinformatics, 2013, doi: 10.1093/bioinformatics/btt501.
TALENoffer web-server
TALENoffer is available as a public web-server at galaxy.informatik.uni-halle.de.
Download
TALENoffer is implemented in Java using Jstacs. You can download the command line application as a Jar. In addition, we provide the Jar of the Galaxy web-application for installing it in your local Galaxy server.
TALENoffer will be part of the next public release of the Jstacs library. As (future) part of Jstacs, TALENoffer will be released under GPL 3.
Running the command line application
For running the command line application, Java v1.6 or later is required.
The arguments of the command line application have the following meaning:
name | comment | type |
input | Input sequences (The sequences to scan for TALEN targets, FastA) | String |
annotation | Annotation file (A file containing genomic annotations (e.g., genes, mRNAs, exons) in GFF, GTF, or UCSC known genes BED format, OPTIONAL) | String |
rvdl | First RVD sequence (The sequence of RVDs of the first TALEN monomer, separated by '-', default = NI-HD-HD-NG-NN-NK-NK) | String |
rvdr | Second RVD sequence (The sequence of RVDs of the second TALEN monomer, separated by '-', default = NI-HD-HD-NG-NN-NK-NK) | String |
nterml | N-Terminal first (For the first RVD sequence, consider the architecture where the endonuclease domain is fused to the N-terminus instead of the standard C-terminal architecture, default = false) | Boolean |
ntermr | N-Terminal second (For the second RVD sequence, consider the architecture where the endonuclease domain is fused to the N-terminus instead of the standard C-terminal architecture, default = false) | Boolean |
heterodimers | Hetero-dimers only (Consider only hetero-dimers of TALEN monomers instead of the standard search for TALEN hetero- and homo-dimers, default = false) | Boolean |
min | Minimum distance (Minimum distance between TALEN monomer target sites, valid range = [0, 100], default = 12) | Integer |
max | Maximum distance (Maximum distance between TALEN monomer target sites, valid range = [0, 100], default = 24) | Integer |
model | Model type (TALgetter is the default model that uses individual binding specificities for each RVD. TALgetter13 uses binding specificities that only depend on amino acid 13, i.e., the second amino acid of the repeat. While TALgetter is recommended in most cases, the use of TALgetter13 may be beneficial if you search for target sites of TAL effectors with many rare RVDs, for instance YG, HH, or S*, range = {TALgetter, TALgetter13}, default = TALgetter) | String |
addrvds | RVD specificities (File defining additional or overriding existing RVD specificities, Example file setting specificity of position 0 to T, defining new specificities for NG and HG, and introducing a new RVD XX, OPTIONAL) | String |
filter | Filter (Filter off-targets using different thresholds on the score relative to the best-matching site. Typical values are Loose (q=0.35), Medium-Loose (q=0.375), Medium (q=0.4), Medium-Strict (q=0.45), Strict (q=0.5), valid range = [0.35, 1.0], default = 0.4) | String |
top | Maximum number of targets (Limits the total number of reported targets in all input sequences, ranked by their score, valid range = [1, 100000], default = 100) | Integer |
out | Additional output (Path to a GFF3/GFF2 file to which predictions are written in addition to the default output, extension defines format (.gff3/.gff), OPTIONAL) | String |
numThreads | Number of threads (Number of threads used by TALENoffer. More than 3 threads typically do not lead to an additional speed-up, valid range = [1, 8], default = 3) | Integer |
For instance, for scanning the FastA-file path/to/myGenome.fa for the top 100 off-target sites of the two TALENs with RVD sequences NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG and NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG with a distance between monomer target sites of 10 to 20 bp, you start TALENoffer with
java -jar TALENoffer.jar input=path/to/myGenome.fa rvdl="NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG" \
rvdr="NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG" min=10 max=20 top=100
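The optional parameters from the table can be added in the same key=value style. The following is a minimal sketch rather than a documented invocation: all file paths are placeholders, and passing Booleans as "true" and the filter threshold as a plain number are assumptions derived from the types and ranges listed in the table. The script only prints the command unless TALENoffer.jar is present in the current directory.

```shell
#!/bin/sh
# Sketch: assemble a TALENoffer call that also uses optional parameters.
# Assumptions: Booleans are written as "true"/"false", the filter threshold
# is given as a number, and all paths below are placeholders.
#
#   annotation   : optional genomic annotation (GFF, GTF, or BED)
#   heterodimers : report hetero-dimers only
#   filter=0.45  : the "Medium-Strict" threshold from the table
#   out          : extra output file, the extension selects GFF3 vs. GFF2
set -- \
  "input=path/to/myGenome.fa" \
  "annotation=path/to/myAnnotation.gff3" \
  "rvdl=NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG" \
  "rvdr=NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG" \
  "heterodimers=true" \
  "min=10" \
  "max=20" \
  "filter=0.45" \
  "top=100" \
  "out=path/to/offTargets.gff3" \
  "numThreads=3"

# Keep a printable copy of the command line for inspection or logging.
cmd_line="java -jar TALENoffer.jar $*"
echo "$cmd_line"

# Run the scan only if the Jar is actually available here.
if [ -f TALENoffer.jar ]; then
  java -jar TALENoffer.jar "$@"
fi
```

Keeping the arguments in the positional parameters avoids quoting problems when a path contains spaces, because "$@" passes every key=value pair to Java as a single argument.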
If you analyze large data sets, for instance complete mammalian genomes, TALENoffer may require more memory than Java provides by default. You can increase the memory available to TALENoffer by passing additional parameters to the Java virtual machine. To start TALENoffer with 512 MB of memory initially, which may grow to at most 2 GB during execution, you call
java -Xms512M -Xmx2G -jar TALENoffer.jar input=path/to/myGenome.fa rvdl="NN-HD-HD-NI-NN-NG-NN-NG-HD-HD-NG-HD-NI-HD-NG" \
rvdr="NN-NG-HD-HD-NG-HD-HD-NI-HD-NI-NI-NN-HD-HD-NG" min=10 max=20 top=100
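Since a genome-wide scan can take a while, it may be worth checking the integer parameters against the valid ranges from the table before starting Java. The helper below is an illustrative sketch and not part of TALENoffer: the function names are hypothetical, and the additional requirement that min does not exceed max is an assumption that the table does not state.

```shell
#!/bin/sh
# Sketch: pre-flight check of integer parameters against the documented ranges.
# The function names are hypothetical helpers, not part of TALENoffer.

# Succeeds if $1 is a non-negative integer with $2 <= $1 <= $3.
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Arguments: min max top numThreads
check_talenoffer_ranges() {
  in_range "$1" 0 100    || { echo "min must be in [0, 100]: $1" >&2; return 1; }
  in_range "$2" 0 100    || { echo "max must be in [0, 100]: $2" >&2; return 1; }
  # Assumption: a minimum distance above the maximum is treated as an error.
  [ "$1" -le "$2" ]      || { echo "min ($1) exceeds max ($2)" >&2; return 1; }
  in_range "$3" 1 100000 || { echo "top must be in [1, 100000]: $3" >&2; return 1; }
  in_range "$4" 1 8      || { echo "numThreads must be in [1, 8]: $4" >&2; return 1; }
}

# Example: the values used in the calls above, with the default thread count.
if check_talenoffer_ranges 10 20 100 3; then
  echo "parameters OK"
fi
```

A launcher script could call check_talenoffer_ranges first and start the java command only if the check succeeds.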
Installing the web-application
The command-line program behind the web-application is a Jar as well, so Java is required on the server running Galaxy.
To install this command line program in Galaxy, copy it to the desired destination in the Galaxy tools directory.
The command line application writes its Galaxy tool definition file itself. If you are in the directory containing the command-line program for Galaxy, you can create the tool definition file by calling
java -jar TALENofferWeb.jar --create TALENofferWeb.xml
Afterwards, this directory contains the tool definition file TALENofferWeb.xml. Now you can register TALENoffer in the Galaxy tool_conf.xml file. For details, see the Galaxy tutorial for adding new tools.
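As a rough illustration, a registration entry in tool_conf.xml might look like the fragment below. The section id and name and the sub-directory TALENoffer inside the Galaxy tools directory are placeholders chosen for this sketch; only the file name TALENofferWeb.xml comes from the step above. The file attribute is assumed to be given relative to the Galaxy tools directory, so adjust it to wherever you copied the Jar and created the tool definition.

```xml
<!-- Sketch of a tool_conf.xml entry; id, name, and sub-directory are placeholders. -->
<section id="talen_tools" name="TALEN tools">
  <!-- Path to the tool definition file created with "create" above -->
  <tool file="TALENoffer/TALENofferWeb.xml" />
</section>
```

Galaxy typically needs to be restarted before a newly registered tool shows up in the tool panel.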