EpiTALE

From Jstacs

EpiTALE predicts binding sites of transcription activator-like effectors (TALEs) in promoteromes or genomes. EpiTALE not only considers the DNA sequence of putative binding sites but also epigenetic determinants of TALE binding, namely DNA methylation and chromatin accessibility. The prediction is based on the same basic model as PrediTALE but with specific parameters for methylated cytosines reflecting the binding preferences of RVDs.

Here, we provide a suite of tools that includes the EpiTALE program itself as well as auxiliary tools for converting methylation data and chromatin accessibility data to the required formats, and for converting genomic coordinates to promoter-wise coordinates for promoterome-wide predictions.

Genome-wide predictions of EpiTALE may further be combined with evidence from RNA-seq data using the DerTALE tool of AnnoTALE.

The EpiTALE suite is provided in a version with a graphical user interface and in a command line version to serve the needs of different user groups. Both versions use the identical code base.

In the following, we describe how to obtain the EpiTALE suite and how to use its individual tools. While parameters are described in terms of command line arguments, the same parameters are available in the version with graphical user interface.

Download

GUI version

  • Runnable Jar: requires Java >= 8 including JavaFX installed, may be run under Linux, Windows and macOS.
  • macOS app: ZIP archive containing a macOS app including EpiTALE and all required Java modules. For running this app, it might be required to explicitly give it running permissions in "System Preferences" -> "Security & Privacy" -> "General", which should list EpiTALE after the first (possibly unsuccessful) starting attempt.
  • Windows program: ZIP archive containing the EpiTALE Jar, all required Java modules, and a Windows batch file. For starting EpiTALE, double-click EpiTALE.bat.

Command line version

  • Runnable Jar: requires Java >= 8, may be run under Linux, Windows and macOS. May be started with
java -jar EpiTALEcli-0.1.jar

from the command line (for tools and arguments, see below).

Tools

Bed2Bismark

Bed2Bismark converts methylation information in bedMethyl format to Bismark format.

The input of Bed2Bismark is a file in bedMethyl format.

If you experience problems using Bed2Bismark, please contact us.


Bed2Bismark may be called with

java -jar EpiTALEcli-0.1.jar bed2bismark

and has the following parameters

name comment type

b BedMethyl file (Methylation information in bedMethyl format, type = bed.gz,bed) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar bed2bismark b=<BedMethyl_file>
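
To illustrate the kind of conversion involved, the following Python sketch maps one bedMethyl line to a Bismark-style coverage line. It is not the Bed2Bismark implementation: the function name bedmethyl_to_bismark is hypothetical, and the sketch assumes the ENCODE bedMethyl layout (column 10 = read coverage, column 11 = percentage of methylated reads), 0-based BED start coordinates, and 1-based positions in the coverage file. How Bed2Bismark itself handles rounding is not documented here.

```python
def bedmethyl_to_bismark(line: str) -> str:
    """Convert one bedMethyl line to a Bismark-style coverage line (sketch)."""
    fields = line.split()
    chrom = fields[0]
    start0 = int(fields[1])      # BED start, 0-based (assumption)
    coverage = int(fields[9])    # column 10: number of reads covering the site
    percent = float(fields[10])  # column 11: percentage of methylated reads
    # Assumption: counts are recovered by rounding coverage * percentage.
    methylated = round(coverage * percent / 100.0)
    unmethylated = coverage - methylated
    pos1 = start0 + 1            # 1-based position in the coverage file (assumption)
    return "\t".join(
        [chrom, str(pos1), str(pos1), f"{percent:g}", str(methylated), str(unmethylated)]
    )


# Made-up example line, not real data.
example = "Chr1\t99\t100\t.\t20\t+\t99\t100\t0,0,0\t20\t25"
print(bedmethyl_to_bismark(example))
```

The resulting line has the six columns chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated that the other tools of the suite expect.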


BismarkMerge2Files

BismarkMerge2Files merges files generated by the Bismark methylation extractor with parameters --bedGraph --CX -p. The output contains a coverage file, which contains the tab-separated columns: chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated.

The inputs of BismarkMerge2Files are two Bismark coverage files.

If you experience problems using BismarkMerge2Files, please contact us.



BismarkMerge2Files may be called with

java -jar EpiTALEcli-0.1.jar bismerger

and has the following parameters

name comment type

b Bismark file 1 (Methylation information in bismark format file 1, type = cov.gz,cov) FILE
bf2 Bismark file 2 (Methylation information in bismark format file 2, type = cov.gz,cov) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar bismerger b=<Bismark_file_1> bf2=<Bismark_file_2>
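
One plausible way to merge two coverage files is to sum the methylated and unmethylated counts per position and to recompute the percentage from the summed counts. Whether BismarkMerge2Files does exactly this is an assumption. The Python sketch below, with the hypothetical function merge_coverage, is only meant to illustrate the six-column coverage format.

```python
from collections import defaultdict


def merge_coverage(lines_a, lines_b):
    """Merge two iterables of Bismark coverage lines by summing counts (sketch)."""
    counts = defaultdict(lambda: [0, 0])  # (chrom, start, end) -> [meth, unmeth]
    for line in list(lines_a) + list(lines_b):
        chrom, start, end, _pct, meth, unmeth = line.split()
        key = (chrom, int(start), int(end))
        counts[key][0] += int(meth)
        counts[key][1] += int(unmeth)
    merged = []
    for (chrom, start, end), (meth, unmeth) in sorted(counts.items()):
        total = meth + unmeth
        # Assumption: the percentage is recomputed from the summed counts.
        pct = 100.0 * meth / total if total else 0.0
        merged.append(f"{chrom}\t{start}\t{end}\t{pct:.4f}\t{meth}\t{unmeth}")
    return merged


# Made-up example lines, not real data.
file_1 = ["Chr1\t100\t100\t50\t1\t1"]
file_2 = ["Chr1\t100\t100\t100\t2\t0", "Chr1\t105\t105\t0\t0\t4"]
for row in merge_coverage(file_1, file_2):
    print(row)
```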


BismarkConvertToPromoter

BismarkConvertToPromoter converts the Bismark output file to promoter coordinates.

The input of BismarkConvertToPromoter is 1. a Bismark coverage output file, which contains tab-separated columns: chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated and 2. the promoter sequences in FastA format with headers like: > id chromosomeName:start-end:strand e.g. > Os01g01010.1 Chr1:2602-3102:+.

If you experience problems using BismarkConvertToPromoter, please contact us.


BismarkConvertToPromoter may be called with

java -jar EpiTALEcli-0.1.jar bis2prom

and has the following parameters

name comment type

b Bismark file (Methylation information in bismark format, type = cov.gz,cov) FILE
p Promoter fasta file (Promoter fastA file, type = fa,fasta) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar bis2prom b=<Bismark_file> p=<Promoter_fasta_file>
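
The promoter header format described above can be parsed and used for coordinate conversion roughly as in the following Python sketch. The function names are hypothetical, and the coordinate conventions (inclusive start, exclusive end, 0-based offsets, minus-strand promoters counted from their end) are assumptions that may differ from what BismarkConvertToPromoter actually does.

```python
import re

# Matches headers like "> Os01g01010.1 Chr1:2602-3102:+"
HEADER = re.compile(r"^>\s*(\S+)\s+(\S+):(\d+)-(\d+):([+-])\s*$")


def parse_promoter_header(header: str):
    """Parse '> id chromosomeName:start-end:strand' into its parts."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"unexpected promoter header: {header!r}")
    gene_id, chrom, start, end, strand = match.groups()
    return gene_id, chrom, int(start), int(end), strand


def to_promoter_position(pos: int, start: int, end: int, strand: str):
    """Map a genomic position to an offset within the promoter, or None if outside.

    Assumptions: start is inclusive, end is exclusive, offsets are 0-based,
    and minus-strand promoters are counted from their end.
    """
    if not (start <= pos < end):
        return None
    return pos - start if strand == "+" else end - 1 - pos


gene_id, chrom, start, end, strand = parse_promoter_header(
    "> Os01g01010.1 Chr1:2602-3102:+"
)
print(gene_id, chrom, to_promoter_position(2702, start, end, strand))
```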


Chromatin pileup

Chromatin pileup takes as input a BAM file of mapped reads from a DNase-seq or ATAC-seq experiment and computes a coverage pileup of the 5' ends of mapped reads. It outputs a simple tab-separated file with the columns: chromosome, position, and pileup value (number of reads with a 5' end at this position).

If you experience problems using Chromatin pileup, please contact us.


Chromatin pileup may be called with

java -jar EpiTALEcli-0.1.jar pileup

and has the following parameters

name comment type

b BAM file (Mapped reads from DNase-seq or ATAC-seq experiment, type = bam) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar pileup b=<BAM_file>
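
The 5' end of a read mapped to the forward strand is its leftmost aligned position, whereas for a read mapped to the reverse strand it is the rightmost aligned position. The following Python sketch counts 5' ends from already-parsed alignments. Reading the BAM file is left out, and both the tuple layout (chromosome, start, end, strand with 1-based, inclusive coordinates) and the function name five_prime_pileup are hypothetical, so this is an illustration of the idea rather than the implementation of the tool.

```python
from collections import Counter


def five_prime_pileup(alignments):
    """Count reads by the genomic position of their 5' end.

    alignments: iterable of (chrom, start, end, strand) with 1-based,
    inclusive coordinates (a hypothetical, already-parsed representation).
    """
    pileup = Counter()
    for chrom, start, end, strand in alignments:
        # Forward reads start at their leftmost base, reverse reads at their rightmost.
        five_prime = start if strand == "+" else end
        pileup[(chrom, five_prime)] += 1
    return pileup


# Made-up alignments, not real data.
reads = [
    ("Chr1", 101, 150, "+"),
    ("Chr1", 101, 136, "+"),
    ("Chr1", 52, 101, "-"),
    ("Chr1", 200, 249, "-"),
]
for (chrom, pos), count in sorted(five_prime_pileup(reads).items()):
    print(f"{chrom}\t{pos}\t{count}")
```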


NormalizePileupOutput

NormalizePileupOutput normalizes the pileup output file, which contains the coverage by 5' ends of ATAC-seq or DNase-seq reads at each position. It normalizes the coverage relative to the mean of a 10000 bp sliding window.

The input of NormalizePileupOutput is a pileup output file from the Chromatin pileup tool.

If you experience problems using NormalizePileupOutput, please contact us.


NormalizePileupOutput may be called with

java -jar EpiTALEcli-0.1.jar normpileup

and has the following parameters

name comment type

p Pileup output file (Pileup output file., type = tsv.gz,tsv,txt) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar normpileup p=<Pileup_output_file>
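
A sliding-window normalization of this kind can be sketched in Python as follows. The details are assumptions, not taken from the tool: the window is centred on each position (window // 2 positions on each side) and clipped at the chromosome ends, positions without reads count as zero, and the function name normalize_pileup is hypothetical.

```python
def normalize_pileup(coverage, length, window=10000):
    """Divide each pileup value by the mean coverage of a window around it.

    coverage: dict mapping 0-based position -> pileup value for one chromosome.
    Assumptions: the window spans window // 2 positions on each side, is
    clipped at the chromosome ends, and missing positions count as zero.
    """
    # Prefix sums allow each window sum to be computed in constant time.
    prefix = [0] * (length + 1)
    for pos in range(length):
        prefix[pos + 1] = prefix[pos] + coverage.get(pos, 0)
    half = window // 2
    normalized = {}
    for pos, value in coverage.items():
        lo = max(0, pos - half)
        hi = min(length, pos + half + 1)
        mean = (prefix[hi] - prefix[lo]) / (hi - lo)
        normalized[pos] = value / mean if mean > 0 else 0.0
    return normalized


# Toy data with a small window so that the arithmetic is easy to follow.
toy = {2: 4, 3: 2, 8: 6}
print(normalize_pileup(toy, length=10, window=4))
```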


PileupConvertToPromoter

PileupConvertToPromoter converts the pileup output file to promoter coordinates.

The input of PileupConvertToPromoter is 1. a normalized pileup output file from the NormalizePileupOutput tool and 2. the promoter sequences in FastA format with headers like: > id chromosomeName:start-end:strand e.g. > Os01g01010.1 Chr1:2602-3102:+.

If you experience problems using PileupConvertToPromoter, please contact us.


PileupConvertToPromoter may be called with

java -jar EpiTALEcli-0.1.jar pile2prom

and has the following parameters

name comment type

n Normalized pileup output file (Normalized pileup output file., type = tsv.gz,tsv) FILE
p Promoter fasta file (Promoter fastA file, type = fa,fasta) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar pile2prom n=<Normalized_pileup_output_file> p=<Promoter_fasta_file>


NarrowPeakConvertToPromoter

NarrowPeakConvertToPromoter converts a narrowPeak file containing peaks of chromatin accessibility to promoter coordinates.

The input of NarrowPeakConvertToPromoter is 1. a narrowPeak file and 2. the promoter sequences in FastA format with headers like: > id chromosomeName:start-end:strand e.g. > Os01g01010.1 Chr1:2602-3102:+.

If you experience problems using NarrowPeakConvertToPromoter, please contact us.


NarrowPeakConvertToPromoter may be called with

java -jar EpiTALEcli-0.1.jar peak2Prom

and has the following parameters

name comment type

n NarrowPeak file (Peak-calling output in narrowPeak format., type = narrowPeak,narrowPeak.gz) FILE
p Promoter fasta file (Promoter fastA file, type = fa,fasta) FILE
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar peak2Prom n=<NarrowPeak_file> p=<Promoter_fasta_file>
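
Converting a peak interval to promoter coordinates amounts to clipping the peak to the promoter and shifting it by the promoter start. The Python sketch below shows this under stated assumptions: all intervals are treated as 0-based and half-open (the convention of narrowPeak, but only assumed for the promoter headers), minus-strand promoters are counted from their end, and the function name peak_to_promoter is hypothetical.

```python
def peak_to_promoter(peak_start, peak_end, prom_start, prom_end, strand):
    """Clip a peak to a promoter and return promoter-relative (start, end).

    All intervals are treated as 0-based and half-open (an assumption for the
    promoter coordinates). Returns None if peak and promoter do not overlap.
    """
    lo = max(peak_start, prom_start)
    hi = min(peak_end, prom_end)
    if lo >= hi:
        return None
    if strand == "+":
        return lo - prom_start, hi - prom_start
    # Assumption: minus-strand promoters are counted from their end.
    return prom_end - hi, prom_end - lo


# Promoter Chr1:2602-3102 as in the header example; the peaks are made up.
print(peak_to_promoter(2500, 2700, 2602, 3102, "+"))
print(peak_to_promoter(3000, 3500, 2602, 3102, "-"))
```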


EpiTALE prediction

EpiTALE predicts TALE target boxes using a novel model learned from quantitative data based on the RVD sequence of a TALE and optionally considers the methylation state of the target box during prediction, as DNA methylation affects the binding specificity of RVDs. Additionally, EpiTALE optionally annotates its predictions with the chromatin accessibility of the predicted target sites, using the output of the NormalizePileupOutput tool and the results of peak calling on DNase-seq or ATAC-seq data.

As input, EpiTALE requires

1. a set of sequences that are scanned for putative TALE target boxes. These sequences could be promoters of genes but also complete genomic sequences (FastA format).

2. For computing p-values, EpiTALE additionally needs a background set of sequences, which is by default generated as a sub-sample of the original input data.

3. The prediction threshold may be defined either by means of a p-value or by an approximate number of expected sites. The latter is also converted to a p-value internally, so the defined number of expected sites is, in general, not met exactly.

4. TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format also output by the TALE Analysis tool of AnnoTALE.

5. It can be specified if both strands or only one of the strands are scanned where, in the former case, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 in case of genome-wide predictions.

6. As an optional input, EpiTALE considers methylation during prediction if Bismark output is provided. Running the Bismark methylation extractor with parameters --bedGraph --CX -p generates a coverage file (file.cov.gz) that contains the tab-separated columns: chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated. Alternatively, you can use the tool Bed2Bismark, which converts data in bedMethyl format to Bismark format.

7. (i) The chromatin accessibility of the input sequences can optionally be provided in narrowPeak format. Such a file can be obtained by mapping ATAC-seq or DNase-seq data to the corresponding genome and then performing peak calling, e.g., with JAMM. In case of promoter sequences as input, you should run the tool NarrowPeakConvertToPromoter to convert the narrowPeak file to promoter coordinates. (ii) Additionally, you can calculate a coverage pileup of 5' ends of mapped reads with Chromatin pileup and normalize it with NormalizePileupOutput. In case of promoter sequences as input, you should run the tool PileupConvertToPromoter to convert the result to promoter coordinates.

8. (i) For a genomic search, the parameter "Calculate coverage area" should be set to "surround target site". The number of positions before the target site can then be set with "Coverage before value" (default: 300) and the number of positions after the target site with "Coverage after value" (default: 200). (ii) For a promoter search, the parameter "Calculate coverage area" may be set to either "on complete sequence" or "surround target site". The number of positions before and after the binding site in the peak profile can be set with "Peak before value" (default: 300) and "Peak after value" (default: 50).
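
The TALE input format from item 4 (FastA with RVDs separated by dashes) can be read, for instance, with the following Python sketch. The function name read_tale_rvds and the example TALE are hypothetical, and the sketch makes no claim about how EpiTALE parses the file internally.

```python
def read_tale_rvds(lines):
    """Read TALE names and their RVD lists from FastA-formatted lines."""
    tales = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new TALE entry.
            name = line[1:].strip()
            tales[name] = []
        elif name is not None:
            # Sequence line: individual RVDs are separated by dashes.
            tales[name].extend(rvd for rvd in line.split("-") if rvd)
    return tales


# Hypothetical TALE for illustration, not real data.
fasta = [">ExampleTALE", "NI-HD-NG-NN-HD-NI"]
print(read_tale_rvds(fasta))
```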

In case of a genomic search, you can filter predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box with the DerTALE tool of the AnnoTALE suite.

If you experience problems using EpiTALE, please contact us.




EpiTALE prediction may be called with

java -jar EpiTALEcli-0.1.jar epitale

and has the following parameters

name comment type

s Sequences (The sequences (e.g., a genome) to scan for binding sites, type = fa,fas,fasta) FILE
b Background sample (The sequences for determining the prediction threshold. Either a sub-sample of the input sequences or a dedicated background data set., range={sub-sample, background sequences}, default = sub-sample) STRING
  No parameters for selection "sub-sample"
  Parameters for selection "background sequences":
    bs Background sequences (The sequences (e.g., a genome) for determining the prediction threshold, type = fa,fas,fasta) FILE
t Threshold specification (The way of defining the prediction threshold. Either by explicitly defining a significance level or by specifying the number of expected sites, range={significance level, number of sites}, default = significance level) STRING
  Parameters for selection "significance level":
    sl Significance level (The significance level for determining the prediction threshold, valid range = [0.0, 0.01], default = 1.0E-4) DOUBLE
  Parameters for selection "number of sites":
    n Number of sites (The number of expected binding sites for determining the prediction threshold, valid range = [1, 1000000], default = 10000) INT
TALEs TALEs (The RVD sequences of the TALE, separated by dashes, in FastA format, type = fasta,fas,fa) FILE
Strand Strand (Prediction target sites on both strands, or the forward or reverse strand, range={both strands, forward strand, reverse strand}, default = both strands) STRING
  Parameters for selection "both strands":
    r Reverse penalty (Penalty for predictions on the reverse strand, valid range = [0.0, 1.7976931348623157E308], default = 0.01) DOUBLE
  No parameters for selection "forward strand"
  No parameters for selection "reverse strand"
bf Bismark file (The bedGraph output of bismark (file.cov.gz) containing <chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>, type = cov,cov.gz, OPTIONAL) FILE
nf NarrowPeak file (The output of a peak caller (all.peaks.narrowPeak), type = narrowPeak,narrowPeak.gz, OPTIONAL) FILE
npo Normalized pileup output (The normalized output of pileup with values larger than zero (file.txt) containing <chromosome> <position> <coverage>, type = tsv,tsv.gz, OPTIONAL) FILE
c Calculate coverage area (Calculate coverage area surround target site, or on complete sequence, range={surround target site, on complete sequence}, default = surround target site, OPTIONAL) STRING
  Parameters for selection "surround target site":
    cbv Coverage before value (Number of positions before target site in coverage profile, valid range = [1, 500], default = 300, OPTIONAL) INT
    cav Coverage after value (Number of positions after target site in coverage profile, valid range = [1, 500], default = 200, OPTIONAL) INT
  No parameters for selection "on complete sequence"
p Peak before value (Number of positions before target site in peak profile, valid range = [1, 500], default = 300, OPTIONAL) INT
pav Peak after value (Number of positions after target site in peak profile, valid range = [1, 500], default = 50, OPTIONAL) INT
outdir The output directory, defaults to the current working directory (.) STRING

Example:

java -jar EpiTALEcli-0.1.jar epitale s=<Sequences> TALEs=<TALEs>
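
To make the effect of the cbv and cav parameters listed above more concrete, the following Python sketch computes the interval around a target site over which coverage would be collected. The defaults 300 and 200 are taken from the parameter table. Everything else is an assumption for illustration: the function name coverage_window is hypothetical, coordinates are 0-based and half-open, and "before" is interpreted as upstream of the site relative to its strand.

```python
def coverage_window(site_start, site_end, strand, before=300, after=200, seq_length=None):
    """Return the (start, end) interval over which coverage is collected.

    site_start/site_end: 0-based, half-open target site in forward coordinates.
    Assumption: 'before' lies upstream of the site relative to its strand.
    """
    if strand == "+":
        start, end = site_start - before, site_end + after
    else:
        # On the reverse strand, upstream is to the right in forward coordinates.
        start, end = site_start - after, site_end + before
    start = max(0, start)
    if seq_length is not None:
        end = min(seq_length, end)
    return start, end


# Made-up target sites, not real predictions.
print(coverage_window(1000, 1020, "+"))
print(coverage_window(1000, 1020, "-"))
print(coverage_window(100, 120, "+", seq_length=250))
```

The last call shows how the window is clipped when the target site lies close to the ends of a short input sequence such as a promoter.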