EpiTALE: Difference between revisions
Revision as of 22:49, 10 May 2021
EpiTALE predicts binding sites of transcription activator-like effectors (TALEs) in promoteromes or genomes. EpiTALE not only considers the DNA sequence of putative binding sites but also epigenetic determinants of TALE binding, namely DNA methylation and chromatin accessibility. The prediction is based on the same basic model as PrediTALE but with specific parameters for methylated cytosines reflecting the binding preferences of RVDs.
Here, we provide a suite of tools including the EpiTALE program itself but also auxiliary tools for converting methylation data and chromatin accessibility data to the required formats, and for converting genomic coordinates to promoter-wise coordinates for promoterome-wide predictions.
Genome-wide predictions of EpiTALE may further be combined with evidence from RNA-seq data using the DerTALE tool of AnnoTALE.
The EpiTALE suite is provided in a version with a graphical user interface and in a command line version. Both versions use the identical code base and may serve the needs of different user groups.
In the following, we describe how to obtain the EpiTALE suite and how to use its individual tools. While parameters are described in terms of command line arguments, the same parameters are available in the version with graphical user interface.
Download
GUI version
- Runnable Jar: requires Java >= 8 including JavaFX installed, may be run under Linux, Windows and macOS.
- macOS app: ZIP archive containing a macOS app including EpiTALE and all required Java modules. For running this app, it might be required to explicitly give it running permissions in "System Preferences" -> "Security & Privacy" -> "General", which should list EpiTALE after the first (possibly unsuccessful) starting attempt.
- Windows program: ZIP archive containing the EpiTALE Jar, all required Java modules, and a Windows batch file. For starting EpiTALE, double-click EpiTALE.bat.
Command line version
- Runnable Jar: requires Java >= 8, may be run under Linux, Windows and macOS. May be started with
java -jar EpiTALEcli-0.1.jar
from the command line (for tools and arguments, see below).
Example data
We provide an archive with example data at zenodo. Besides the data, this archive contains the command line version of the EpiTALE suite v0.1 and a bash script demonstrating the complete EpiTALE pipeline.
Tools
Bed2Bismark
Bed2Bismark converts methylation information in bedMethyl format to Bismark format.
The input of Bed2Bismark is a file in bedMethyl format.
If you experience problems using Bed2Bismark, please contact us.
Bed2Bismark may be called with
java -jar EpiTALEcli-0.1.jar bed2bismark
and has the following parameters
name | comment | type |
b | BedMethyl file (Methylation information in bedMethyl format, type = bed.gz,bed) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar bed2bismark b=<BedMethyl_file>
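The relation between the two formats can be illustrated with a minimal Python sketch. It is not the code of Bed2Bismark; it assumes the ENCODE-style bedMethyl layout (tab-separated, 0-based start, coverage in column 10, methylation percentage in column 11) and assumes that the Bismark coverage file uses 1-based positions. The function name `bedmethyl_to_bismark_cov` and the example line are hypothetical.

```python
def bedmethyl_to_bismark_cov(line):
    """Convert one bedMethyl line to one Bismark-style coverage line (sketch)."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start0 = int(fields[1])          # bedMethyl start is 0-based (assumption: ENCODE layout)
    coverage = int(fields[9])        # column 10: number of reads covering the position
    percent = float(fields[10])      # column 11: percentage of methylated reads
    methylated = round(coverage * percent / 100.0)
    unmethylated = coverage - methylated
    pos1 = start0 + 1                # assumption: coverage files use 1-based positions
    columns = [chrom, pos1, pos1, percent, methylated, unmethylated]
    return "\t".join(str(c) for c in columns)


# Hypothetical bedMethyl record: 20 reads, 75 % of them methylated.
example = "Chr1\t99\t100\t.\t20\t+\t99\t100\t0,0,0\t20\t75"
print(bedmethyl_to_bismark_cov(example))
```

The counts of methylated and unmethylated reads are reconstructed from coverage and percentage, so they may differ by rounding from the counts in the original experiment.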
BismarkMerge2Files
BismarkMerge2Files merges files generated by the Bismark methylation extractor with parameters --bedGraph --CX -p.
The output is a coverage file, which contains the tab-separated columns:
chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated.
The input of BismarkMerge2Files consists of two Bismark coverage files.
If you experience problems using BismarkMerge2Files, please contact us.
BismarkMerge2Files may be called with
java -jar EpiTALEcli-0.1.jar bismerger
and has the following parameters
name | comment | type |
b | Bismark file 1 (Methylation information in Bismark format, file 1, type = cov.gz,cov) | FILE |
bf2 | Bismark file 2 (Methylation information in Bismark format, file 2, type = cov.gz,cov) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar bismerger b=<Bismark_file_1> bf2=<Bismark_file_2>
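As an illustration of what merging two coverage files could look like, the following Python sketch sums the methylated and unmethylated counts per position and recomputes the percentage. That merging rule is an assumption for illustration and is not confirmed by this page; the function name `merge_coverage` is hypothetical.

```python
from collections import defaultdict


def merge_coverage(lines_a, lines_b):
    """Merge two lists of Bismark coverage lines by summing counts per position."""
    counts = defaultdict(lambda: [0, 0])   # (chrom, start, end) -> [methylated, unmethylated]
    for line in list(lines_a) + list(lines_b):
        chrom, start, end, _pct, meth, unmeth = line.rstrip("\n").split("\t")
        key = (chrom, int(start), int(end))
        counts[key][0] += int(meth)
        counts[key][1] += int(unmeth)

    merged = []
    for (chrom, start, end), (meth, unmeth) in sorted(counts.items()):
        total = meth + unmeth
        # Recompute the percentage from the summed counts (assumed merging rule).
        pct = 100.0 * meth / total if total else 0.0
        merged.append((chrom, start, end, pct, meth, unmeth))
    return merged


file_1 = ["Chr1\t100\t100\t50.0\t2\t2"]
file_2 = ["Chr1\t100\t100\t100.0\t4\t0", "Chr1\t105\t105\t0.0\t0\t3"]
for row in merge_coverage(file_1, file_2):
    print("\t".join(str(value) for value in row))
```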
BismarkConvertToPromoter
BismarkConvertToPromoter converts the Bismark output file to promoter coordinates.
The input of BismarkConvertToPromoter is
1. a Bismark coverage output file, which contains tab-separated columns:
chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated
and
2. the promoter sequences in FastA format with headers like:
> id chromosomeName:start-end:strand
e.g.
> Os01g01010.1 Chr1:2602-3102:+
If you experience problems using BismarkConvertToPromoter, please contact us.
BismarkConvertToPromoter may be called with
java -jar EpiTALEcli-0.1.jar bis2prom
and has the following parameters
name | comment | type |
b | Bismark file (Methylation information in Bismark format, type = cov.gz,cov) | FILE |
p | Promoter fasta file (Promoter fastA file, type = fa,fasta) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar bis2prom b=<Bismark_file> p=<Promoter_fasta_file>
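The header format and the idea of promoter-wise coordinates can be sketched in Python as follows. This is not the code of BismarkConvertToPromoter. The sketch assumes that start and end in the header are 1-based and inclusive, and that positions on the reverse strand are counted from the promoter end; both conventions are assumptions, and the names `parse_promoter_header` and `genome_to_promoter` are hypothetical.

```python
import re

# Matches headers such as "> Os01g01010.1 Chr1:2602-3102:+"
HEADER = re.compile(r"^>\s*(\S+)\s+(\S+):(\d+)-(\d+):([+-])\s*$")


def parse_promoter_header(header):
    """Split a promoter FastA header into id, chromosome, start, end, strand."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError("unexpected header format: " + header)
    gene_id, chrom, start, end, strand = match.groups()
    return gene_id, chrom, int(start), int(end), strand


def genome_to_promoter(pos, start, end, strand):
    """Map a genomic position to a 1-based position within the promoter.

    Returns None if the position lies outside the promoter.
    """
    if not start <= pos <= end:
        return None
    if strand == "+":
        return pos - start + 1
    return end - pos + 1   # assumed convention for the reverse strand


gene_id, chrom, start, end, strand = parse_promoter_header("> Os01g01010.1 Chr1:2602-3102:+")
print(gene_id, chrom, genome_to_promoter(2700, start, end, strand))
```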
Chromatin pileup
Chromatin pileup takes as input a BAM file of mapped reads from a DNase-seq or ATAC-seq experiment, computes a coverage pileup of the 5' ends of the mapped reads, and outputs a simple tab-separated file with the columns chromosome, position, and pileup value (number of reads with a 5' end at this position).
If you experience problems using Chromatin pileup, please contact us.
Chromatin pileup may be called with
java -jar EpiTALEcli-0.1.jar pileup
and has the following parameters
name | comment | type |
b | BAM file (Mapped reads from DNase-seq or ATAC-seq experiment, type = bam) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar pileup b=<BAM_file>
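The pileup of 5' ends can be illustrated with a short Python sketch. To stay self-contained it does not read BAM files; instead it assumes the alignments are already available as tuples (chromosome, start, end, strand) with 1-based inclusive coordinates. The function name `five_prime_pileup` and the example reads are hypothetical.

```python
from collections import Counter


def five_prime_pileup(reads):
    """Count, per position, the number of reads whose 5' end maps there.

    reads: iterable of (chrom, start, end, strand), 1-based inclusive (assumed).
    """
    pileup = Counter()
    for chrom, start, end, strand in reads:
        # The 5' end of a forward-strand read is its leftmost position;
        # for a reverse-strand read it is the rightmost position.
        five_prime = start if strand == "+" else end
        pileup[(chrom, five_prime)] += 1
    return pileup


reads = [
    ("Chr1", 100, 149, "+"),
    ("Chr1", 100, 149, "+"),
    ("Chr1", 60, 109, "-"),
]
for (chrom, pos), value in sorted(five_prime_pileup(reads).items()):
    print(chrom, pos, value, sep="\t")
```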
NormalizePileupOutput
NormalizePileupOutput normalizes the pileup output file, which contains the coverage with 5' ends of ATAC-seq or DNase-seq reads at each position. It normalizes the coverage relative to the mean of a 10000 bp sliding window.
The input of NormalizePileupOutput is a pileup output file from the Chromatin pileup tool.
If you experience problems using NormalizePileupOutput, please contact us.
NormalizePileupOutput may be called with
java -jar EpiTALEcli-0.1.jar normpileup
and has the following parameters
name | comment | type |
p | Pileup output file (Pileup output file., type = tsv.gz,tsv,txt) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar normpileup p=<Pileup_output_file>
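The following Python sketch illustrates one way to normalize a pileup relative to a sliding-window mean. It assumes a window centered on each position and treats positions missing from the pileup as zero; the actual window placement used by NormalizePileupOutput is not specified here, so these choices are assumptions. The function name `normalize_pileup` is hypothetical, and the quadratic loop favors clarity over speed.

```python
def normalize_pileup(pileup, window=10000):
    """Divide each pileup value by the mean pileup in a surrounding window.

    pileup: dict mapping position -> count for a single chromosome.
    """
    half = window // 2
    normalized = {}
    for pos, value in pileup.items():
        lo, hi = pos - half, pos + half          # assumed: window centered on pos
        total = sum(v for p, v in pileup.items() if lo <= p < hi)
        # value / (total / window), written so that no intermediate mean is needed
        normalized[pos] = value * window / total if total else 0.0
    return normalized


# Tiny example with a 10 bp window instead of 10000 bp.
example = {10: 4, 12: 4, 100: 2}
print(normalize_pileup(example, window=10))
```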
PileupConvertToPromoter
PileupConvertToPromoter converts the pileup output file to promoter coordinates.
The input of PileupConvertToPromoter is
1. a normalized pileup output file from NormalizePileupOutput tool and
2. the promoter sequences in FastA format with headers like:
> id chromosomeName:start-end:strand
e.g.
> Os01g01010.1 Chr1:2602-3102:+
If you experience problems using PileupConvertToPromoter, please contact us.
PileupConvertToPromoter may be called with
java -jar EpiTALEcli-0.1.jar pile2prom
and has the following parameters
name | comment | type |
n | Normalized pileup output file (Normalized pileup output file., type = tsv.gz,tsv) | FILE |
p | Promoter fasta file (Promoter fastA file, type = fa,fasta) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar pile2prom n=<Normalized_pileup_output_file> p=<Promoter_fasta_file>
NarrowPeakConvertToPromoter
NarrowPeakConvertToPromoter converts a narrowPeak file containing peaks of chromatin accessibility to promoter coordinates.
The input of NarrowPeakConvertToPromoter is
1. a narrowPeak file and
2. the promoter sequences in FastA format with headers like:
> id chromosomeName:start-end:strand
e.g.
> Os01g01010.1 Chr1:2602-3102:+
If you experience problems using NarrowPeakConvertToPromoter, please contact us.
NarrowPeakConvertToPromoter may be called with
java -jar EpiTALEcli-0.1.jar peak2Prom
and has the following parameters
name | comment | type |
n | NarrowPeak file (Peak-calling output in narrowPeak format., type = narrowPeak,narrowPeak.gz) | FILE |
p | Promoter fasta file (Promoter fastA file, type = fa,fasta) | FILE |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar peak2Prom n=<NarrowPeak_file> p=<Promoter_fasta_file>
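Converting a peak interval to promoter coordinates amounts to clipping the peak to the promoter and shifting it, as the following Python sketch shows. It is not the code of NarrowPeakConvertToPromoter. It assumes BED-style peak coordinates (0-based start, exclusive end), 1-based inclusive promoter coordinates, and reverse-strand positions counted from the promoter end; the function name `peak_to_promoter` is hypothetical.

```python
def peak_to_promoter(peak_start0, peak_end, prom_start, prom_end, strand):
    """Clip a peak to a promoter and return 1-based promoter-wise (start, end).

    Returns None if the peak does not overlap the promoter.
    """
    # Convert the BED-style peak to 1-based inclusive and intersect with the promoter.
    lo = max(peak_start0 + 1, prom_start)
    hi = min(peak_end, prom_end)
    if lo > hi:
        return None
    if strand == "+":
        return lo - prom_start + 1, hi - prom_start + 1
    # Assumed convention: on the reverse strand, positions count from the promoter end.
    return prom_end - hi + 1, prom_end - lo + 1


# Promoter Chr1:2602-3102 and a peak spanning 2500-2700 in narrowPeak coordinates.
print(peak_to_promoter(2500, 2700, 2602, 3102, "+"))
```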
EpiTALE prediction
EpiTALE predicts TALE target boxes using a novel model learned from quantitative data based on the RVD sequence of a TALE. It optionally considers the methylation state of the target box during prediction, as DNA methylation affects the binding specificity of RVDs. Additionally, EpiTALE optionally annotates its predictions with the chromatin accessibility of the predicted target sites, using the output of the NormalizePileupOutput tool and the results of peak calling on DNase-seq or ATAC-seq data.
As input, EpiTALE requires
1. a set of sequences that are scanned for putative TALE target boxes. These sequences could be promoters of genes but also complete genomic sequences (FastA format).
2. For computing p-values, EpiTALE additionally needs a background set of sequences, which is by default generated as a sub-sample of the original input data.
3. The prediction threshold may be defined either by means of a p-value or an approximate number of expected sites. The latter will also be converted to a p-value internally, and the defined number of expected sites is, in general, not met exactly.
4. TALEs are specified by a FastA file containing their RVD sequences, where individual RVDs are separated by dashes (-). This is the same format also output by the TALE Analysis tool of AnnoTALE.
5. It can be specified if both strands or only one of the strands are scanned where, in the former case, a penalty may be assigned to predictions on the reverse strand. While this penalty may be reasonable when scanning promoters, it should usually be set to 0 in case of genome-wide predictions.
6. As optional input, EpiTALE considers methylation during prediction if Bismark output is provided. Running the Bismark methylation extractor with parameters --bedGraph --CX -p generates a coverage file (file.cov.gz), which contains the tab-separated columns:
chromosome, start_position, end_position, methylation_percentage, count_methylated, count_unmethylated.
Alternatively, you can use the tool Bed2Bismark, which converts data in bedMethyl format to Bismark format.
7. (i) The chromatin accessibility of the input sequences can optionally be provided in narrowPeak format. Such a file can be obtained by mapping ATAC-seq or DNase-seq data to the corresponding genome and then performing peak calling, e.g. with JAMM. In case of promoter sequences as input, you should run the tool NarrowPeakConvertToPromoter to convert the narrowPeak file to promoter positions. (ii) Additionally, you can calculate a coverage pileup of 5' ends of mapped reads with Chromatin pileup and normalize it with NormalizePileupOutput. In case of promoter sequences as input, you should run the tool PileupConvertToPromoter to convert the result to promoter coordinates.
8. (i) In case of a genomic search, the parameter "Calculate coverage area" should be set to "surround target site". You can set the number of positions before the target site with "Coverage before value" (default: 300) and the number of positions after the target site with "Coverage after value" (default: 200).
(ii) In case of a promoter search, the parameter "Calculate coverage area" may be set to "on complete sequence" or "surround target site". The number of positions before and after the binding site in the peak profile can be set by "Peak before value" (default: 300) and "Peak after value" (default: 50).
In case of a genomic search, you can filter predictions of TALE target boxes by the presence of differentially expressed regions in a defined vicinity around a predicted target box with the tool DerTALE of the AnnoTALE suite.
If you experience problems using EpiTALE, please contact us.
EpiTALE prediction may be called with
java -jar EpiTALEcli-0.1.jar epitale
and has the following parameters
name | comment | type |
s | Sequences (The sequences (e.g., a genome) to scan for binding sites, type = fa,fas,fasta) | FILE |
b | Background sample (The sequences for determining the prediction threshold. Either a sub-sample of the input sequences or a dedicated background data set, range={sub-sample, background sequences}, default = sub-sample) | STRING |
t | Threshold specification (The way of defining the prediction threshold. Either by explicitly defining a significance level or by specifying the number of expected sites, range={significance level, number of sites}, default = significance level) | STRING |
TALEs | TALEs (The RVD sequences of the TALE, separated by dashes, in FastA format, type = fasta,fas,fa) | FILE |
Strand | Strand (Predict target sites on both strands, or on the forward or reverse strand, range={both strands, forward strand, reverse strand}, default = both strands) | STRING |
bf | Bismark file (The bedGraph output of Bismark (file.cov.gz) containing <chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>, type = cov,cov.gz, OPTIONAL) | FILE |
nf | NarrowPeak file (The output of a peak caller (all.peaks.narrowPeak), type = narrowPeak,narrowPeak.gz, OPTIONAL) | FILE |
npo | Normalized pileup output (The normalized output of pileup with values larger than zero (file.txt) containing <chromosome> <position> <coverage>, type = tsv,tsv.gz, OPTIONAL) | FILE |
c | Calculate coverage area (Calculate coverage area surrounding the target site, or on the complete sequence, range={surround target site, on complete sequence}, default = surround target site, OPTIONAL) | STRING |
p | Peak before value (Number of positions before target site in peak profile, valid range = [1, 500], default = 300, OPTIONAL) | INT |
pav | Peak after value (Number of positions after target site in peak profile, valid range = [1, 500], default = 50, OPTIONAL) | INT |
outdir | The output directory, defaults to the current working directory (.) | STRING |
Example:
java -jar EpiTALEcli-0.1.jar epitale s=<Sequences> TALEs=<TALEs>
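The TALE input format described above (FastA with RVDs separated by dashes) can be checked before a run with a small Python sketch like the following. The function name `read_tale_rvds`, the TALE name, and the RVD sequence in the example are hypothetical and serve only to illustrate the format.

```python
def read_tale_rvds(lines):
    """Read TALE RVD sequences from FastA-formatted lines.

    Returns a dict mapping each TALE name to its list of RVDs.
    """
    tales = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            tales[name] = []
        elif name is not None:
            # Individual RVDs are separated by dashes (-).
            tales[name].extend(rvd for rvd in line.split("-") if rvd)
    return tales


# Hypothetical TALE with six RVDs.
example = [">ExampleTALE", "NI-HD-NG-NN-HD-NG"]
for tale, rvds in read_tale_rvds(example).items():
    print(tale, len(rvds), rvds)
```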