Dispom: Difference between revisions

Revision as of 11:53, 2 September 2010

by Jens Keilwagen, Jan Grau, Ivan A. Paponov, Stefan Posch, Marc Strickert and Ivo Grosse.

Description

Background

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet.

Results

We present a de-novo motif discovery tool for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Based on the evaluation of 18 benchmark data sets, we find that the prediction performance of this tool is superior to that of existing tools for de-novo motif discovery. Finally, we apply the tool to discover binding sites enriched in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as an elongated auxin-responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find that the refined motif increases the auxin specificity by more than three orders of magnitude in genome-wide predictions compared to the canonical auxin-responsive element.

Conclusions

We find that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application.

Paper

The paper "De-novo discovery of differentially abundant transcription factor binding sites including their positional preference" has been submitted to PLoS Computational Biology.

Download

Dispom can be downloaded here.
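The benchmark data sets with implanted binding sites from the Jaspar database (http://jaspar.cgb.ki.se/) can be downloaded here: http://www.jstacs.de/downloads/benchmark.zip
The auxin data sets can be downloaded here: http://www.jstacs.de/downloads/auxin.zip
The position frequency matrices (PFMs) of the predictions on the metazoan compendium (http://acgt.cs.tau.ac.il/amadeus/suppl/results_metazoan.html) can be downloaded here: http://www.jstacs.de/downloads/meatzoan-pfms-1E-4.txt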

Start instructions

Once you have unzipped the archive, you can start Dispom, for example, by invoking

java -cp .:jstacs-1.3.1.jar:lib/numericalMethods.jar:lib/bytecode.jar:lib/biojava-live.jar projects.dispom.Dispom home=path/to/data/directory/ fg=fgfile.txt bg=bgfile.txt init=best-random=100 p-val=1E-4

to search for motifs that are over-represented in path/to/data/directory/fgfile.txt but not in path/to/data/directory/bgfile.txt, initialize Dispom with the best from 100 randomly drawn starting values, and search for motif occurrences with a p-value less than 1E-4.

Under Windows, you must use ";" instead of ":" in the class path.
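
The platform-dependent class path separator can be handled by a small wrapper script. The following is a minimal Python sketch, not part of Dispom: the helper name build_dispom_command is hypothetical, and the jar list is simply copied from the command shown above. It relies on os.pathsep, which is ";" on Windows and ":" elsewhere.

```python
import os

# Class path entries copied from the start instructions above.
JARS = [
    ".",
    "jstacs-1.3.1.jar",
    "lib/numericalMethods.jar",
    "lib/bytecode.jar",
    "lib/biojava-live.jar",
]


def build_dispom_command(args, pathsep=os.pathsep):
    """Return the argument list for launching Dispom (hypothetical helper).

    args maps argument names (e.g. "fg") to their values; pathsep defaults
    to the separator of the current platform.
    """
    classpath = pathsep.join(JARS)
    params = [f"{name}={value}" for name, value in args.items()]
    return ["java", "-cp", classpath, "projects.dispom.Dispom"] + params


if __name__ == "__main__":
    cmd = build_dispom_command({
        "home": "path/to/data/directory/",
        "fg": "fgfile.txt",
        "bg": "bgfile.txt",
        "init": "best-random=100",
        "p-val": "1E-4",
    })
    # Print the command; it could also be passed to subprocess.run(cmd).
    print(" ".join(cmd))
```

Returning a list instead of a single string avoids quoting problems if the list is later handed to subprocess.run.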

The arguments have the following meaning:

name	comment	type

home	the path to the data directory, default = ./	String
ignore	the char that is used to mask comment lines in data files, e.g., '>' in a FASTA-file, default = >	Character
fg	the file name of the foreground data file (the file containing sequences which are expected to contain binding sites of a common motif)	String
bg	the file name of the background data file, OPTIONAL	String
position	a switch whether to use uniform, skew-normal, or mixture position distribution, range={UNIFORM, SKEW_NORMAL, MIXTURE}, default = MIXTURE	String
mean	the mean of the a priori TFBS distribution, default = 250.0	Double
sd	the sd of the a priori TFBS distribution, valid range = [1.0, Infinity], default = 150.0	Double
motifs	the number of motifs to be searched for, valid range = [1, 5], default = 1	Integer
length	the motif length that is used at the beginning, valid range = [1, 50], default = 15	Integer
flankOrder	the Markov order of the model for the flanking sequence and the background sequence, valid range = [0, 5], default = 0	Integer
motifOrder	the Markov order of the motif model, valid range = [0, 3], default = 0	Integer
bothStrands	a switch whether to use both strands or not, default = true	Boolean
init	the method that is used for initialization, one of 'best-random=<number>', 'best-random-plugin=<number>', 'best-random-motif=<number>', 'enum-all=<length>', 'enum-data=<length>', 'heuristic=<number>', and 'specific=<sequence or file of sequences>'	String=[Integer | String]
adjust	a switch whether to adjust the motif length, i.e., either to shrink or expand, default = true	Boolean
maxPos	a switch whether to use max. pos. in the heuristic or not, default = true	Boolean
learning	a switch for the learning principle, range={ML, MAP, MCL, MSP}, default = MSP	String
threads	the number of threads that are used to evaluate the objective function and its gradient, valid range = [1, 128], default = 4	Integer
starts	the number of independent starts of Dispom, valid range = [1, 100], default = 1	Integer
xml	the file name of the xml file the classifier is written to, default = ./classifier.xml	String
p-val	a p-value for predicting binding sites, valid range = [0.0, 1.0], OPTIONAL	Double
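
Before launching a long run, the documented ranges can be checked up front. The following Python sketch is only an illustration and not part of Dispom: the function check_arguments is hypothetical, the ranges and choices are copied from the table above, and treating fg as mandatory is an assumption based on the table marking only bg and p-val as OPTIONAL and giving fg no default.

```python
# Integer ranges and allowed values copied from the argument table.
INT_RANGES = {
    "motifs": (1, 5),
    "length": (1, 50),
    "flankOrder": (0, 5),
    "motifOrder": (0, 3),
    "threads": (1, 128),
    "starts": (1, 100),
}
CHOICES = {
    "position": {"UNIFORM", "SKEW_NORMAL", "MIXTURE"},
    "learning": {"ML", "MAP", "MCL", "MSP"},
}


def check_arguments(args):
    """Return a list of problems found in args; empty if none were found."""
    problems = []
    # Assumption: fg is mandatory (no default, not marked OPTIONAL).
    if "fg" not in args:
        problems.append("fg is required")
    for name, (low, high) in INT_RANGES.items():
        if name in args and not low <= int(args[name]) <= high:
            problems.append(f"{name} must be in [{low}, {high}]")
    for name, allowed in CHOICES.items():
        if name in args and args[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    if "sd" in args and float(args["sd"]) < 1.0:
        problems.append("sd must be at least 1.0")
    if "p-val" in args and not 0.0 <= float(args["p-val"]) <= 1.0:
        problems.append("p-val must be in [0.0, 1.0]")
    return problems


if __name__ == "__main__":
    print(check_arguments({"fg": "fgfile.txt", "motifs": "6", "learning": "MSP"}))
```

Dispom itself may report invalid values differently; the sketch only mirrors the ranges listed in the table.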

Case studies

In the case studies presented in the paper, we started Dispom 50 times.

47 times, we used init=best-random-plugin=100.
Once, we used init=heuristic=100.
Once, we used init=enum-data=6.
Once, we used init=enum-data=8.

For predicting binding sites, we used p-val=1E-4.
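
The 50 starts described above can be enumerated programmatically. The following Python sketch only lists the init and p-val settings per run. The function name case_study_inits and the per-run xml file names are hypothetical, and the sketch does not claim to reproduce how the runs were actually scripted for the paper.

```python
def case_study_inits():
    """Return the init setting of each of the 50 starts described above."""
    inits = ["best-random-plugin=100"] * 47
    inits += ["heuristic=100", "enum-data=6", "enum-data=8"]
    return inits


if __name__ == "__main__":
    for run, init in enumerate(case_study_inits(), start=1):
        # Hypothetical: one xml file per run so that results are not overwritten.
        print(f"run {run:2d}: init={init} p-val=1E-4 xml=./classifier-{run}.xml")
```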

@@ Line 19: / Line 19: @@
 * The benchmark data sets with implanted binding sites from [http://jaspar.cgb.ki.se/ Jaspar database] can be downloaded [http://www.jstacs.de/downloads/benchmark.zip here].
 * The auxin data sets can be downloaded [http://www.jstacs.de/downloads/auxin.zip here].
-* The position frequency matrices (PFMs) of the predicitions on the [http://acgt.cs.tau.ac.il/amadeus/suppl/results_metazoan.html metazoan compendium] can be downloaded [http://www.jstacs.de/downloads/meatzoan-pfms-1E-4.txt here].
+* The position frequency matrices (PFMs) of the predictions on the [http://acgt.cs.tau.ac.il/amadeus/suppl/results_metazoan.html metazoan compendium] can be downloaded [http://www.jstacs.de/downloads/meatzoan-pfms-1E-4.txt here].
 == Start instructions ==
