Dream5
Latest revision as of 08:03, 9 February 2011
by Jan Grau, Stefan Posch, Ivo Grosse, and Jens Keilwagen.
Description
Motivation
Protein binding microarrays (PBMs) are valuable for elucidating the binding affinity of transcription factors to short DNA sequences in vitro. However, learning accurate models of transcription factor binding from these data remains a challenging problem in bioinformatics. Here, we present a novel approach for analyzing PBM data that combines discriminative learning of a ZOOPS model with an appropriate soft-labeling of the probe sequences.
Results
We implement this approach as an extension of Dispom and evaluate it on the data of Dream5 challenge 2, a rigorous benchmark for the analysis of PBM data. We find an improved overall performance compared to the participants of the challenge. Besides discriminative learning of the model parameters, one reason for the superior performance of Dispom is that it considers dependencies between adjacent positions of the binding sites. This suggests exploiting dependencies between adjacent base pairs in transcription factor binding sites whenever enough training data are available. Another important property is that the approach is robust to typical artifacts of PBM experiments, so it can be applied to PBM data without prior normalization.
Paper
The paper Accurate prediction of protein binding microarray data by discriminative de-novo motif discovery has been submitted to ISMB 2011.
Binaries
We provide two binaries: one for training a model from PBM data, i.e. sequences and their associated intensities, and one for predicting intensities for given probe sequences using a trained model. This way, training can be run on, e.g., a compute server, while the computationally less expensive predictions can be made on an ordinary workstation.
- Binaries for training and prediction as a ZIP archive (http://www.jstacs.de/downloads/Dream5.zip)
- Sources of the binaries (http://www.jstacs.de/downloads/Dream5-sources.zip); building them requires the Jstacs 1.4 sources
Training
Trains a model from PBM data in Dream5 format (see the Dream5 homepage, http://wiki.c2b2.columbia.edu/dream/index.php/D5c2) and stores the trained model in a file.
- Run by calling
java -jar Dream5.jar
- Arguments:
  home ... home directory (The home directory where the data reside, default = .)
  file ... input file (The input file in Dream5 format (column 1: TF, column 2: array type, column 3: sequences, column 4: signal, last column: flag), one file per TF and array type, path relative to home directory)
  mo ... motif order (The order of the inhomogeneous Markov model for the motif, default = 1)
  fo ... flanking order (The order of the homogeneous Markov model for flanking sequence and background, default = 3)
  starts ... starts (The number of starts of the optimization, default = 5)
  threads ... threads (The number of threads, i.e. cores, that are used for optimization, default = 1)
  q ... q (A-priori fraction of data points with weight greater than 0.5, default = 0.025)
  model ... model (File where the trained model is stored as XML, path relative to home directory, default = model.xml)
- Example:
java -jar Dream5.jar file=TF1_HK.txt starts=1 threads=2 model=mymodel.xml
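Because the training binary expects one input file per TF and array type, a combined table has to be split before training. The following Python sketch shows one way to do that. It is not part of the Dream5 binaries, and it rests on assumptions: columns are tab-separated, the first line is a header, and the output naming `<TF>_<array type>.txt` merely mirrors the file name used in the example above. Whether the binary expects the header line to be kept is not stated here, so pass `header=None` to `write_groups` to omit it.

```python
from collections import defaultdict
from pathlib import Path


def split_by_tf_and_array(lines, has_header=True):
    """Group Dream5-format lines by (TF, array type).

    Assumes tab-separated columns with the TF in column 1 and the
    array type in column 2; the header handling is an assumption.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    header = rows[0] if has_header and rows else None
    body = rows[1:] if has_header else rows
    groups = defaultdict(list)
    for row in body:
        fields = row.split("\t")
        # TF, array type, sequence, signal, and a trailing flag column
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns: {row!r}")
        groups[(fields[0], fields[1])].append(row)
    return header, dict(groups)


def write_groups(header, groups, directory="."):
    """Write one file per group, named <TF>_<array type>.txt (hypothetical naming)."""
    paths = []
    for (tf, array_type), rows in sorted(groups.items()):
        path = Path(directory) / f"{tf}_{array_type}.txt"
        content = ([header] if header is not None else []) + rows
        path.write_text("\n".join(content) + "\n")
        paths.append(path)
    return paths
```

Each resulting file can then be passed to the training binary via the `file` argument.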
Predictions
Loads a trained model from a file and predicts intensities for a given set of probe sequences in Dream5 submission format (see the Dream5 homepage, http://wiki.c2b2.columbia.edu/dream/index.php/D5c2#Submission).
- Run by calling
java -jar Dream5Predict.jar
- Arguments:
  home ... home directory (The home directory where the data reside, default = .)
  file ... input file (The input file in Dream5 format for predictions (column 1: TF, column 2: array type, column 3: sequence), one file per TF and array type, path relative to home directory)
  model ... model (File with the trained model stored as XML, path relative to home directory, default = model.xml)
  outfile ... outfile (File where the predictions are stored, default = predictions.txt)
- Example:
java -jar Dream5Predict.jar file=TF1_ME.txt model=mymodel.xml outfile=predictions_TF1_ME.txt
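When training and prediction are scripted for many TFs, it can help to assemble the two command lines programmatically. The Python sketch below is a hypothetical wrapper, not something shipped with the binaries. It only relies on the jar names and `key=value` arguments listed above, rejects argument names that are not in those lists, and requires `file`, the one argument for which no default is given. The helper names (`build_command`, `train_command`, `predict_command`, `run`) are made up for illustration.

```python
import subprocess

# Argument names as listed for the two binaries above
TRAIN_KEYS = ("home", "file", "mo", "fo", "starts", "threads", "q", "model")
PREDICT_KEYS = ("home", "file", "model", "outfile")


def build_command(jar, allowed, **args):
    """Return the argument list for `java -jar <jar> key=value ...`."""
    unknown = set(args) - set(allowed)
    if unknown:
        raise ValueError(f"unknown argument(s): {sorted(unknown)}")
    if "file" not in args:
        raise ValueError("'file' has no default and must be given")
    # Emit arguments in the order of the documented list
    return ["java", "-jar", jar] + [f"{k}={args[k]}" for k in allowed if k in args]


def train_command(**args):
    return build_command("Dream5.jar", TRAIN_KEYS, **args)


def predict_command(**args):
    return build_command("Dream5Predict.jar", PREDICT_KEYS, **args)


def run(command):
    """Execute a command and raise if the Java process exits with an error."""
    subprocess.run(command, check=True)
```

For instance, `run(train_command(file="TF1_HK.txt", model="mymodel.xml"))` followed by `run(predict_command(file="TF1_ME.txt", model="mymodel.xml"))` would train on one array and predict for another, assuming Java and both jar files are available in the working directory.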