Dream5
by Jan Grau, Stefan Posch, Ivo Grosse, and Jens Keilwagen.
Description
Motivation
Protein binding microarrays (PBMs) are valuable for elucidating the binding affinity of transcription factors to short DNA sequences in vitro. However, learning accurate models of transcription factor binding from these data remains a challenging problem in bioinformatics. Here, we present a novel approach for analyzing PBM data based on a combination of discriminative learning of a ZOOPS model and an appropriate soft-labeling of the probe sequences.
Results
We implement this approach as an extension of Dispom and evaluate it on the data of Dream5 challenge 2, a rigorous benchmark for the analysis of PBM data. We find an improved overall performance compared to the participants of the challenge. Besides discriminative learning of model parameters, one reason for the superior performance of Dispom is that it considers dependencies between adjacent positions of the binding sites. This suggests exploiting such dependencies between adjacent base pairs in transcription factor binding sites whenever enough training data are available. Another important property is that this novel approach is robust to typical artifacts of PBM experiments, which facilitates its application to PBM data without the need for prior normalization.
Paper
The paper "Accurate prediction of protein binding microarray data by discriminative de-novo motif discovery" has been submitted to ISMB 2011.
Binary
- Binaries for training and prediction as a ZIP archive
- Sources of the binaries; building them requires the Jstacs 1.4 sources
Training
- Run by calling
java -jar Dream5.jar
- Arguments:
  home ... home directory (The home directory where the data reside, default = .)
  file ... input file (The input file in Dream5 format (column 1: TF, column 2: array type, column 3: sequences, column 4: signal, last column: flag), one file per TF and array type, path relative to home directory.)
  mo ... motif order (The order of the inhomogeneous Markov model for the motif, default = 1)
  fo ... flanking order (The order of the homogeneous Markov model for flanking sequence and background, default = 3)
  starts ... starts (The number of starts of the optimization, default = 5)
  threads ... threads (The number of threads, i.e. cores, that are used for optimization, default = 1)
  q ... q (A-priori fraction of data points with weight greater than 0.5, default = 0.025)
  model ... model (File where the trained model is stored as XML, path relative to home directory, default = model.xml)
- Example:
java -jar Dream5.jar file=TF1_HK.txt starts=1 threads=2 model=mymodel.xml
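To make the input format described above more concrete, the following is a minimal sketch of how one line of a training file could be read in Java. It is not taken from the Dream5 sources: the class name Dream5LineSketch, the Probe type, and the assumption that columns are separated by tabs are all hypothetical and chosen for illustration only. The column order (TF, array type, sequence, signal, flag in the last column) follows the description of the file argument.

```java
public class Dream5LineSketch {

    /** One probe of a training file (hypothetical helper type, not part of Jstacs). */
    public static final class Probe {
        public final String tf;        // column 1: transcription factor
        public final String arrayType; // column 2: array type
        public final String sequence;  // column 3: probe sequence
        public final double signal;    // column 4: measured signal
        public final String flag;      // last column: flag

        public Probe(String tf, String arrayType, String sequence, double signal, String flag) {
            this.tf = tf;
            this.arrayType = arrayType;
            this.sequence = sequence;
            this.signal = signal;
            this.flag = flag;
        }
    }

    /**
     * Parses one line of a training file.
     * ASSUMPTION: columns are tab-separated; the delimiter is not stated in the text.
     */
    public static Probe parseLine(String line) {
        // limit -1 keeps trailing empty columns, so an empty flag is not silently dropped
        String[] cols = line.split("\t", -1);
        if (cols.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 columns, got " + cols.length);
        }
        // the flag is taken from the last column, whatever lies between signal and flag
        return new Probe(cols[0], cols[1], cols[2], Double.parseDouble(cols[3]), cols[cols.length - 1]);
    }

    public static void main(String[] args) {
        // made-up example line, not real benchmark data
        Probe p = parseLine("TF1\tHK\tACGTACGTAC\t1234.5\t0");
        System.out.println(p.tf + " " + p.arrayType + " " + p.sequence + " " + p.signal + " " + p.flag);
    }
}
```

Taking the flag from the last column rather than from a fixed index mirrors the wording "last column: flag", which leaves open whether further columns sit between signal and flag.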
Predictions
- Run by calling
java -jar Dream5Predict.jar
- Arguments:
  home ... home directory (The home directory where the data reside, default = .)
  file ... input file (The input file in Dream5 format for predictions (column 1: TF, column 2: array type, column 3: sequence), one file per TF and array type, path relative to home directory.)
  model ... model (File with the trained model stored as XML, path relative to home directory, default = model.xml)
  outfile ... outfile (File where the predictions are stored, default = predictions.txt)
- Example:
java -jar Dream5Predict.jar file=TF1_ME.txt model=mymodel.xml outfile=predictions_TF1_ME.txt
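When training and prediction are to be run for many TFs, it can help to assemble the two command lines programmatically. The following is a rough sketch under stated assumptions: the class Dream5CommandSketch and its command method are hypothetical helpers, not part of the distributed binaries, and the file names are the illustrative ones used in the examples above. Only the key=value arguments documented on this page are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Dream5CommandSketch {

    /** Builds "java -jar <jar> key=value ..." from alternating key/value strings. */
    public static List<String> command(String jar, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("Keys and values must come in pairs");
        }
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", jar));
        for (int i = 0; i < keyValues.length; i += 2) {
            cmd.add(keyValues[i] + "=" + keyValues[i + 1]);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // training call, using the arguments from the training example
        List<String> train = command("Dream5.jar",
                "file", "TF1_HK.txt", "starts", "1", "threads", "2", "model", "mymodel.xml");
        // prediction call with the model written by the training step
        List<String> predict = command("Dream5Predict.jar",
                "file", "TF1_ME.txt", "model", "mymodel.xml", "outfile", "predictions_TF1_ME.txt");

        System.out.println(String.join(" ", train));
        System.out.println(String.join(" ", predict));
        // To actually launch a step, one could use, for example:
        // new ProcessBuilder(train).inheritIO().start().waitFor();
    }
}
```

The sketch only prints the commands; launching them is left as a commented-out line because it requires the jar files to be present in the working directory.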