Dream5

by Jan Grau, Stefan Posch, Ivo Grosse, and Jens Keilwagen.

Description

Motivation

Protein binding microarrays (PBMs) are valuable for elucidating the binding affinity of transcription factors to short DNA sequences in vitro. However, learning accurate models of transcription factor binding from these data is still a challenging problem in bioinformatics. Here, we present a novel approach for analyzing PBM data based on a combination of discriminative learning of a ZOOPS model and an appropriate soft-labeling of the probe sequences.

Results

We implement this approach as an extension of Dispom and evaluate it on the benchmark data of Dream5 challenge 2, a rigorous benchmark for the analysis of PBM data. We find an improved overall performance compared to the participants of the challenge. Besides discriminative learning of model parameters, one reason for the superior performance of Dispom is that it considers dependencies between adjacent positions of the binding sites. This suggests exploiting such dependencies between adjacent base pairs in transcription factor binding sites whenever enough training data are available. Another important property is that this novel approach is robust to typical artifacts of PBM experiments, which facilitates its application to PBM data without the need for prior normalization.

Paper

The paper "Accurate prediction of protein binding microarray data by discriminative de-novo motif discovery" has been submitted to ISMB 2011.

Binary

Binaries for training and prediction
Sources of the binaries (building them requires the Jstacs 1.4 sources)

Training

Run by calling java -jar Dream5.jar
Arguments:

home ... home directory (The home directory where the data reside., default = .)
file ... input file (The input file in Dream5 format (column 1: TF, column 2: array type, column 3: sequences, column 4: signal, last column: flag), one file per TF and array type, path relative to home directory.)
mo ... motif order (The order of the inhomogeneous Markov model for the motif., default = 1)
fo ... flanking order (The order of the homogeneous Markov model for flanking sequence and background., default = 3)
starts ... starts (The number of starts of the optimization., default = 5)
threads ... threads (The number of threads, i.e. cores, that are used for optimization., default = 1)
q ... q (A-priori fraction of data points with weight greater than 0.5, default = 0.025)
model ... model (File where the trained model is stored as XML, path relative to home directory., default = model.xml)

Example: java -jar Dream5.jar file=TF1_HK.txt starts=1 threads=2 model=mymodel.xml
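Before training, it can help to check that an input file matches the column layout described for the file argument. The following Python sketch is not part of the Dream5 tools. It assumes that the columns are tab-separated and that the file may start with a header line; neither is stated above. The function name read_training_rows and the sample row are made up for illustration.

```python
import csv
import io


def read_training_rows(handle, delimiter="\t"):
    """Yield (tf, array_type, sequence, signal, flag) tuples.

    Assumed layout (from the argument description): column 1 TF,
    column 2 array type, column 3 sequence, column 4 signal,
    last column flag. The delimiter is an assumption.
    """
    for line_no, row in enumerate(csv.reader(handle, delimiter=delimiter), start=1):
        if not row:
            continue  # skip empty lines
        if len(row) < 5:
            raise ValueError(f"line {line_no}: expected at least 5 columns, got {len(row)}")
        tf, array_type, sequence, signal = row[:4]
        flag = row[-1]
        try:
            value = float(signal)
        except ValueError:
            if line_no == 1:
                continue  # assumption: a non-numeric first line is a header
            raise ValueError(f"line {line_no}: signal {signal!r} is not a number")
        yield tf, array_type, sequence, value, flag


# Made-up sample data, only to show the expected shape of a file.
sample = "TF_Id\tArrayType\tSequence\tSignal\tFlag\nTF1\tHK\tACGTACGT\t1234.5\t0\n"
rows = list(read_training_rows(io.StringIO(sample)))
```

For a real file, the same function can be called with open("TF1_HK.txt", newline="") as the handle.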

Predictions

Run by calling java -jar Dream5Predict.jar
Arguments:

home ... home directory (The home directory where the data reside., default = .)
file ... input file (The input file in Dream5 format for predictions (column 1: TF, column 2: array type, column 3: sequence), one file per TF and array type, path relative to home directory.)
model ... model (File with the trained model stored as XML, path relative to home directory., default = model.xml)
outfile ... outfile (File where the predictions are stored., default = predictions.txt)

Example: java -jar Dream5Predict.jar file=TF1_ME.txt model=mymodel.xml outfile=predictions_TF1_ME.txt
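Training and prediction are typically run back to back, with the model file written by Dream5.jar passed on to Dream5Predict.jar. The Python sketch below shows one way to script this. It only assembles the key=value arguments documented above; the helper names build_command and run_pipeline are hypothetical, and it assumes that java is on the PATH and that both jar files are in the working directory.

```python
import subprocess


def build_command(jar, **arguments):
    """Build a 'java -jar <jar> key=value ...' command as an argument list."""
    return ["java", "-jar", jar] + [f"{key}={value}" for key, value in arguments.items()]


# Train on one array type, then predict for the other with the same model file.
train = build_command(
    "Dream5.jar", file="TF1_HK.txt", starts=1, threads=2, model="mymodel.xml"
)
predict = build_command(
    "Dream5Predict.jar",
    file="TF1_ME.txt",
    model="mymodel.xml",
    outfile="predictions_TF1_ME.txt",
)


def run_pipeline():
    """Run training first, then prediction; stop if either step fails."""
    for command in (train, predict):
        subprocess.run(command, check=True)
```

Calling run_pipeline() executes both steps. Passing the commands as lists avoids shell quoting problems with file names.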
