PMMdeNovo
Revision as of 13:05, 21 February 2015
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse
Runnable JARs
The application consists of three independent tools.
ModelTrainer
The tool ModelTrainer performs de novo motif discovery on a set of non-aligned sequences that putatively contain the motif. It infers an inhomogeneous parsimonious Markov model (PMM) of arbitrary order, where order 0 corresponds to a position weight matrix (PWM). Run it by calling
java -jar InhPMM.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output
where the arguments have the following semantics:
name | type | default | comment |
inputFile | String | -- | The location of a text file containing the input sequences. If the first character in the file is '>', the content is interpreted as a FASTA file; otherwise, it is interpreted as plain text, with each line corresponding to one sequence. |
motifWidth | Integer | 20 | The width of the motif to be inferred. |
motifOrder | Integer | 2 | The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies. |
flankingOrder | Integer | 2 | The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif. |
initSteps | Integer | 50 | The number of initial iteration steps that the algorithm always performs in each restart. |
addSteps | Integer | 10 | The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been obtained before termination is allowed. |
restarts | Integer | 10 | The number of restarts of the algorithm. |
output | String | model | The path and file prefix for the output files. The tool produces two files, namely (i) output.xml, containing the learned model, and (ii) output.dot, containing the Graphviz representation of the learned parsimonious context tree (PCT) structures, where output stands for the given prefix (e.g., model.xml and model.dot for the default). |
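To illustrate what the order parameter controls, the following sketch estimates and scores a plain inhomogeneous Markov model of fixed order over the DNA alphabet. It is not a PMM: a PMM additionally merges contexts via PCTs, which this sketch omits. The class InhMarkovSketch, its pseudocount of 1, the restriction to uppercase ACGT, and the uniform fallback for unseen contexts are illustrative assumptions, not part of PMMdeNovo. With order 0, every position uses the empty context, so the model reduces to a PWM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain inhomogeneous Markov model of fixed order over ACGT (no PCT-based context merging).
 * Hypothetical illustration, not the PMMdeNovo implementation.
 */
final class InhMarkovSketch {
    private static final String ALPHABET = "ACGT";
    private static final double[] UNIFORM = {0.25, 0.25, 0.25, 0.25};
    private final int order;
    // One table per motif position: context (up to `order` preceding symbols) -> P(symbol | context)
    private final List<Map<String, double[]>> tables = new ArrayList<>();

    /** Estimates the per-position conditional distributions from aligned sites, pseudocount 1. */
    InhMarkovSketch(List<String> sites, int order) {
        this.order = order;
        int width = sites.get(0).length();
        for (int i = 0; i < width; i++) {
            Map<String, double[]> counts = new HashMap<>();
            for (String s : sites) {
                String ctx = context(s, i);
                counts.computeIfAbsent(ctx, c -> new double[] {1, 1, 1, 1})[ALPHABET.indexOf(s.charAt(i))]++;
            }
            for (double[] c : counts.values()) {
                double sum = c[0] + c[1] + c[2] + c[3];
                for (int a = 0; a < c.length; a++) {
                    c[a] /= sum; // counts -> conditional probabilities
                }
            }
            tables.add(counts);
        }
    }

    /** The preceding min(i, order) symbols; empty for order 0, which makes the model a PWM. */
    private String context(String s, int i) {
        return s.substring(Math.max(0, i - order), i);
    }

    /** Log-likelihood of a motif-width sequence; unseen contexts fall back to uniform. */
    double logLikelihood(String x) {
        double ll = 0;
        for (int i = 0; i < tables.size(); i++) {
            double[] p = tables.get(i).getOrDefault(context(x, i), UNIFORM);
            ll += Math.log(p[ALPHABET.indexOf(x.charAt(i))]);
        }
        return ll;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("AC", "AC", "AG", "TC");
        System.out.println(new InhMarkovSketch(sites, 0).logLikelihood("AC")); // order 0 (PWM)
        System.out.println(new InhMarkovSketch(sites, 1).logLikelihood("AC")); // order 1
    }
}
```

Raising the order lets each motif position condition on up to that many preceding positions, but the number of possible contexts grows as 4^order per position; merging contexts through PCTs is how a PMM keeps this growth in check.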
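The initSteps and addSteps parameters together describe a termination rule. The sketch below encodes one reading of the table: a restart runs at least initSteps iterations and may stop once addSteps further iterations have passed since the model structure last changed. The class StoppingRuleSketch and the precomputed "structure trace" that stands in for the real learning iterations are hypothetical; the actual tool may define "last optimal model structure" differently.

```java
/**
 * Hypothetical sketch of the termination rule described by initSteps and addSteps.
 * Not taken from the PMMdeNovo sources.
 */
final class StoppingRuleSketch {

    /** True if a restart may stop after 1-based iteration `it`. */
    static boolean mayTerminate(int it, int lastStructureChange, int initSteps, int addSteps) {
        return it >= initSteps && it - lastStructureChange >= addSteps;
    }

    /**
     * Replays a precomputed trace of model structures (one entry per iteration,
     * standing in for the real learning steps) and returns how many iterations run.
     */
    static int runRestart(String[] structureTrace, int initSteps, int addSteps) {
        String current = null;
        int lastChange = 0;
        for (int it = 1; it <= structureTrace.length; it++) {
            String structure = structureTrace[it - 1];
            if (!structure.equals(current)) { // a new optimal structure was obtained
                current = structure;
                lastChange = it;
            }
            if (mayTerminate(it, lastChange, initSteps, addSteps)) {
                return it;
            }
        }
        return structureTrace.length; // trace exhausted before the rule fired
    }

    public static void main(String[] args) {
        // Hypothetical trace: the structure changes at iterations 1, 3 and 6, then stays fixed.
        String[] trace = new String[100];
        for (int i = 0; i < trace.length; i++) {
            int it = i + 1;
            trace[i] = it < 3 ? "A" : it < 6 ? "B" : "C";
        }
        System.out.println(runRestart(trace, 50, 10)); // defaults: stops at iteration 50
        System.out.println(runRestart(trace, 5, 10));  // stops at iteration 16 (6 + 10)
    }
}
```

Under this reading, the defaults (initSteps = 50, addSteps = 10) mean that a structure which settles early never shortens a restart below 50 iterations, whereas a structure still changing late extends the restart.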
BindingSitePrediction
The tool BindingSitePrediction (http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/BindingSitePrediction.jar) predicts instances of binding sites in a positive data set based on a previously learned model. Run it by calling
java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output
where the arguments have the following semantics:
name | type | default | comment |
modelFile | String | -- | The location of the .xml representation of the learned model (output of ModelTrainer). |
dataPos | String | -- | The location of the positive data (FASTA file or plain text) in which binding sites are to be identified. |
dataNeg | String | -- | The location of the negative data (FASTA file or plain text) used for computing the prediction threshold. |
alpha | Double | 1E-4 | Significance level on the negative data, used to set the prediction threshold. |
output | String | bindingSites.txt | Location of output file for writing the predicted binding sites. |
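The dataNeg and alpha entries indicate that the prediction threshold is calibrated on negative data. A minimal sketch under that assumption: given model scores of candidate sites in the negative data, pick a threshold that at most a fraction alpha of them exceed, then report the positive-data sites scoring above it. How BindingSitePrediction actually scores and counts candidate sites is not stated on this page; ThresholdSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of calibrating a score threshold on negative data at level alpha. */
final class ThresholdSketch {

    /** Returns t such that at most floor(alpha * n) of the n negative scores are strictly greater than t. */
    static double threshold(double[] negativeScores, double alpha) {
        if (negativeScores.length == 0) {
            throw new IllegalArgumentException("need at least one negative score");
        }
        double[] sorted = negativeScores.clone();
        Arrays.sort(sorted); // ascending
        int n = sorted.length;
        int allowed = Math.min(n - 1, (int) Math.floor(alpha * n)); // tolerated exceedances
        return sorted[n - 1 - allowed]; // at most `allowed` values lie above this index
    }

    /** Indices of candidate sites in the positive data whose score exceeds t. */
    static List<Integer> predict(double[] positiveScores, double t) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < positiveScores.length; i++) {
            if (positiveScores[i] > t) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] negative = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] positive = {0.5, 6.0, 6.5, 9.0};
        double t = threshold(negative, 0.25); // 2 of 8 negatives may exceed t
        System.out.println(t + " " + predict(positive, t)); // prints "6.0 [2, 3]"
    }
}
```

Under this sketch, the default alpha of 1E-4 with fewer than 10,000 scored negative sites allows no exceedances at all, so the threshold becomes the highest negative score.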
Classification
The tool Classification first performs motif discovery and subsequently a fragment-based classification, using positive data that are assumed to contain an instance of the motif and negative data that are assumed not to contain the motif. The tool writes the classification results to standard output.
Data
The exemplary data sets contain extracted ChIP-seq sequences of 50 different human transcription factors from the ENCODE project, as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.
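Because the subsets are fixed, every fold of the cross-validation is fully determined. The sketch below shows the standard 10-fold scheme over such pre-split subsets: fold k holds out subset k for testing and trains on the union of the other nine. The generic Fold record and the folds method are hypothetical helpers, not part of the distributed data or tools; since all data sets are split, the positive and the negative data would each be partitioned this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of 10-fold cross-validation over pre-split subsets. */
final class CrossValidationSketch {

    /** One fold: the held-out subset and the union of all other subsets (records need Java 16+). */
    record Fold<T>(int testIndex, List<T> train, List<T> test) {}

    /** Builds one fold per subset, holding that subset out for testing. */
    static <T> List<Fold<T>> folds(List<List<T>> subsets) {
        List<Fold<T>> result = new ArrayList<>();
        for (int k = 0; k < subsets.size(); k++) {
            List<T> train = new ArrayList<>();
            for (int j = 0; j < subsets.size(); j++) {
                if (j != k) {
                    train.addAll(subsets.get(j));
                }
            }
            result.add(new Fold<>(k, train, subsets.get(k)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Ten hypothetical subsets with two sequence identifiers each.
        List<List<String>> subsets = new ArrayList<>();
        for (int k = 0; k < 10; k++) {
            subsets.add(List.of("seq" + (2 * k), "seq" + (2 * k + 1)));
        }
        for (Fold<String> f : folds(subsets)) {
            System.out.println("fold " + f.testIndex() + ": train=" + f.train().size() + ", test=" + f.test());
        }
    }
}
```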
Source code
Building the source code requires Jstacs 2.1.