PMMdeNovo: Difference between revisions

Revision as of 13:28, 21 February 2015

by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse

Runnable JARs

The application consists of three independent tools. Each tool takes mandatory arguments (which have no default values) and optional arguments. The default value of an optional argument can be selected by passing "def" in its place. Alternatively, a shorter list of arguments can be provided, in which case all missing trailing arguments take their default values.
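Because all arguments are positional, the two rules for default values can be sketched as a small resolver. This is a minimal illustration, not the tools' actual parsing code: the class name `ArgResolver`, the use of `null` to mark a mandatory argument, and the error messages are hypothetical.

```java
/**
 * Hypothetical sketch of positional argument resolution with "def" placeholders.
 * Not taken from the PMMdeNovo sources.
 */
final class ArgResolver {

    /** Placeholder that selects an argument's default value. */
    static final String DEFAULT_TOKEN = "def";

    /**
     * Resolves positional arguments against their defaults.
     *
     * @param args     the arguments as given on the command line (may be shorter than defaults)
     * @param defaults one default per argument; null marks a mandatory argument (assumption)
     * @return one resolved value per argument
     */
    static String[] resolve(String[] args, String[] defaults) {
        if (args.length > defaults.length) {
            throw new IllegalArgumentException(
                "expected at most " + defaults.length + " arguments, got " + args.length);
        }
        String[] resolved = new String[defaults.length];
        for (int i = 0; i < defaults.length; i++) {
            // A missing trailing argument and an explicit "def" are treated alike.
            boolean useDefault = i >= args.length || DEFAULT_TOKEN.equals(args[i]);
            if (!useDefault) {
                resolved[i] = args[i];
            } else if (defaults[i] != null) {
                resolved[i] = defaults[i];
            } else {
                // Mandatory arguments have no default, so "def" or omission is an error.
                throw new IllegalArgumentException("argument " + (i + 1) + " is mandatory");
            }
        }
        return resolved;
    }
}
```

For ModelTrainer, the defaults from its parameter table would be `{null, "20", "2", "2", "50", "10", "10", "model"}`, so the hypothetical arguments `seqs.fa def 3` resolve to `seqs.fa 20 3 2 50 10 10 model`.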

ModelTrainer

The tool ModelTrainer performs de novo motif discovery on a set of unaligned sequences that putatively contain the motif. It infers an inhomogeneous parsimonious Markov model (PMM) of arbitrary order, where order 0 corresponds to a position weight matrix (PWM). Run by calling

java -jar InhPMM.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output

where the arguments have the following semantics:

name	type	default	comment

inputFile	String	--	The location of a text file containing the input sequences. If the first character of the file is '>', the content is interpreted as a fasta file; otherwise, it is interpreted as plain text, i.e., one sequence per line.
motifWidth	Integer	20	The width of the motif to be inferred.
motifOrder	Integer	2	The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.
flankingOrder	Integer	2	The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif.
initSteps	Integer	50	The number of initial iteration steps that the algorithm always performs in each restart.
addSteps	Integer	10	The number of additional iteration steps, i.e., the number of iterations that must be performed after the last optimal model structure has been obtained before termination is allowed.
restarts	Integer	10	The number of restarts of the algorithm.
output	String	model	The path and file prefix for the output files. The tool produces two files: (i) output.xml, containing the learned model, and (ii) output.dot, containing the Graphviz representation of the learned parsimonious context tree (PCT) structures.
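A minimal sketch of the input-format rule described for inputFile, assuming the file fits in memory: a leading '>' selects fasta parsing, anything else selects one-sequence-per-line plain text. The class name `SequenceReader` is hypothetical, and trimming whitespace, concatenating multi-line fasta records and skipping blank plain-text lines are assumptions not stated in the table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the fasta / plain-text input rule; not the tool's own reader. */
final class SequenceReader {

    /** Parses content as fasta if it starts with '>', otherwise as one sequence per line. */
    static List<String> parse(String content) {
        List<String> sequences = new ArrayList<>();
        String[] lines = content.split("\\R"); // split on any line terminator
        if (content.startsWith(">")) {
            StringBuilder current = null;
            for (String line : lines) {
                if (line.startsWith(">")) {
                    // A header line starts a new record; store the previous one.
                    if (current != null) {
                        sequences.add(current.toString());
                    }
                    current = new StringBuilder();
                } else if (current != null) {
                    // Assumption: records may span several lines, which are concatenated.
                    current.append(line.trim());
                }
            }
            if (current != null) {
                sequences.add(current.toString());
            }
        } else {
            for (String line : lines) {
                // Assumption: blank lines are skipped rather than read as empty sequences.
                if (!line.isBlank()) {
                    sequences.add(line.trim());
                }
            }
        }
        return sequences;
    }

    /** Reads a whole file (Java 11+) and applies the rule above. */
    static List<String> read(Path file) throws IOException {
        return parse(Files.readString(file));
    }
}
```

For example, `parse(">s1\nACGT\nTTGA\n>s2\nGGCC")` yields the two sequences `ACGTTTGA` and `GGCC`.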

BindingSitePrediction

The tool BindingSitePrediction predicts instances of binding sites in a positive data set based on a previously learned model. Run by calling

java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output

where the arguments have the following semantics:

name	type	default	comment

modelFile	String	--	The location of the .xml representation (output of ModelTrainer) of the learned model.
dataPos	String	--	The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.
dataNeg	String	--	The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.
alpha	Double	1E-4	The significance level on the negative data, used for computing the prediction threshold.
output	String	bindingSites.txt	Location of output file for writing the predicted binding sites.
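BindingSitePrediction derives its prediction threshold from the negative data at significance level alpha. One common way to do this, sketched here as an assumption rather than the tool's documented procedure, is to score the negative data with the learned model and choose the threshold so that at most a fraction alpha of the negative scores exceeds it; positions in the positive data scoring above that threshold would then be reported. The class and method names are hypothetical, and how the model assigns scores is outside this sketch.

```java
import java.util.Arrays;

/** Hypothetical sketch: an empirical score threshold at significance level alpha. */
final class Threshold {

    /**
     * Returns the smallest observed negative score t such that the fraction of
     * negative scores strictly greater than t is at most alpha.
     */
    static double fromNegatives(double[] negativeScores, double alpha) {
        if (negativeScores.length == 0) {
            throw new IllegalArgumentException("no negative scores");
        }
        if (alpha < 0 || alpha >= 1) {
            throw new IllegalArgumentException("alpha must be in [0, 1)");
        }
        double[] sorted = negativeScores.clone();
        Arrays.sort(sorted); // ascending order
        int n = sorted.length;
        // Largest number of negative scores that may lie strictly above the threshold.
        int allowed = (int) Math.floor(alpha * n);
        return sorted[n - 1 - allowed];
    }

    /** Counts scores strictly above the threshold, e.g. predicted sites in positive data. */
    static int countAbove(double[] scores, double threshold) {
        int count = 0;
        for (double s : scores) {
            if (s > threshold) {
                count++;
            }
        }
        return count;
    }
}
```

With eight negative scores 1 to 8 and alpha = 0.25, the threshold is 6, so two of the eight negative scores (25%) lie above it.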

Classification

The tool Classification first performs motif discovery and then a fragment-based classification, using positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain it. Run by calling

java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts

where the arguments have the following semantics:

name	type	default	comment

filePosTrain	String	--	The location of a text file containing the positive training sequences (fasta or plain text).
fileNegTrain	String	--	The location of a text file containing the negative training sequences (fasta or plain text).
filePosTest	String	--	The location of a text file containing the positive test sequences (fasta or plain text).
fileNegTest	String	--	The location of a text file containing the negative test sequences (fasta or plain text).
motifWidth	Integer	20	The width of the motif to be inferred.
motifOrder	Integer	2	The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.
flankingOrder	Integer	2	The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif.
initSteps	Integer	50	The number of initial iteration steps that the algorithm always performs in each restart.
addSteps	Integer	10	The number of additional iteration steps, i.e., the number of iterations that must be performed after the last optimal model structure has been obtained before termination is allowed.
restarts	Integer	10	The number of restarts of the algorithm.

The tool returns the classification results to the standard output.
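ModelTrainer and Classification share the parameters initSteps, addSteps and restarts. One reading of the stopping rule described for them is sketched below: each restart runs at least initSteps iterations and stops once addSteps further iterations have passed without a new optimal model structure; the whole procedure is repeated restarts times. The callback `newOptimalStructure` is a hypothetical stand-in for one iteration of the learning algorithm, and the sketch assumes the structure eventually stabilizes (otherwise the loop does not terminate).

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of the initSteps/addSteps stopping rule for a single restart. */
final class StoppingRule {

    /**
     * Runs iterations 1, 2, ... until termination is allowed and returns the number performed.
     *
     * @param newOptimalStructure hypothetical callback: did iteration i yield a new optimal structure?
     */
    static int iterationsForRestart(int initSteps, int addSteps, IntPredicate newOptimalStructure) {
        int lastNewStructure = 0; // iteration at which the last optimal structure was obtained
        int i = 0;
        while (true) {
            i++;
            if (newOptimalStructure.test(i)) {
                lastNewStructure = i;
            }
            // Termination requires both the initial phase and addSteps stable iterations.
            boolean initialDone = i >= initSteps;
            boolean stableLongEnough = i - lastNewStructure >= addSteps;
            if (initialDone && stableLongEnough) {
                return i;
            }
        }
    }
}
```

With the defaults (initSteps = 50, addSteps = 10), a restart whose structure last changes at iteration 45 stops at iteration 55, and one whose structure keeps changing until iteration 60 stops at iteration 70.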

Data

The example data sets contain ChIP-seq sequences for 50 different human transcription factors from the ENCODE project, together with corresponding negative data. Each data set is split into 10 subsets to enable a reproducible 10-fold cross-validation.
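With each data set split into 10 subsets, 10-fold cross-validation uses every subset once as test data and the remaining nine as training data, for instance as the training and test files passed to Classification. The sketch below only enumerates the folds by subset index (0 to 9); the class name is hypothetical, and how the subsets are stored or merged into files is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: enumerating the folds of k-fold cross-validation over k subsets. */
final class CrossValidation {

    /**
     * Returns, for each fold f (which uses subset f as test data), the indices of the
     * k - 1 subsets used for training in that fold.
     */
    static List<List<Integer>> trainingSubsets(int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int test = 0; test < k; test++) {
            List<Integer> train = new ArrayList<>();
            for (int s = 0; s < k; s++) {
                if (s != test) {
                    train.add(s); // every subset except the held-out one
                }
            }
            folds.add(train);
        }
        return folds;
    }
}
```

Fold 3, for example, trains on subsets 0 to 2 and 4 to 9 and tests on subset 3.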

Source code

Building the source code requires Jstacs 2.1.

@@ Line 3: / Line 3: @@
 == Runnable JARs ==
-The application consists of three independent tools.
+The application consists of three independent tools. All tools have mandatory (no default values) and optional arguments.
+Default values can be used by assigning "def". Alternatively, a shorter list of arguments can be provided, in which case all missing arguments are considered to assume default values.
 === ModelTrainer ===
@@ Line 119: / Line 120: @@
 === Classification ===
-The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] performs first a motif discovery with subsequent fragment-based classification using positive data that is assumed to contain an instance of the motif, and negative data that is assumed not to contain the motif. The tool returns the classification results to the standard output.
+The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] performs first a motif discovery with subsequent fragment-based classification using positive data that is assumed to contain an instance of the motif, and negative data that is assumed not to contain the motif.
+Run by calling
+<code>java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts</code>
+where the arguments have the following semantics:
+<table border=0 cellpadding=10 align="center">
+<tr>
+	<td>name</td>
+	<td>type</td>
+        <td>default</td>
+	<td>comment</td>
+</tr>
+<tr><td colspan=4><hr></td></tr>
+<tr>
+	<td><font color="green">filePosTrain</font></td>
+	<td>String</td>
+	<td>--</td>
+	<td>The location of a text file containing the positive training sequences (fasta or plain text).</td>
+</tr>
+<tr>
+	<td><font color="green">fileNegTrain</font></td>
+	<td>String</td>
+	<td>--</td>
+	<td>The location of a text file containing the negative training sequences (fasta or plain text).</td>
+</tr>
+<tr>
+	<td><font color="green">filePosTest</font></td>
+	<td>String</td>
+	<td>--</td>
+	<td>The location of a text file containing the positive test sequences (fasta or plain text).</td>
+</tr>
+<tr>
+	<td><font color="green">fileNegTest</font></td>
+	<td>String</td>
+	<td>--</td>
+	<td>The location of a text file containing the negative test sequences (fasta or plain text).</td>
+</tr>
+<tr>
+	<td><font color="green">motifWidth</font></td>
+	<td>Integer</td>
+        <td>20</td>
+	<td>The width of the motif to be inferred.</td>
+</tr>
+<tr>
+	<td><font color="green">motifOrder</font></td>
+	<td>Integer</td>
+        <td>2</td>
+	<td>The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.</td>
+</tr>
+<tr>
+	<td><font color="green">flankingOrder</font></td>
+	<td>Integer</td>
+        <td>2</td>
+	<td>The order of the homogenous Markov model, which is used for modeling the flanking sequences that do not belong to the motif.</td>
+</tr>
+<tr>
+	<td><font color="green">initSteps</font></td>
+	<td>Integer</td>
+        <td>50</td>
+	<td>The number of initial iterations steps that the algorithm is always run for each restart.</td>
+</tr>
+<tr>
+	<td><font color="green">addSteps</font></td>
+	<td>Integer</td>
+        <td>10</td>
+	<td>The number of additional iterations steps, i.e., the number of iterations that have to be performed after having obtained the last optimal model structure before termination is allowed.</td>
+</tr>
+<tr>
+	<td><font color="green">restarts</font></td>
+	<td>Integer</td>
+        <td>10</td>
+	<td>The number of restarts of the algorithm.</td>
+</tr>
+</table>
+The tool returns the classification results to the standard output.
 == Data ==
