Revision as of 14:35, 2 February 2009

This site contains projects that use Jstacs.

MotifAdjuster

by Jens Keilwagen, Jan Baumbach, Thomas Kohl and Ivo Grosse.

Description

Valuable binding site annotation data are stored in databases. However, several types of errors can, and do, occur when annotation data from the scientific literature are manually incorporated into these databases. Here, we introduce MotifAdjuster, a software tool that helps to detect these errors, and we demonstrate its efficacy on public data sets.

Paper

The paper "MotifAdjuster: A tool for computational reassessment of transcription factor binding site annotations" has been submitted to Genome Biology.

Download

MotifAdjuster can be downloaded here.

Start instructions

Once you have unzipped the archive, you can start MotifAdjuster by invoking

java -cp ./:./jstacs-1.1.jar:./numericalMethods.jar MotifAdjuster <file> <ignoreChar> <length> <fgOrder> <fgEss> <bothStrands> <output> <sigma> <p(no motif)>

In Windows, you have to use ";" instead of ":" in the class path.
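The separator rule above can be folded into a small launcher script. The following is a minimal sketch, assuming a POSIX shell; the function names `path_separator` and `join_classpath` are hypothetical and not part of MotifAdjuster, and the assumption that Windows-hosted shells report an OS name starting with CYGWIN, MINGW, or MSYS should be checked on the target machine. The script only prints the start of the command instead of running it.

```shell
#!/bin/sh
# Print the class path separator for an OS name as reported by `uname -s`.
# Assumption: Windows-hosted shells report names starting with CYGWIN, MINGW, or MSYS.
path_separator() {
  case "$1" in
    CYGWIN*|MINGW*|MSYS*) printf ';' ;;
    *) printf ':' ;;
  esac
}

# Join all remaining arguments using the separator passed as the first argument.
join_classpath() {
  sep=$1
  shift
  joined=''
  for entry in "$@"; do
    if [ -z "$joined" ]; then
      joined=$entry
    else
      joined="$joined$sep$entry"
    fi
  done
  printf '%s\n' "$joined"
}

sep=$(path_separator "$(uname -s)")
class_path=$(join_classpath "$sep" ./ ./jstacs-1.1.jar ./numericalMethods.jar)

# Show the beginning of the command; the MotifAdjuster arguments still have to follow.
printf '%s\n' "java -cp \"$class_path\" MotifAdjuster"
```

Quoting the class path matters on Unix-like shells when ";" is the separator, because an unquoted ";" would end the command.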

The arguments have the following meaning:

name	comment	type

file	the location of the data set	String
ignoreChar	the character that marks comment lines (e.g., '>' in a FASTA file)	char
length	the motif length	int
fgOrder	the order of the inhomogeneous Markov model that is used for the motif; 0 yields a PWM	byte
fgEss	the equivalent sample size that is used for the mixture model	double >= 0
bothStrands	whether to use both strands	boolean
output	whether to show the output of the EM algorithm	boolean
sigma	the sigma of the truncated discrete Gaussian distribution	double > 0
p(no motif)	the probability for finding no motif	0 <= double < 1
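As an illustration of how the positional arguments line up with the table, here is a minimal sketch of a complete call. All concrete values (the file name `sites.fa`, the length 12, and so on) are made up for this example, and the spelling `true`/`false` for the boolean arguments is an assumption that the source does not confirm. The helper `check_numeric_args` is hypothetical; it only tests the three numeric constraints from the table and does not verify that its inputs are well-formed numbers. The script prints the command rather than executing it.

```shell
#!/bin/sh
# Check the numeric constraints listed in the table:
#   ess >= 0, sigma > 0, 0 <= p(no motif) < 1
# Returns 0 if all three hold, 1 otherwise; awk does the floating-point comparison.
check_numeric_args() {
  awk -v ess="$1" -v sigma="$2" -v p="$3" 'BEGIN {
    ok = (ess + 0 >= 0) && (sigma + 0 > 0) && (p + 0 >= 0) && (p + 0 < 1)
    exit(ok ? 0 : 1)
  }'
}

# Hypothetical values, one per positional argument, in the order of the table.
file=sites.fa
ignore_char='>'
length=12
fg_order=0
ess=4
both_strands=true
output=false
sigma=1
p_no_motif=0.2

if check_numeric_args "$ess" "$sigma" "$p_no_motif"; then
  # '>' stays quoted in the printed command so that a shell passes it on
  # as an argument instead of treating it as an output redirection.
  printf '%s\n' "java -cp ./:./jstacs-1.1.jar:./numericalMethods.jar MotifAdjuster $file '$ignore_char' $length $fg_order $ess $both_strands $output $sigma $p_no_motif"
else
  echo "ess, sigma, or p(no motif) is out of range" >&2
fi
```

The quoting of the comment character is the detail most easily missed: typed without quotes, `>` would redirect the program output into a file named after the next argument.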

DiPoMM

by Jens Keilwagen, Jan Grau, Stefan Posch, Marc Strickert and Ivo Grosse.

Description

Transcription factors are one main component of gene regulation, as they activate or repress gene expression by binding to their binding sites. The de-novo discovery of transcription factor binding sites in the promoters of target genes is a challenging problem in bioinformatics, which has not yet been solved satisfactorily. We present DiPoMM, a discriminative de-novo motif discovery tool that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process.

Paper

The paper "DiPoMM: Discriminative de-novo motif discovery utilizing positional preference" has been submitted to ISMB 2009 (http://www.iscb.org/ismbeccb2009/).

Download

DiPoMM can be downloaded from http://www.jstacs.de/downloads/DiPoMM.zip.

Start instructions

Once you have unzipped the archive, you can start DiPoMM, for example, by invoking

java -cp .:jstacs-1.2.jar:lib/numericalMethods.jar:lib/bytecode.jar:lib/biojava-live.jar projects.DiPoMM home=path/to/data/directory/ fg=fgfile.txt bg=bgfile.txt init=best-random=100 p-val=1E-4

to search for motifs that are over-represented in path/to/data/directory/fgfile.txt but not in path/to/data/directory/bgfile.txt, initialize DiPoMM with the best from 100 randomly drawn starting values, and search for motif occurrences with a p-value less than 1E-4.

Under Windows, you must use ";" instead of ":" in the class path.

The arguments have the following meaning:

name	comment	type

home	the path to the data directory, default = ./	String
ignore	the char that is used to mask comment lines in data files, e.g., '>' in a FASTA-file, default = >	Character
fg	the file name of the foreground data file (the file containing sequences which are expected to contain binding sites of a common motif)	String
bg	the file name of the background data file	String
length	the motif length that is used at the beginning, valid range = [1, 50], default = 15	Integer
flankOrder	the Markov order of the model for the flanking sequence and the background sequence, valid range = [0, 5], default = 0	Integer
motifOrder	the Markov order of the motif model, valid range = [0, 3], default = 0	Integer
bothStrands	a switch whether to use both strands or not, default = true	Boolean
init	the method that is used for initialization, one of 'best-random=<number>', 'enum=<length>', and 'specific=<sequence or file of sequence>'	String=[Integer | String]
xml	the file name of the xml file the classifier is written to, default = ./classifier.xml	String
adjust	a switch whether to adjust the motif length, i.e., either to shrink or expand, default = true	Boolean
p-val	a p-value for predicting binding sites, valid range = [0.0, 1.0], OPTIONAL	Double
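The ranges in the table can be checked before DiPoMM is started. The sketch below assumes that further key=value arguments may simply be added to the example call shown earlier, which the source does not state explicitly; the helper `in_range` and the chosen values for `length`, `flankOrder`, and `motifOrder` are hypothetical. The script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Succeed if $1 is a non-negative integer within the closed interval [$2, $3].
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or containing a non-digit: not an integer
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Hypothetical settings; the valid ranges are the ones given in the table.
length=10
flank_order=1
motif_order=0

if in_range "$length" 1 50 && in_range "$flank_order" 0 5 && in_range "$motif_order" 0 3; then
  # 'ignore=>' is quoted so that a shell does not read '>' as a redirection.
  printf '%s\n' "java -cp .:jstacs-1.2.jar:lib/numericalMethods.jar:lib/bytecode.jar:lib/biojava-live.jar projects.DiPoMM home=path/to/data/directory/ fg=fgfile.txt bg=bgfile.txt 'ignore=>' length=$length flankOrder=$flank_order motifOrder=$motif_order init=best-random=100 p-val=1E-4"
else
  echo "length, flankOrder, or motifOrder is out of range" >&2
fi
```

Since `ignore` defaults to `>` according to the table, the quoted argument is only needed when it is given explicitly.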

