<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.jstacs.de/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Eggeling</id>
	<title>Jstacs - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.jstacs.de/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Eggeling"/>
	<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php/Special:Contributions/Eggeling"/>
	<updated>2026-04-04T12:37:41Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.38.2</generator>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=969</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=969"/>
		<updated>2018-11-16T14:39:17Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: final reference&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS) that can be used individually or within a joint pipeline.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Disentangling transcription factor binding site complexity]. &#039;&#039;Nucleic Acids Research&#039;&#039;, 2018; 46(20): e121.  doi: 10.1093/nar/gky683&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later). &lt;br /&gt;
It is recommended to use the GUI for testing purposes or analysis of a single data set and to resort to the CLI for more elaborate applications (multiple data sets, use on a cluster, etc.).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; (IMD) and &#039;&#039;Motif complexity analysis&#039;&#039; (MCA). In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FastA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence. &lt;br /&gt;
The input may contain upper- and lower-case letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, they are replaced by a random sample from the distribution of {A,C,G,T} in the data set. &lt;br /&gt;
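&lt;br /&gt;
The input conventions above can be illustrated with a minimal Python sketch. It is not the Disentangler implementation: the function names are hypothetical, the conversion to upper case is an assumption, and drawing replacements from the empirical nucleotide frequencies of the whole data set is one plausible reading of the replacement rule.&lt;br /&gt;
&lt;br /&gt;
```python
import random
from collections import Counter

DNA = "ACGT"


def read_sites(text):
    """Return the sequences from FastA text or one-sequence-per-line text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not text.startswith(">"):
        # Plain text: every non-empty line is a single sequence.
        return lines
    sites, current = [], []
    for ln in lines:
        if ln.startswith(">"):
            # A header line closes the previous record, if any.
            if current:
                sites.append("".join(current))
            current = []
        else:
            current.append(ln)
    if current:
        sites.append("".join(current))
    return sites


def replace_ambiguous(sites, rng):
    """Upper-case all sites (assumption) and replace non-ACGT symbols.

    Replacements are drawn from the empirical distribution of A, C, G, T
    over the whole data set (assumed interpretation of the rule above).
    """
    upper = [s.upper() for s in sites]
    counts = Counter(c for s in upper for c in s if c in DNA)
    weights = [counts[c] for c in DNA]
    return [
        "".join(c if c in DNA else rng.choices(DNA, weights=weights)[0] for c in s)
        for s in upper
    ]
```
&lt;br /&gt;
Passing an explicit generator, for example random.Random(0), makes the replacement reproducible between runs.&lt;br /&gt;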
&lt;br /&gt;
=== Intermixture detection (IMD) ===&lt;br /&gt;
IMD can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts.&lt;br /&gt;
&lt;br /&gt;
The default value of the intermixture threshold (0.19) is a robust choice; slight variations within the interval (0.15,0.3) have little impact in the majority of cases (see paper).&lt;br /&gt;
&lt;br /&gt;
The tool returns a text file with the intermixture number and all clusters produced by IMD as text files of the binding sites and sequence logos of the mononucleotide statistics.  &lt;br /&gt;
The values for the intermixture measure in each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed on a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; keeping the default for practical applications is strongly recommended, as otherwise the intermixture threshold might need to be adjusted.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect the quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis (MCA) ===&lt;br /&gt;
MCA allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixtures of PWM models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
For each run, the tool itself requires a concrete model to be chosen as input.&lt;br /&gt;
It returns a text file containing the intra-motif complexity (IMC) measure of the data set under the given model.&lt;br /&gt;
For comparing different models according to IMC, the tool thus needs to be run multiple times.&lt;br /&gt;
&lt;br /&gt;
In addition, each run also outputs a visualization of the learned model and a storable (.xml) file that can be used as input to &amp;quot;Sequence scan&amp;quot; (see below).&lt;br /&gt;
&lt;br /&gt;
Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for data sets with sequence length greater than 20.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect the quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
The &amp;quot;FPR&amp;quot; pertains here to the number of sequences that have at least one hit.&lt;br /&gt;
&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [https://www.cs.helsinki.fi/u/eggeling/Disentangler/Disentangler-sources.zip source code] requires Jstacs 2.3 and JstacsFX 1.0. For compilation instructions, see the included README.txt file.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Main_Page&amp;diff=968</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Main_Page&amp;diff=968"/>
		<updated>2018-11-13T16:23:06Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: /* Applications */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== A &amp;lt;font color=FireBrick&amp;gt;J&amp;lt;/font&amp;gt;ava framework for &amp;lt;font color=FireBrick&amp;gt;st&amp;lt;/font&amp;gt;atistical &amp;lt;font color=FireBrick&amp;gt;a&amp;lt;/font&amp;gt;nalysis and &amp;lt;font color=FireBrick&amp;gt;c&amp;lt;/font&amp;gt;lassification of biological &amp;lt;font color=FireBrick&amp;gt;s&amp;lt;/font&amp;gt;equences ==&lt;br /&gt;
&lt;br /&gt;
Sequence analysis is one of the major subjects of&lt;br /&gt;
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].&lt;br /&gt;
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as&lt;br /&gt;
alignment algorithms.&lt;br /&gt;
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an&lt;br /&gt;
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches&lt;br /&gt;
for parameter learning. Using Jstacs, classifiers can be assessed and&lt;br /&gt;
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented&lt;br /&gt;
design Jstacs is easy to use and readily extensible.&lt;br /&gt;
&lt;br /&gt;
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].&lt;br /&gt;
&lt;br /&gt;
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.&lt;br /&gt;
&lt;br /&gt;
== Licensing Information ==&lt;br /&gt;
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].&lt;br /&gt;
&lt;br /&gt;
== Current release ==&lt;br /&gt;
You can download Jstacs version 2.3 [[Downloads | here]].&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;You can find an overview of the new features in the [[Version history]].&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
We also provide an [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.&lt;br /&gt;
&lt;br /&gt;
The current Jstacs code, including changes made since the last release, is available from [https://github.com/Jstacs github].&lt;br /&gt;
&lt;br /&gt;
== Getting started &amp;amp; Cookbook==&lt;br /&gt;
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].&lt;br /&gt;
&lt;br /&gt;
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].&lt;br /&gt;
This cookbook comprises a general description of the structure of Jstacs including data handling, statistical models, classifiers, and assessments.&lt;br /&gt;
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point of your own applications.&lt;br /&gt;
&lt;br /&gt;
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].&lt;br /&gt;
&lt;br /&gt;
== Publication ==&lt;br /&gt;
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.&lt;br /&gt;
If you use Jstacs in your research, please cite&lt;br /&gt;
&lt;br /&gt;
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. &#039;&#039;Jstacs: A java framework for statistical analysis and classification of biological sequences&#039;&#039;. Journal of Machine Learning Research, &#039;&#039;&#039;13&#039;&#039;&#039;(Jun):1967–1971, 2012.&lt;br /&gt;
&lt;br /&gt;
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]&lt;br /&gt;
&lt;br /&gt;
== JstacsFX ==&lt;br /&gt;
JstacsFX is a library for building applications with graphical user interface based on Jstacs classes and using JavaFX. JstacsFX builds upon the [http://www.jstacs.de/api/de/jstacs/tools/JstacsTool.html JstacsTool] interface that has also been used to create [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/cli/CLI.html command line] and [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/galaxy/Galaxy.html Galaxy] versions of tools with minimal effort. In addition it makes use of the [http://www.jstacs.de/api-2.3/de/jstacs/parameters/Parameter.html Parameter], [http://www.jstacs.de/api-2.3/de/jstacs/results/Result.html Result], and [http://www.jstacs.de/api-2.3/de/jstacs/results/savers/ResultSaver.html ResultSaver] classes of Jstacs.&lt;br /&gt;
&lt;br /&gt;
The current release of JstacsFX is available from [[Downloads]] and an [http://www.jstacs.de/api-fx/index.html API documentation] is available.&lt;br /&gt;
&lt;br /&gt;
Example applications using JstacsFX for their graphical user interface are [[InMoDe]] and [[AnnoTALE]].&lt;br /&gt;
&lt;br /&gt;
== Applications ==&lt;br /&gt;
Applications currently using Jstacs:&lt;br /&gt;
* [[MotifAdjuster]]&lt;br /&gt;
* [[Dispom]]&lt;br /&gt;
* [[TALgetter]]&lt;br /&gt;
* [[TALENoffer]]&lt;br /&gt;
* [[Dimont]]&lt;br /&gt;
* [[GeMoMa]]&lt;br /&gt;
* [[AnnoTALE]]&lt;br /&gt;
* [[InMoDe]]&lt;br /&gt;
* [[Disentangler]]&lt;br /&gt;
&lt;br /&gt;
== Bug reports &amp;amp; Feature requests ==&lt;br /&gt;
You can submit bug reports and feature requests by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de].&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Latest Papers ==&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PCTLearn | Algorithms for learning parsimonious context trees]]&#039;&#039;&#039;&#039;&#039; has been published in [https://link.springer.com/article/10.1007/s10994-018-5770-9 Machine Learning].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Disentangler | Disentangling transcription factor binding site complexity]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[GeMoMa | Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi]]&#039;&#039;&#039;&#039;&#039; has been published in [https://link.springer.com/article/10.1186%2Fs12859-018-2203-5 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[InMoDe | InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AnnoTALE | AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/articles/srep21077 Scientific Reports].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data ]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Slim | Varying levels of complexity in transcription factor binding motifs]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
Further papers and projects can be found under [[Projects]].&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Projects&amp;diff=967</id>
		<title>Projects</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Projects&amp;diff=967"/>
		<updated>2018-11-13T16:04:17Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This site contains projects that use Jstacs.&lt;br /&gt;
* [[MotifAdjuster]]: a tool for computational reassessment of transcription factor binding site annotations&lt;br /&gt;
* [[Prior]]: apples and oranges: avoiding different priors in Bayesian DNA sequence analysis&lt;br /&gt;
* [[GenDisMix]]: unifying generative and discriminative learning principles&lt;br /&gt;
* [[Dispom]]: de-novo discovery of differentially abundant transcription factor binding sites including their positional preference&lt;br /&gt;
* [[MiMB]]: probabilistic approaches to transcription factor binding site prediction&lt;br /&gt;
* [[SHMM]]: utilizing gene-pair orientations for improved analysis of ChIP-chip promoter array data&lt;br /&gt;
* [[DSHMM]]: exploiting prior knowledge and gene distances in the analysis of tumor expression profiles&lt;br /&gt;
* [[PHHMM]]: improved analysis of Array-CGH data&lt;br /&gt;
* [[MeDIP-HMM]]: HMM-based analysis of DNA methylation profiles&lt;br /&gt;
* [[ARHMM]]: integrating local chromosomal dependencies into the analysis of tumor expression profiles&lt;br /&gt;
* [[FlowCap]]: molecular classification of acute myeloid leukaemia (AML) using flow cytometry data&lt;br /&gt;
* [[TALgetter]]: prediction of TAL effector target sites&lt;br /&gt;
* [[TALENoffer]]: genome-wide TALEN off-target prediction&lt;br /&gt;
* [[Dimont]]: general approach for discriminative de-novo motif discovery from high-throughput data&lt;br /&gt;
* [[AUC-PR]]: area under ROC and PR curves for weighted and unweighted data&lt;br /&gt;
* [[Slim]]: Sparse local inhomogeneous mixture (Slim) models and dependency logos&lt;br /&gt;
* [[PMMdeNovo]]: de novo motif discovery based on inhomogeneous parsimonious Markov models (PMMs) for exploiting intra-motif dependencies&lt;br /&gt;
* [[AnnoTALE]]: identifying and analysing TALEs in &#039;&#039;Xanthomonas&#039;&#039; genomes, for clustering TALEs, for assigning novel TALEs to existing classes, for proposing TALE names using a unified nomenclature, and for predicting TALE targets&lt;br /&gt;
* [[GeMoMa]]: Gene Model Mapper (GeMoMa) is a homology-based gene prediction program that uses the annotation of protein-coding genes in a reference genome to infer annotation of protein-coding genes in a target genome&lt;br /&gt;
* [[InMoDe]]: tools for learning and visualizing intra-motif dependencies of DNA binding sites&lt;br /&gt;
* [[Disentangler]]: two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS) that can be used individually or within a joint pipeline.&lt;br /&gt;
* [[PCTLearn]]: efficient learning of parsimonious context trees from sequence data.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=966</id>
		<title>PCTLearn</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=966"/>
		<updated>2018-11-13T16:00:23Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use PCTLearn, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, M. Koivisto. [https://link.springer.com/article/10.1007/s10994-018-5770-9 Algorithms for learning parsimonious context trees]. &#039;&#039;Machine Learning&#039;&#039;, 2018; doi: 10.1007/s10994-018-5770-9&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Description ==&lt;br /&gt;
&lt;br /&gt;
Parsimonious context trees (PCTs) provide a sparse parameterization of conditional probability distributions, but learning them from data is computationally hard due to the combinatorial explosion of the space of model structures as the number of predictor variables grows. Here, we propose new algorithmic ideas, which can significantly expedite the standard dynamic programming algorithm. Specifically, we introduce a memoization technique, which exploits regularities within the predictor variables by equating different contexts associated with the same data subset, and a bound-and-prune technique, which exploits regularities within the response variable by pruning parts of the search space based on score upper bounds.   &lt;br /&gt;
The software &#039;&#039;&#039;PCTLearn&#039;&#039;&#039; is a lightweight Java application that implements these ideas and can be used to learn a single PCT of user-specified maximal depth from data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download == &lt;br /&gt;
&lt;br /&gt;
The application is available as a single runnable .jar [https://www.cs.helsinki.fi/u/eggeling/PCTLearn/PCTLearn.jar PCTLearn].&lt;br /&gt;
&lt;br /&gt;
== Input ==&lt;br /&gt;
The application requires as input a plain text file, which can consist of basic Latin characters a-z and A-Z (case sensitive) and Arabic numerals 0-9. &lt;br /&gt;
The number of different characters in the input file determines the alphabet size for PCT optimization.&lt;br /&gt;
Each line in the input file is chopped into overlapping k-mers, where k equals the desired maximal PCT depth + 1. The PCT is learned on the resulting k-mers with the convention that the last symbol in each k-mer denotes the response variable.&lt;br /&gt;
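&lt;br /&gt;
The preprocessing described in this section can be sketched in a few lines of Python. This is an illustration only, not the code of PCTLearn; all function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def alphabet(lines):
    """Return the distinct characters in the input; their number is the alphabet size."""
    return sorted(set("".join(lines)))


def chop_into_kmers(line, max_depth):
    """Split one input line into overlapping k-mers with k = max_depth + 1."""
    k = max_depth + 1
    # Lines shorter than k yield no k-mer at all.
    return [line[i:i + k] for i in range(len(line) - k + 1)]


def split_context_response(kmer):
    """Separate the predictor symbols from the response (the last symbol)."""
    return kmer[:-1], kmer[-1]
```
&lt;br /&gt;
For the line ACGTA and a maximal depth of 2, chop_into_kmers returns ACG, CGT, and GTA; in the first k-mer, AC is the context and G is the response.&lt;br /&gt;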
&lt;br /&gt;
== Running PCTLearn ==&lt;br /&gt;
&lt;br /&gt;
The application has one mandatory and various optional arguments. &lt;br /&gt;
A shorter list of arguments can be provided, in which case all missing arguments are set to their default values.&lt;br /&gt;
Run with &lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input data. &amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;maximalDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The maximal depth of the learned PCT.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;scoringFunction&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;BIC&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used scoring function. Permitted values are &amp;quot;BIC&amp;quot; and &amp;quot;AIC&amp;quot;.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoization&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling memoization.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;pruning&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling pruning.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fineBound&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Use fine upper bound instead of coarse. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoLimit&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Memoization limit that stops storing subtrees with a given distance from the leaves. Is ignored if memoization is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;lookaheadDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used lookahead depth. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt; &lt;br /&gt;
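&lt;br /&gt;
Because the arguments are positional and trailing ones may be omitted, a call can be assembled from any prefix of the argument list. The following Python sketch uses a hypothetical helper, build_command, that only builds the argument vector; it assumes that the jar lies in the working directory and that values are written on the command line as they appear in the table.&lt;br /&gt;
&lt;br /&gt;
```python
# Positional argument order as listed in the table above.
ARGUMENT_ORDER = (
    "inputFile", "maximalDepth", "scoringFunction", "memoization",
    "pruning", "fineBound", "memoLimit", "lookaheadDepth",
)


def build_command(input_file, *optional, jar="PCTLearn.jar"):
    """Return the argument vector for a PCTLearn run (hypothetical helper).

    Only the input file is mandatory; optional values must be given in
    table order, and omitted trailing values fall back to the defaults.
    """
    if len(optional) + 1 > len(ARGUMENT_ORDER):
        raise ValueError("too many arguments for PCTLearn")
    return ["java", "-jar", jar, input_file] + [str(value) for value in optional]
```
&lt;br /&gt;
For example, build_command(&#039;data.txt&#039;, 3, &#039;AIC&#039;) describes a run with maximal depth 3 and AIC scoring, leaving the remaining arguments at their defaults; the resulting list can be passed to subprocess.run.&lt;br /&gt;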
&lt;br /&gt;
== Output ==&lt;br /&gt;
The tool writes some statistics about the optimization, such as the optimal score, the number of visited nodes, and the required running time, to stdout. &lt;br /&gt;
&lt;br /&gt;
In addition, it creates (i) a Graphviz file of the learned PCT structure and (ii) a file with conditional probability parameters (MLE) for each leaf.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Main_Page&amp;diff=965</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Main_Page&amp;diff=965"/>
		<updated>2018-11-13T15:48:04Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: /* Latest Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== A &amp;lt;font color=FireBrick&amp;gt;J&amp;lt;/font&amp;gt;ava framework for &amp;lt;font color=FireBrick&amp;gt;st&amp;lt;/font&amp;gt;atistical &amp;lt;font color=FireBrick&amp;gt;a&amp;lt;/font&amp;gt;nalysis and &amp;lt;font color=FireBrick&amp;gt;c&amp;lt;/font&amp;gt;lassification of biological &amp;lt;font color=FireBrick&amp;gt;s&amp;lt;/font&amp;gt;equences ==&lt;br /&gt;
&lt;br /&gt;
Sequence analysis is one of the major subjects of&lt;br /&gt;
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].&lt;br /&gt;
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as&lt;br /&gt;
alignment algorithms.&lt;br /&gt;
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an&lt;br /&gt;
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches&lt;br /&gt;
for parameter learning. Using Jstacs, classifiers can be assessed and&lt;br /&gt;
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented&lt;br /&gt;
design Jstacs is easy to use and readily extensible.&lt;br /&gt;
&lt;br /&gt;
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].&lt;br /&gt;
&lt;br /&gt;
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.&lt;br /&gt;
&lt;br /&gt;
== Licensing Information ==&lt;br /&gt;
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].&lt;br /&gt;
&lt;br /&gt;
== Current release ==&lt;br /&gt;
You can download Jstacs version 2.3 [[Downloads | here]].&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;You can find an overview of the new features in the [[Version history]].&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
We also provide an [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.&lt;br /&gt;
&lt;br /&gt;
The current Jstacs code, including changes made since the last release, is available from [https://github.com/Jstacs github].&lt;br /&gt;
&lt;br /&gt;
== Getting started &amp;amp; Cookbook==&lt;br /&gt;
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].&lt;br /&gt;
&lt;br /&gt;
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].&lt;br /&gt;
This cookbook comprises a general description of the structure of Jstacs including data handling, statistical models, classifiers, and assessments.&lt;br /&gt;
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point of your own applications.&lt;br /&gt;
&lt;br /&gt;
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].&lt;br /&gt;
&lt;br /&gt;
== Publication ==&lt;br /&gt;
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.&lt;br /&gt;
If you use Jstacs in your research, please cite&lt;br /&gt;
&lt;br /&gt;
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. &#039;&#039;Jstacs: A java framework for statistical analysis and classification of biological sequences&#039;&#039;. Journal of Machine Learning Research, &#039;&#039;&#039;13&#039;&#039;&#039;(Jun):1967–1971, 2012.&lt;br /&gt;
&lt;br /&gt;
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]&lt;br /&gt;
&lt;br /&gt;
== JstacsFX ==&lt;br /&gt;
JstacsFX is a library for building applications with a graphical user interface based on Jstacs classes and using JavaFX. JstacsFX builds upon the [http://www.jstacs.de/api/de/jstacs/tools/JstacsTool.html JstacsTool] interface that has also been used to create [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/cli/CLI.html command line] and [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/galaxy/Galaxy.html Galaxy] versions of tools with minimal effort. In addition, it makes use of the [http://www.jstacs.de/api-2.3/de/jstacs/parameters/Parameter.html Parameter], [http://www.jstacs.de/api-2.3/de/jstacs/results/Result.html Result], and [http://www.jstacs.de/api-2.3/de/jstacs/results/savers/ResultSaver.html ResultSaver] classes of Jstacs.&lt;br /&gt;
&lt;br /&gt;
The current release of JstacsFX is available from [[Downloads]]; an [http://www.jstacs.de/api-fx/index.html API documentation] is provided as well.&lt;br /&gt;
&lt;br /&gt;
Example applications using JstacsFX for their graphical user interface are [[InMoDe]] and [[AnnoTALE]].&lt;br /&gt;
&lt;br /&gt;
== Applications ==&lt;br /&gt;
Applications currently using Jstacs:&lt;br /&gt;
* [[MotifAdjuster]]&lt;br /&gt;
* [[Dispom]]&lt;br /&gt;
* [[TALgetter]]&lt;br /&gt;
* [[TALENoffer]]&lt;br /&gt;
* [[Dimont]]&lt;br /&gt;
* [[GeMoMa]]&lt;br /&gt;
* [[AnnoTALE]]&lt;br /&gt;
* [[InMoDe]]&lt;br /&gt;
&lt;br /&gt;
== Bug reports &amp;amp; Feature requests ==&lt;br /&gt;
You can submit bug reports and feature requests by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de].&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Latest Papers ==&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PCTLearn | Algorithms for learning parsimonious context trees]]&#039;&#039;&#039;&#039;&#039; has been published in [https://link.springer.com/article/10.1007/s10994-018-5770-9 Machine Learning].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Disentangler | Disentangling transcription factor binding site complexity]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[GeMoMa | Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi]]&#039;&#039;&#039;&#039;&#039; has been published in [https://link.springer.com/article/10.1186%2Fs12859-018-2203-5 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[InMoDe | InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AnnoTALE | AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/articles/srep21077 Scientific Reports].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data ]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Slim | Varying levels of complexity in transcription factor binding motifs]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
Further papers and projects can be found under [[Projects]].&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=955</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=955"/>
		<updated>2018-08-07T16:39:13Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: source code&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS) that can be used individually or within a joint pipeline.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Disentangling transcription factor binding site complexity]. &#039;&#039;Nucleic Acids Research&#039;&#039;, gky683, 2018; doi: 10.1093/nar/gky683&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
It is recommended to use the GUI for testing purposes or analysis of a single data set and to resort to the CLI for more elaborate applications (multiple data sets, use on a cluster, etc.).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; (IMD) and &#039;&#039;Motif complexity analysis&#039;&#039; (MCA). In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The tools accept upper- and lower-case letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, they are replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
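&lt;br /&gt;
The input conventions above can be sketched in Java as follows. This is a minimal, hypothetical illustration and not the actual Disentangler code: the class and method names are made up for this example, the conversion to upper case is an assumption, and the replacement symbol is drawn from the overall nucleotide frequencies of the data set.&lt;br /&gt;

```java
import java.util.Random;

// Hypothetical helper for illustration only; not part of Disentangler or Jstacs.
final class InputFormatSketch {

    // The content is treated as FASTA whenever it starts with '>'.
    static boolean isFasta(String content) {
        return content.startsWith(">");
    }

    // Counts A, C, G, T (case-insensitive) over all sequences; other symbols are ignored.
    static int[] countNucleotides(String[] sequences) {
        int[] counts = new int[4];
        for (String s : sequences) {
            for (char c : s.toUpperCase().toCharArray()) {
                int idx = "ACGT".indexOf(c);
                if (idx != -1) {
                    counts[idx]++;
                }
            }
        }
        return counts;
    }

    // Replaces every symbol outside {A,C,G,T} by a nucleotide drawn at random
    // from the empirical distribution described by counts.
    static String replaceAmbiguous(String sequence, int[] counts, Random rng) {
        int total = counts[0] + counts[1] + counts[2] + counts[3];
        StringBuilder out = new StringBuilder();
        for (char c : sequence.toUpperCase().toCharArray()) {
            if ("ACGT".indexOf(c) != -1 || total == 0) {
                out.append(c); // standard symbol, or nothing to sample from
                continue;
            }
            int r = rng.nextInt(total); // uniform in [0, total)
            int idx = 0;
            while (r >= counts[idx]) {  // walk the cumulative counts
                r -= counts[idx];
                idx++;
            }
            out.append("ACGT".charAt(idx));
        }
        return out.toString();
    }
}
```

Counting first and sampling afterwards keeps the replacement consistent with the nucleotide composition of the whole data set rather than of a single sequence.&lt;br /&gt;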
&lt;br /&gt;
=== Intermixture detection (IMD) ===&lt;br /&gt;
IMD can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts.&lt;br /&gt;
&lt;br /&gt;
The default value of the intermixture threshold (0.19) is a robust choice; slight variations within the interval (0.15, 0.3) have little impact in the majority of cases (see paper).&lt;br /&gt;
&lt;br /&gt;
The tool returns a text file with the intermixture number. It also returns every cluster produced by IMD, both as a text file of the binding sites and as a sequence logo of the mononucleotide statistics.&lt;br /&gt;
The values for the intermixture measure in each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed on a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; keeping the default for practical applications is strongly recommended, as the intermixture threshold might otherwise need to be adjusted.&lt;br /&gt;
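&lt;br /&gt;
For orientation, the textbook definition of the weighted Jensen-Shannon divergence between two distributions can be sketched as below. This is only a generic illustration with hypothetical names; how Disentangler chooses the weights and combines the divergence over motif positions is described in the paper and is not reproduced here. A weight of 0.5 gives the non-weighted variant.&lt;br /&gt;

```java
// Hypothetical helper for illustration only; not part of Disentangler or Jstacs.
final class JsdSketch {

    // Shannon entropy in bits; zero-probability entries contribute nothing.
    static double entropy(double[] p) {
        double h = 0.0;
        for (double x : p) {
            if (x > 0.0) {
                h -= x * Math.log(x) / Math.log(2.0);
            }
        }
        return h;
    }

    // Weighted Jensen-Shannon divergence of p and q, with weight w for p
    // and weight (1 - w) for q: H(w*p + (1-w)*q) - w*H(p) - (1-w)*H(q).
    static double jsd(double[] p, double[] q, double w) {
        double[] m = new double[p.length];
        for (int i = 0; i != p.length; i++) {
            m[i] = w * p[i] + (1.0 - w) * q[i];
        }
        return entropy(m) - w * entropy(p) - (1.0 - w) * entropy(q);
    }
}
```

Identical distributions yield a divergence of zero for any weight, and with base-2 logarithms the value never exceeds one bit for two distributions.&lt;br /&gt;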
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect the quality of the results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis (MCA) ===&lt;br /&gt;
MCA allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixtures of PWM models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
For each run, the tool requires a concrete model to be chosen as input.&lt;br /&gt;
It returns a text file containing the intra-motif complexity (IMC) measure of the data set under the given model.&lt;br /&gt;
For comparing different models according to IMC, the tool thus needs to be run multiple times.&lt;br /&gt;
&lt;br /&gt;
In addition, each run also outputs a visualization of the learned model and a storable (.xml) file that can be used as input to &amp;quot;Sequence scan&amp;quot; (see below).&lt;br /&gt;
&lt;br /&gt;
Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for data sets with sequence length greater than 20.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect the quality of the results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &amp;quot;Motif complexity analysis&amp;quot;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [https://www.cs.helsinki.fi/u/eggeling/Disentangler/Disentangler-sources.zip source code] requires Jstacs 2.3 and JstacsFX 1.0. For compilation instructions, see the included README.txt file.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Projects&amp;diff=954</id>
		<title>Projects</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Projects&amp;diff=954"/>
		<updated>2018-08-02T07:09:35Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This site contains projects that use Jstacs.&lt;br /&gt;
* [[MotifAdjuster]]: a tool for computational reassessment of transcription factor binding site annotations&lt;br /&gt;
* [[Prior]]: apples and oranges: avoiding different priors in Bayesian DNA sequence analysis&lt;br /&gt;
* [[GenDisMix]]: unifying generative and discriminative learning principles&lt;br /&gt;
* [[Dispom]]: de-novo discovery of differentially abundant transcription factor binding sites including their positional preference&lt;br /&gt;
* [[MiMB]]: probabilistic approaches to transcription factor binding site prediction&lt;br /&gt;
* [[SHMM]]: utilizing gene-pair orientations for improved analysis of ChIP-chip promoter array data&lt;br /&gt;
* [[DSHMM]]: exploiting prior knowledge and gene distances in the analysis of tumor expression profiles&lt;br /&gt;
* [[PHHMM]]: improved analysis of Array-CGH data&lt;br /&gt;
* [[MeDIP-HMM]]: HMM-based analysis of DNA methylation profiles&lt;br /&gt;
* [[ARHMM]]: integrating local chromosomal dependencies into the analysis of tumor expression profiles&lt;br /&gt;
* [[FlowCap]]: molecular classification of acute myeloid leukaemia (AML) using flow cytometry data&lt;br /&gt;
* [[TALgetter]]: prediction of TAL effector target sites&lt;br /&gt;
* [[TALENoffer]]: genome-wide TALEN off-target prediction&lt;br /&gt;
* [[Dimont]]: general approach for discriminative de-novo motif discovery from high-throughput data&lt;br /&gt;
* [[AUC-PR]]: area under ROC and PR curves for weighted and unweighted data&lt;br /&gt;
* [[Slim]]: Sparse local inhomogeneous mixture (Slim) models and dependency logos&lt;br /&gt;
* [[PMMdeNovo]]: de novo motif discovery based on inhomogeneous parsimonious Markov models (PMMs) for exploiting intra-motif dependencies&lt;br /&gt;
* [[AnnoTALE]]: tools for identifying and analysing TALEs in &#039;&#039;Xanthomonas&#039;&#039; genomes, clustering TALEs, assigning novel TALEs to existing classes, proposing TALE names using a unified nomenclature, and predicting TALE targets&lt;br /&gt;
* [[GeMoMa]]: Gene Model Mapper (GeMoMa) is a homology-based gene prediction program that uses the annotation of protein-coding genes in a reference genome to infer annotation of protein-coding genes in a target genome&lt;br /&gt;
* [[InMoDe]]: tools for learning and visualizing intra-motif dependencies of DNA binding sites&lt;br /&gt;
* [[Disentangler]]: two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS) that can be used individually or within a joint pipeline.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Main_Page&amp;diff=953</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Main_Page&amp;diff=953"/>
		<updated>2018-08-02T07:06:30Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: /* Latest Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== A &amp;lt;font color=FireBrick&amp;gt;J&amp;lt;/font&amp;gt;ava framework for &amp;lt;font color=FireBrick&amp;gt;st&amp;lt;/font&amp;gt;atistical &amp;lt;font color=FireBrick&amp;gt;a&amp;lt;/font&amp;gt;nalysis and &amp;lt;font color=FireBrick&amp;gt;c&amp;lt;/font&amp;gt;lassification of biological &amp;lt;font color=FireBrick&amp;gt;s&amp;lt;/font&amp;gt;equences ==&lt;br /&gt;
&lt;br /&gt;
Sequence analysis is one of the major subjects of&lt;br /&gt;
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].&lt;br /&gt;
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as&lt;br /&gt;
alignment algorithms.&lt;br /&gt;
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an&lt;br /&gt;
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches&lt;br /&gt;
for parameter learning. Using Jstacs, classifiers can be assessed and&lt;br /&gt;
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented&lt;br /&gt;
design Jstacs is easy to use and readily extensible.&lt;br /&gt;
&lt;br /&gt;
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].&lt;br /&gt;
&lt;br /&gt;
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.&lt;br /&gt;
&lt;br /&gt;
== Licensing Information ==&lt;br /&gt;
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].&lt;br /&gt;
&lt;br /&gt;
== Current release ==&lt;br /&gt;
You can download Jstacs version 2.3 [[Downloads | here]].&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;You can find an overview of the new features in the [[Version history]].&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
We also provide an [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.&lt;br /&gt;
&lt;br /&gt;
== Getting started &amp;amp; Cookbook ==&lt;br /&gt;
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].&lt;br /&gt;
&lt;br /&gt;
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].&lt;br /&gt;
This cookbook comprises a general description of the structure of Jstacs including data handling, statistical models, classifiers, and assessments.&lt;br /&gt;
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point for your own applications.&lt;br /&gt;
&lt;br /&gt;
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].&lt;br /&gt;
&lt;br /&gt;
== Publication ==&lt;br /&gt;
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.&lt;br /&gt;
If you use Jstacs in your research, please cite&lt;br /&gt;
&lt;br /&gt;
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. &#039;&#039;Jstacs: A java framework for statistical analysis and classification of biological sequences&#039;&#039;. Journal of Machine Learning Research, &#039;&#039;&#039;13&#039;&#039;&#039;(Jun):1967–1971, 2012.&lt;br /&gt;
&lt;br /&gt;
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]&lt;br /&gt;
&lt;br /&gt;
== JstacsFX ==&lt;br /&gt;
JstacsFX is a library for building applications with a graphical user interface based on Jstacs classes and using JavaFX. JstacsFX builds upon the [http://www.jstacs.de/api/de/jstacs/tools/JstacsTool.html JstacsTool] interface that has also been used to create [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/cli/CLI.html command line] and [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/galaxy/Galaxy.html Galaxy] versions of tools with minimal effort. In addition, it makes use of the [http://www.jstacs.de/api-2.3/de/jstacs/parameters/Parameter.html Parameter], [http://www.jstacs.de/api-2.3/de/jstacs/results/Result.html Result], and [http://www.jstacs.de/api-2.3/de/jstacs/results/savers/ResultSaver.html ResultSaver] classes of Jstacs.&lt;br /&gt;
&lt;br /&gt;
The current release of JstacsFX is available from [[Downloads]]; an [http://www.jstacs.de/api-fx/index.html API documentation] is provided as well.&lt;br /&gt;
&lt;br /&gt;
Example applications using JstacsFX for their graphical user interface are [[InMoDe]] and [[AnnoTALE]].&lt;br /&gt;
&lt;br /&gt;
== Applications ==&lt;br /&gt;
Applications currently using Jstacs:&lt;br /&gt;
* [[MotifAdjuster]]&lt;br /&gt;
* [[Dispom]]&lt;br /&gt;
* [[TALgetter]]&lt;br /&gt;
* [[TALENoffer]]&lt;br /&gt;
* [[Dimont]]&lt;br /&gt;
* [[GeMoMa]]&lt;br /&gt;
* [[AnnoTALE]]&lt;br /&gt;
* [[InMoDe]]&lt;br /&gt;
&lt;br /&gt;
== Bug reports &amp;amp; Feature requests ==&lt;br /&gt;
You can submit bug reports and feature requests by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de].&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Latest Papers ==&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Disentangler | Disentangling transcription factor binding site complexity]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[GeMoMa | Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi]]&#039;&#039;&#039;&#039;&#039; has been published in [https://link.springer.com/article/10.1186%2Fs12859-018-2203-5 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[InMoDe | InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AnnoTALE | AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/articles/srep21077 Scientific Reports].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data ]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Slim | Varying levels of complexity in transcription factor binding motifs]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
Further papers and projects can be found under [[Projects]].&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=952</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=952"/>
		<updated>2018-08-02T07:04:04Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS) that can be used individually or within a joint pipeline.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Disentangling transcription factor binding site complexity]. &#039;&#039;Nucleic Acids Research&#039;&#039;, gky683, 2018; doi: 10.1093/nar/gky683&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
It is recommended to use the GUI for testing purposes or analysis of a single data set and to resort to the CLI for more elaborate applications (multiple data sets, use on a cluster, etc.).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; (IMD) and &#039;&#039;Motif complexity analysis&#039;&#039; (MCA). In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The tools accept upper- and lower-case letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, they are replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
&lt;br /&gt;
=== Intermixture detection (IMD) ===&lt;br /&gt;
IMD can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts.&lt;br /&gt;
&lt;br /&gt;
The default value of the intermixture threshold (0.19) is a robust choice; slight variations within the interval (0.15, 0.3) have little impact in the majority of cases (see paper).&lt;br /&gt;
&lt;br /&gt;
The tool returns a text file with the intermixture number. It also returns every cluster produced by IMD, both as a text file of the binding sites and as a sequence logo of the mononucleotide statistics.&lt;br /&gt;
The values for the intermixture measure in each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed on a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; keeping the default for practical applications is strongly recommended, as the intermixture threshold might otherwise need to be adjusted.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect the quality of the results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis (MCA) ===&lt;br /&gt;
MCA allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixtures of PWM models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
For each run, the tool requires a concrete model to be chosen as input.&lt;br /&gt;
It returns a text file containing the intra-motif complexity (IMC) measure of the data set under the given model.&lt;br /&gt;
For comparing different models according to IMC, the tool thus needs to be run multiple times.&lt;br /&gt;
&lt;br /&gt;
In addition, each run also outputs a visualization of the learned model and a storable (.xml) file that can be used as input to &amp;quot;Sequence scan&amp;quot; (see below).&lt;br /&gt;
&lt;br /&gt;
Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for data sets with sequence length greater than 20.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect the quality of the results.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &amp;quot;Motif complexity analysis&amp;quot;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the source code (to be released soon) requires Jstacs 2.3.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=951</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=951"/>
		<updated>2018-08-01T07:54:26Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS) that can be used individually or within a joint pipeline.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. Disentangling transcription factor binding site complexity. &#039;&#039;Nucleic Acids Research&#039;&#039;, gky683, 2018; doi: 10.1093/nar/gky683 (to appear)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
It is recommended to use the GUI for testing purposes or analysis of a single data set and to resort to the CLI for more elaborate applications (multiple data sets, use on a cluster, etc.).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; (IMD) and &#039;&#039;Motif complexity analysis&#039;&#039; (MCA). In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The input may contain upper- and lowercase letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, each is replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
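&lt;br /&gt;
The input handling described above can be sketched as follows. This is a minimal Python illustration, not the actual implementation in Disentangler: the function names read_sites and replace_ambiguous are hypothetical, and the sketch assumes that the replacement distribution is the overall frequency of A, C, G, and T across all input sequences.&lt;br /&gt;

```python
import random
from collections import Counter

DNA = "ACGT"


def read_sites(text):
    """Return upper-cased sequences from FASTA or one-sequence-per-line text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if text.startswith(">"):
        # FASTA: header lines start with '>', sequence lines are concatenated.
        seqs, current = [], []
        for ln in lines:
            if ln.startswith(">"):
                if current:
                    seqs.append("".join(current))
                current = []
            else:
                current.append(ln)
        if current:
            seqs.append("".join(current))
    else:
        # Plain text: every non-empty line is one sequence.
        seqs = lines
    return [s.upper() for s in seqs]


def replace_ambiguous(seqs, rng=None):
    """Replace every non-ACGT symbol by a draw from the ACGT frequencies.

    Assumption: frequencies are pooled over the whole data set.
    """
    rng = rng or random.Random()
    counts = Counter(c for s in seqs for c in s if c in DNA)
    weights = [counts[c] for c in DNA]
    if sum(weights) == 0:
        # No unambiguous symbol at all: fall back to a uniform distribution.
        weights = [1, 1, 1, 1]

    def fix(c):
        return c if c in DNA else rng.choices(DNA, weights=weights)[0]

    return ["".join(fix(c) for c in s) for s in seqs]
```

Calling replace_ambiguous(read_sites(text)) yields sequences over {A,C,G,T} only; passing a seeded random.Random instance makes the replacement reproducible.&lt;br /&gt;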
&lt;br /&gt;
=== Intermixture detection (IMD) ===&lt;br /&gt;
IMD can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts.&lt;br /&gt;
&lt;br /&gt;
The default value of the intermixture threshold (0.19) is a robust choice; slight variations within the interval (0.15,0.3) have little impact in the majority of cases (see paper).&lt;br /&gt;
&lt;br /&gt;
The tool returns a text file with the intermixture number and all clusters produced by IMD as text files of the binding sites and sequence logos of the mononucleotide statistics.  &lt;br /&gt;
The values for the intermixture measure in each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed using a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; keeping the default is strongly recommended for practical applications, as the intermixture threshold might otherwise need to be adjusted.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis (MCA) ===&lt;br /&gt;
MCA allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixtures of PWM models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
For each run, the tool requires a concrete model to be chosen as input.&lt;br /&gt;
It returns a text file containing the intra-motif complexity (IMC) measure of the data set under the given model.&lt;br /&gt;
For comparing different models according to IMC, the tool thus needs to be run multiple times.&lt;br /&gt;
&lt;br /&gt;
In addition, each run also outputs a visualization of the learned model and a storable (.xml) file that can be used as input to &amp;quot;Sequence scan&amp;quot; (see below).&lt;br /&gt;
&lt;br /&gt;
Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for data sets with sequence length greater than 20.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the source code (to be released soon) requires Jstacs 2.3.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=950</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=950"/>
		<updated>2018-08-01T07:52:42Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS). The tools can be used individually or within a joint pipeline.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. Disentangling transcription factor binding site complexity. &#039;&#039;Nucleic Acids Research&#039;&#039;, gky683, 2018; doi: 10.1093/nar/gky683 (to appear)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; (IMD) and &#039;&#039;Motif complexity analysis&#039;&#039; (MCA). In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The input may contain upper- and lowercase letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, each is replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
&lt;br /&gt;
=== Intermixture detection (IMD) ===&lt;br /&gt;
IMD can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts.&lt;br /&gt;
&lt;br /&gt;
The default value of the intermixture threshold (0.19) is a robust choice; slight variations within the interval (0.15,0.3) have little impact in the majority of cases (see paper).&lt;br /&gt;
&lt;br /&gt;
The tool returns a text file with the intermixture number and all clusters produced by IMD as text files of the binding sites and sequence logos of the mononucleotide statistics.  &lt;br /&gt;
The values for the intermixture measure in each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed using a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; keeping the default is strongly recommended for practical applications, as the intermixture threshold might otherwise need to be adjusted.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis (MCA) ===&lt;br /&gt;
MCA allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixtures of PWM models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
For each run, the tool requires a concrete model to be chosen as input.&lt;br /&gt;
It returns a text file containing the intra-motif complexity (IMC) measure of the data set under the given model.&lt;br /&gt;
For comparing different models according to IMC, the tool thus needs to be run multiple times.&lt;br /&gt;
&lt;br /&gt;
In addition, each run also outputs a visualization of the learned model and a storable (.xml) file that can be used as input to &amp;quot;Sequence scan&amp;quot; (see below).&lt;br /&gt;
&lt;br /&gt;
Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for data sets with sequence length greater than 20.&lt;br /&gt;
&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the source code (to be released soon) requires Jstacs 2.3.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=949</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=949"/>
		<updated>2018-08-01T07:49:15Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS). The tools can be used individually or within a joint pipeline.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. Disentangling transcription factor binding site complexity. &#039;&#039;Nucleic Acids Research&#039;&#039;, gky683, 2018; doi: 10.1093/nar/gky683 (to appear)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; (IMD) and &#039;&#039;Motif complexity analysis&#039;&#039; (MCA). In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The input may contain upper- and lowercase letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, each is replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
&lt;br /&gt;
=== Intermixture detection (IMD) ===&lt;br /&gt;
IMD can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts.&lt;br /&gt;
The default value of the intermixture threshold (0.19) is a robust choice; slight variations within the interval (0.15,0.3) have little impact in the majority of cases (see paper).&lt;br /&gt;
&lt;br /&gt;
The tool returns a text file with the intermixture number and all clusters produced by IMD as text files of the binding sites and sequence logos of the mononucleotide statistics.  &lt;br /&gt;
The values for the intermixture measure in each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
Note: If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed using a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; keeping the default is strongly recommended for practical applications, as the intermixture threshold might otherwise need to be adjusted.&lt;br /&gt;
The default values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; are fairly conservative.&lt;br /&gt;
Smaller values can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis (MCA) ===&lt;br /&gt;
MCA allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixtures of PWM models, and variants in between.&lt;br /&gt;
For each run, the tool requires a concrete model to be chosen as input.&lt;br /&gt;
It returns a text file containing the intra-motif complexity (IMC) measure of the data set under the given model.&lt;br /&gt;
In addition, it outputs a visualization of the learned model and a storable (.xml) file that can be used as input to &amp;quot;Sequence scan&amp;quot; (see below).&lt;br /&gt;
For comparing different models according to IMC, the tool thus needs to be run multiple times.&lt;br /&gt;
&lt;br /&gt;
Note: Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for data sets with sequence length greater than 20.&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the source code (to be released soon) requires Jstacs 2.3.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=948</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=948"/>
		<updated>2018-07-30T15:19:03Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS). The tools can be used individually or within a joint pipeline. Intermixture detection (IMD) can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts. Motif complexity analysis (MCA) allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixture models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. Disentangling transcription factor binding site complexity. &#039;&#039;Nucleic Acids Research&#039;&#039;, gky683, 2018; doi: 10.1093/nar/gky683 (to appear)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerGUI.jar DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/Disentangler/DisentanglerCLI.jar DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; and &#039;&#039;Motif complexity analysis&#039;&#039;. In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The input may contain upper- and lowercase letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, each is replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
&lt;br /&gt;
=== Intermixture detection ===&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed using a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; for practical use, keeping the default is strongly recommended.&lt;br /&gt;
Smaller values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
The tool returns a text file with the intermixture number and all clusters produced by IMD as text files of the binding sites and sequence logos of the mononucleotide statistics.  &lt;br /&gt;
The values for the intermixture measure at each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis ===&lt;br /&gt;
The tool allows learning proximal/distal dependence models and mixtures thereof. Note: Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for motifs of length greater than 20.&lt;br /&gt;
The tool returns a text file containing the intra-motif complexity measure of the data set, a visualization of the learned model, and a storable (.xml) file that can be used as input to &#039;&#039;Sequence scan&#039;&#039;.&lt;br /&gt;
The mixture weights and model complexities of each component can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://www.cs.helsinki.fi/u/eggeling/Disentangler/data.tar.gz data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them suitable for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the source code (to be released soon) requires Jstacs 2.3.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Disentangler&amp;diff=947</id>
		<title>Disentangler</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Disentangler&amp;diff=947"/>
		<updated>2018-07-29T09:53:20Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: first content&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling.&lt;br /&gt;
&lt;br /&gt;
Disentangler comprises two tools for analyzing complex features in a set of aligned transcription factor binding sites (TFBS). The tools can be used individually or within a joint pipeline. Intermixture detection (IMD) can test whether putative complexity can be explained by intermixtures with binding sites from different TFs or other contamination and correct for such artifacts. Motif complexity analysis (MCA) allows selecting an optimal model of TFBS complexity, choosing among dependence models, mixture models, and variants in between.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use Disentangler, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling. [https://tba.org Disentangling transcription factor binding site complexity]. &#039;&#039;Nucleic Acids Research&#039;&#039;, 2018; doi: 10.1093/nar/gky683 (to appear)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
Disentangler offers two user interfaces. &lt;br /&gt;
* [https://tba.org DisentanglerGUI] -- graphical user interface&lt;br /&gt;
* [https://tba.org DisentanglerCLI] -- command line interface&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
== Functionality ==&lt;br /&gt;
The software contains the two subtools described in the paper, called &#039;&#039;Intermixture detection&#039;&#039; and &#039;&#039;Motif complexity analysis&#039;&#039;. In addition, there is a tool &#039;&#039;Sequence scan&#039;&#039; that can be used to search for motif hits within target sequences based on models that are returned by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All tools expect a set of aligned, gapless TFBS of the same length as input. If the content of the input file starts with &#039;&amp;gt;&#039;, it is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, where every line contains a single sequence.&lt;br /&gt;
The input may contain upper- and lowercase letters of the standard DNA alphabet {A,C,G,T}. If other symbols from the IUPAC code (such as N) are encountered, each is replaced by a random sample from the distribution of {A,C,G,T} in the data set.&lt;br /&gt;
&lt;br /&gt;
=== Intermixture detection ===&lt;br /&gt;
If &amp;quot;JSD weights&amp;quot; is disabled, the intermixture measure is computed using a non-weighted Jensen-Shannon divergence. This option is included for experimental purposes; for practical use, keeping the default is strongly recommended.&lt;br /&gt;
Smaller values for &amp;quot;Restarts&amp;quot;, &amp;quot;Time limit&amp;quot; and &amp;quot;Termination threshold&amp;quot; can speed up every recursive step, which can be beneficial for testing purposes, but they may affect quality of the results. &lt;br /&gt;
The tool returns a text file with the intermixture number and all clusters produced by IMD as text files of the binding sites and sequence logos of the mononucleotide statistics.  &lt;br /&gt;
The values for the intermixture measure at each recursive step can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
=== Motif complexity analysis ===&lt;br /&gt;
The tool allows learning proximal/distal dependence models and mixtures thereof. Note: Learning distal dependence models of order greater than one can be very time- and memory-consuming if the input sequences are long.&lt;br /&gt;
It is not recommended for motifs of length greater than 20.&lt;br /&gt;
The tool returns a text file containing the intra-motif complexity measure of the data set, a visualization of the learned model, and a storable (.xml) file that can be used as input to &#039;&#039;Sequence scan&#039;&#039;.&lt;br /&gt;
The mixture weights and model complexities of each component can be found in the protocol.&lt;br /&gt;
&lt;br /&gt;
=== Sequence scan ===&lt;br /&gt;
This tool is a variant of the [http://www.jstacs.de/index.php/InMoDe InMoDe] ScanApp, with increased support for different types of models, that is, mixture models and distal dependence models.&lt;br /&gt;
&amp;quot;Input model&amp;quot; needs to be a model file (in .xml format) produced by &#039;&#039;Motif complexity analysis&#039;&#039;.&lt;br /&gt;
Here, the &amp;quot;FPR&amp;quot; pertains to the number of sequences that have at least one hit.&lt;br /&gt;
The tool returns a list with coordinates of motif hits as well as the extracted binding sites.&lt;br /&gt;
&lt;br /&gt;
== Example data ==&lt;br /&gt;
These [https://tba.org data sets] are discussed in the paper in detail (Section &amp;quot;Application examples&amp;quot;), which makes them ideal candidates for testing the functionality of Disentangler.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [https://tba.org source code] requires Jstacs 2.3.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=945</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=945"/>
		<updated>2018-07-20T15:50:15Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
InMoDe is continuously developed further.&lt;br /&gt;
Feel free to report bugs, make feature requests, or give other comments and suggestions to eggeling[at]cs.helsinki.fi.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI] -- command line interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.1.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.1.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== [http://galaxy.informatik.uni-halle.de Webserver] ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de galaxy.informatik.uni-halle.de].&lt;br /&gt;
The provided web-server puts a certain limit on the complexity of runnable jobs for the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or your own Galaxy instance.&lt;br /&gt;
&lt;br /&gt;
== Version history ==&lt;br /&gt;
&lt;br /&gt;
=== Version 1.1 ===&lt;br /&gt;
&#039;&#039;Minor improvements ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/changelog-1.1.txt changelog]) for ISMB 2017 ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/poster.pdf poster])&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf InMoDe User Guide (version 1.1)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe1.1_sources.zip Source code]&lt;br /&gt;
&lt;br /&gt;
=== Version 1.0 ===&lt;br /&gt;
&#039;&#039;Initial release&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDe_userGuide-1.0.pdf InMoDe User Guide (version 1.0)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGUI-1.0.jar InMoDeGUI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeCLI-1.0.jar InMoDeCLI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGalaxy-1.0.jar InMoDeGalaxy-1.0.jar]&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=888</id>
		<title>PCTLearn</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=888"/>
		<updated>2017-12-14T12:07:30Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.&lt;br /&gt;
&lt;br /&gt;
== Description ==&lt;br /&gt;
&lt;br /&gt;
Parsimonious context trees (PCTs) provide a sparse parameterization of conditional probability distributions, but learning them from data is computationally hard due to the combinatorial explosion of the space of model structures as the number of predictor variables grows. Here, we propose new algorithmic ideas, which can significantly expedite the standard dynamic programming algorithm. Specifically, we introduce a memoization technique, which exploits regularities within the predictor variables by equating different contexts associated with the same data subset, and a bound-and-prune technique, which exploits regularities within the response variable by pruning parts of the search space based on score upper bounds.&lt;br /&gt;
The software &#039;&#039;&#039;PCTLearn&#039;&#039;&#039; is a lightweight Java application that implements these ideas and can be used to learn a single PCT of user-specified maximal depth from data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download == &lt;br /&gt;
&lt;br /&gt;
The application is available as a single runnable .jar [https://www.cs.helsinki.fi/u/eggeling/PCTLearn/PCTLearn.jar PCTLearn].&lt;br /&gt;
&lt;br /&gt;
== Input ==&lt;br /&gt;
The application requires as input a plain text file, which can consist of basic Latin characters a-z and A-Z (case sensitive) and Arabic numerals 0-9.&lt;br /&gt;
The number of different characters in the input file determines the alphabet size for PCT optimization.&lt;br /&gt;
Each line in the input file is chopped into overlapping k-mers, where k equals the desired maximal PCT depth + 1. The PCT is learned on the resulting k-mers with the convention that the last symbol in each k-mer denotes the response variable.&lt;br /&gt;
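&lt;br /&gt;
The chopping of input lines into k-mers can be illustrated with a minimal Python sketch. It is not taken from the PCTLearn sources: the function names are hypothetical, a step size of one symbol is assumed for "overlapping", and lines shorter than k are assumed to contribute no k-mers.&lt;br /&gt;

```python
def chop_into_kmers(line, max_depth):
    """Split one input line into overlapping k-mers with k = max_depth + 1."""
    k = max_depth + 1
    # Assumption: consecutive k-mers are shifted by one symbol.
    # range() is empty when the line is shorter than k, so such lines yield nothing.
    return [line[i:i + k] for i in range(len(line) - k + 1)]


def split_context_and_response(kmer):
    """The last symbol is the response variable, the preceding symbols are the predictors."""
    return kmer[:-1], kmer[-1]


if __name__ == "__main__":
    kmers = chop_into_kmers("ACGTAC", 2)
    print(kmers)
    print([split_context_and_response(kmer) for kmer in kmers])
```

With a maximal depth of 2, every k-mer has three symbols: two predictor symbols followed by the response symbol.&lt;br /&gt;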
&lt;br /&gt;
== Running PCTLearn ==&lt;br /&gt;
&lt;br /&gt;
The application has one mandatory and various optional arguments. &lt;br /&gt;
A shorter list of arguments can be provided, in which case all missing arguments take their default values.&lt;br /&gt;
Run with &lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input data. &amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;maximalDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The maximal depth of the learned PCT.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;scoringFunction&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;BIC&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used scoring function. Permitted values are &amp;quot;BIC&amp;quot; and &amp;quot;AIC&amp;quot;.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoization&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling memoization.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;pruning&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling pruning.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fineBound&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Use fine upper bound instead of coarse. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoLimit&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Memoization limit that stops storing subtrees with the given distance from the leaves. Is ignored if memoization is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;lookaheadDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used lookahead depth. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt; &lt;br /&gt;
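&lt;br /&gt;
How a shortened argument list relates to the defaults in the table can be sketched in Python as follows. This is only an illustration under assumptions: the helper names and the file name sequences.txt are hypothetical, and the values are passed on as plain strings in the spelling used in the table, since the exact parsing done by PCTLearn is not documented here.&lt;br /&gt;

```python
# Optional positional arguments in the order of the call above,
# paired with the default values listed in the table.
OPTIONAL_DEFAULTS = [
    ("maximalDepth", "2"),
    ("scoringFunction", "BIC"),
    ("memoization", "TRUE"),
    ("pruning", "TRUE"),
    ("fineBound", "TRUE"),
    ("memoLimit", "1"),
    ("lookaheadDepth", "1"),
]


def build_command(input_file, *optional_values):
    """Return the argument list for a call with a possibly shortened argument list."""
    # At most seven optional values may follow the mandatory input file.
    if len(optional_values) not in range(len(OPTIONAL_DEFAULTS) + 1):
        raise ValueError("too many arguments")
    return ["java", "-jar", "PCTLearn.jar", input_file] + [str(v) for v in optional_values]


def effective_settings(*optional_values):
    """Map each optional argument to the given value, or to its default if omitted."""
    names = [name for name, _ in OPTIONAL_DEFAULTS]
    settings = dict(OPTIONAL_DEFAULTS)
    # zip() stops at the shorter sequence, so only the leading arguments are overridden.
    settings.update(zip(names, (str(v) for v in optional_values)))
    return settings


if __name__ == "__main__":
    # Hypothetical call: depth 3 and AIC given, everything else left at its default.
    print(" ".join(build_command("sequences.txt", 3, "AIC")))
    print(effective_settings(3, "AIC"))
```

Because the arguments are positional, an argument can only be set explicitly if all arguments before it are given as well.&lt;br /&gt;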
&lt;br /&gt;
== Output ==&lt;br /&gt;
The tool writes some statistics about the optimization, such as the optimal score, the number of visited nodes, and the required running time, to stdout.&lt;br /&gt;
&lt;br /&gt;
In addition, it creates (i) a Graphviz file of the learned PCT structure and (ii) a file with conditional probability parameters (MLE) for each leaf.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=887</id>
		<title>PCTLearn</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=887"/>
		<updated>2017-12-14T12:06:39Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.&lt;br /&gt;
&lt;br /&gt;
== Description ==&lt;br /&gt;
&lt;br /&gt;
Parsimonious context trees (PCTs) provide a sparse parameterization of conditional probability distributions, but learning them from data is computationally hard due to the combinatorial explosion of the space of model structures as the number of predictor variables grows. Here, we propose new algorithmic ideas, which can significantly expedite the standard dynamic programming algorithm. Specifically, we introduce a memoization technique, which exploits regularities within the predictor variables by equating different contexts associated with the same data subset, and a bound-and-prune technique, which exploits regularities within the response variable by pruning parts of the search space based on score upper bounds.&lt;br /&gt;
The software &#039;&#039;&#039;PCTLearn&#039;&#039;&#039; is a lightweight Java application that learns a single PCT of user-specified maximal depth from data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download == &lt;br /&gt;
&lt;br /&gt;
The application is available as a single runnable .jar [https://www.cs.helsinki.fi/u/eggeling/PCTLearn/PCTLearn.jar PCTLearn].&lt;br /&gt;
&lt;br /&gt;
== Input ==&lt;br /&gt;
The application requires as input a plain text file, which can consist of basic Latin characters a-z and A-Z (case sensitive) and Arabic numerals 0-9.&lt;br /&gt;
The number of different characters in the input file determines the alphabet size for PCT optimization.&lt;br /&gt;
Each line in the input file is chopped into overlapping k-mers, where k equals the desired maximal PCT depth + 1. The PCT is learned on the resulting k-mers with the convention that the last symbol in each k-mer denotes the response variable.&lt;br /&gt;
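&lt;br /&gt;
The relation between the characters of the input file and the alphabet size can be sketched in Python as follows. The function name is hypothetical, and raising an error for characters outside a-z, A-Z, and 0-9 is an assumption made for this illustration; how PCTLearn itself reacts to such characters is not stated here.&lt;br /&gt;

```python
import string

# Permitted symbols: basic Latin letters (case sensitive) and the digits 0-9.
ALLOWED = set(string.ascii_letters + string.digits)


def alphabet_of(lines):
    """Collect the distinct symbols of all lines; their number is the alphabet size."""
    symbols = set()
    for line in lines:
        symbols.update(line.strip())
    unexpected = symbols - ALLOWED
    if unexpected:
        # Assumption for this sketch: unsupported characters are rejected.
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    return sorted(symbols)


if __name__ == "__main__":
    alphabet = alphabet_of(["ACGT", "acgt", "AC01"])
    print(len(alphabet), alphabet)
```

Since the input is case sensitive, "A" and "a" count as two different symbols in this sketch.&lt;br /&gt;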
&lt;br /&gt;
== Running PCTLearn ==&lt;br /&gt;
&lt;br /&gt;
The application has one mandatory and various optional arguments. &lt;br /&gt;
A shorter list of arguments can be provided, in which case all missing arguments take their default values.&lt;br /&gt;
Run with &lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input data. &amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;maximalDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The maximal depth of the learned PCT.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;scoringFunction&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;BIC&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used scoring function. Permitted values are &amp;quot;BIC&amp;quot; and &amp;quot;AIC&amp;quot;.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoization&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling memoization.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;pruning&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling pruning.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fineBound&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Use fine upper bound instead of coarse. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoLimit&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Memoization limit that stops storing subtrees with the given distance from the leaves. Is ignored if memoization is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;lookaheadDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used lookahead depth. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt; &lt;br /&gt;
&lt;br /&gt;
== Output ==&lt;br /&gt;
The tool writes some statistics about the optimization, such as the optimal score, the number of visited nodes, and the required running time, to stdout.&lt;br /&gt;
&lt;br /&gt;
In addition, it creates (i) a Graphviz file of the learned PCT structure and (ii) a file with conditional probability parameters (MLE) for each leaf.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=886</id>
		<title>PCTLearn</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=886"/>
		<updated>2017-12-14T11:33:52Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.&lt;br /&gt;
&lt;br /&gt;
== Runnable JAR ==&lt;br /&gt;
[https://www.cs.helsinki.fi/u/eggeling/PCTLearn/PCTLearn.jar PCTLearn] requires as input a plain text file, which can consist of basic Latin characters a-z and A-Z (case sensitive) and Arabic numerals 0-9. The number of different characters in the input file determines the alphabet size for PCT optimization.&lt;br /&gt;
The application has one mandatory and various optional arguments. &lt;br /&gt;
A shorter list of arguments can be provided, in which case all missing arguments take their default values.&lt;br /&gt;
Run with &lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input data. &amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;maximalDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The maximal depth of the learned PCT.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;scoringFunction&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;BIC&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used scoring function. Permitted values are &amp;quot;BIC&amp;quot; and &amp;quot;AIC&amp;quot;.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoization&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling memoization.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;pruning&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Enabling pruning.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fineBound&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Boolean&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;TRUE&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Use fine upper bound instead of coarse. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;memoLimit&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Memoization limit that stops storing subtrees with the given distance from the leaves. Is ignored if memoization is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;lookaheadDepth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;1&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The used lookahead depth. Is ignored if pruning is set to FALSE.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt; &lt;br /&gt;
&lt;br /&gt;
The tool writes some statistics about the optimization, such as the optimal score, the number of visited nodes, and the required running time, to stdout.&lt;br /&gt;
&lt;br /&gt;
In addition, it creates (i) a Graphviz file of the learned PCT structure and (ii) a file with conditional probability parameters (MLE) for each leaf.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=885</id>
		<title>PCTLearn</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PCTLearn&amp;diff=885"/>
		<updated>2017-12-13T09:17:52Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: page created&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=884</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=884"/>
		<updated>2017-11-26T10:42:10Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
InMoDe is continuously developed further.&lt;br /&gt;
Feel free to report bugs, make feature requests, or give other comments and suggestions to eggeling[at]cs.helsinki.fi.&lt;br /&gt;
If you wish to be notified about news regarding the software, such as releases of a next version, send an email with the header &amp;quot;InMoDe subscribe&amp;quot; to the aforementioned address.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI] -- command line interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy] -- for integration into own Galaxy&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.1.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.1.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== [http://galaxy.informatik.uni-halle.de Webserver] ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or your own Galaxy instance.&lt;br /&gt;
&lt;br /&gt;
== Version history ==&lt;br /&gt;
&lt;br /&gt;
=== Version 1.1 ===&lt;br /&gt;
&#039;&#039;Minor improvements ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/changelog-1.1.txt changelog]) for ISMB 2017 ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/poster.pdf poster])&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf InMoDe User Guide (version 1.1)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe1.1_sources.zip Source code]&lt;br /&gt;
&lt;br /&gt;
=== Version 1.0 ===&lt;br /&gt;
&#039;&#039;Initial release&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDe_userGuide-1.0.pdf InMoDe User Guide (version 1.0)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGUI-1.0.jar InMoDeGUI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeCLI-1.0.jar InMoDeCLI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGalaxy-1.0.jar InMoDeGalaxy-1.0.jar]&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=883</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=883"/>
		<updated>2017-11-26T10:35:10Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
InMoDe is continuously developed further.&lt;br /&gt;
Feel free to report bugs, make feature requests, or give other comments and suggestions to eggeling[at]cs.helsinki.fi.&lt;br /&gt;
If you wish to be notified about news regarding the software, such as releases of a next version, send an email with the header &amp;quot;InMoDe subscribe&amp;quot; to the aforementioned address.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI] -- command line interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy] -- for integration into own Galaxy&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.1.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.1.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== [http://galaxy.informatik.uni-halle.de Webserver] ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or your own Galaxy instance.&lt;br /&gt;
&lt;br /&gt;
== Version history ==&lt;br /&gt;
&lt;br /&gt;
=== Version 1.1 ===&lt;br /&gt;
&#039;&#039;Minor improvements ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/changelog-1.1.txt changelog]) for ISMB 2017 ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/poster.pdf poster])&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf InMoDe User Guide (version 1.1)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe1.1_sources.zip Source code]&lt;br /&gt;
&lt;br /&gt;
=== Version 1.0 ===&lt;br /&gt;
&#039;&#039;Initial release&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDe_userGuide-1.0.pdf InMoDe User Guide (version 1.0)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGUI-1.0.jar InMoDeGUI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeCLI-1.0.jar InMoDeCLI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGalaxy-1.0.jar InMoDeGalaxy-1.0.jar]&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=881</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=881"/>
		<updated>2017-09-16T19:04:40Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: version updates&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI] -- graphical user interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI] -- command line interface&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.1.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.1.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of the jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or your own Galaxy instance.&lt;br /&gt;
&lt;br /&gt;
== Version history ==&lt;br /&gt;
&lt;br /&gt;
=== Version 1.1 ===&lt;br /&gt;
&#039;&#039;Minor improvements ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/changelog-1.1.txt changelog]) for ISMB 2017 ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/poster.pdf poster])&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf InMoDe User Guide (version 1.1)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe1.1_sources.zip Source code]&lt;br /&gt;
&lt;br /&gt;
=== Version 1.0 ===&lt;br /&gt;
&#039;&#039;Initial release&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDe_userGuide-1.0.pdf InMoDe User Guide (version 1.0)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGUI-1.0.jar InMoDeGUI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeCLI-1.0.jar InMoDeCLI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGalaxy-1.0.jar InMoDeGalaxy-1.0.jar]&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=880</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=880"/>
		<updated>2017-09-15T16:04:40Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: source code&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI.jar] -- graphical user interface (version 1.1)&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI.jar] -- command line interface  (version 1.1)&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance (version 1.1)&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe (version 1.0), namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe (version 1.0) is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of the jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or your own Galaxy instance.&lt;br /&gt;
&lt;br /&gt;
== Version history ==&lt;br /&gt;
&lt;br /&gt;
=== Version 1.1 ===&lt;br /&gt;
&#039;&#039;Minor improvements ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/changelog-1.1.txt changelog]) for ISMB 2017 ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/poster.pdf poster])&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf InMoDe User Guide (version 1.1)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe1.1_sources.zip Source code]&lt;br /&gt;
&lt;br /&gt;
=== Version 1.0 ===&lt;br /&gt;
&#039;&#039;Initial release&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDe_userGuide-1.0.pdf InMoDe User Guide (version 1.0)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGUI-1.0.jar InMoDeGUI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeCLI-1.0.jar InMoDeCLI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGalaxy-1.0.jar InMoDeGalaxy-1.0.jar]&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=865</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=865"/>
		<updated>2017-07-31T09:49:15Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper &#039;&#039;&#039;Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The software on this site is mainly intended to enable reproducibility of the results from the publication.&lt;br /&gt;
For other purposes, please consider using the more recent [http://jstacs.de/index.php/InMoDe InMoDe] software, which contains the methodology of PMMdeNovo as well as more advanced features, speedups, better user interfaces, and automatic visualization.&lt;br /&gt;
&lt;br /&gt;
== Description ==&lt;br /&gt;
&lt;br /&gt;
=== Background ===&lt;br /&gt;
Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, used to be the standard model for this task for more than three decades but its simple assumptions are increasingly put into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and while model classes that dynamically adapt the model complexity to data have been developed, effective model selection is to date only possible for fully observable data, but not, e.g., within de novo motif discovery.&lt;br /&gt;
=== Results ===&lt;br /&gt;
To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter-tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice.&lt;br /&gt;
=== Conclusions ===&lt;br /&gt;
The traditional PWM model appears to be indeed insufficient to infer realistic sequence motifs, as it is on average outperformed by more complex models that take into account intra-motif dependencies. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. Hence, we find it worthwhile to recommend that any modern motif discovery algorithm should attempt to take into account intra-motif dependencies.&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools. Each tool takes mandatory arguments, which have no default value, and optional arguments.&lt;br /&gt;
To use the default value of an optional argument, pass &amp;quot;def&amp;quot; in its place. Alternatively, a shorter list of arguments can be provided, in which case all missing arguments assume their default values.&lt;br /&gt;
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of non-aligned sequences putatively containing the motif. It infers an inhomogeneous parsimonious Markov model (PMM) of arbitrary order, where order 0 corresponds to a PWM model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a FASTA file. Otherwise, it is interpreted as plain text, i.e., each line corresponds to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model, which is used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always runs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after having obtained the last optimal model structure before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned parsimonious context tree (PCT) structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
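&lt;br /&gt;
The argument conventions described above can be illustrated with a minimal sketch. The helper below is hypothetical and not part of PMMdeNovo. It only assembles the positional argument list in the documented order, fills unspecified optional arguments with "def", and omits trailing defaults. The input file name in the usage note is likewise an assumption.&lt;br /&gt;

```python
# Hypothetical helper (not shipped with PMMdeNovo): assembles the positional
# arguments of ModelTrainer.jar in the order given in the table above.
MODEL_TRAINER_ARGS = ["inputFile", "motifWidth", "motifOrder", "flankingOrder",
                      "initSteps", "addSteps", "restarts", "output"]


def model_trainer_command(input_file, **overrides):
    """Return the argument list; unspecified optional arguments become "def"."""
    optional = MODEL_TRAINER_ARGS[1:]  # inputFile is the only mandatory argument
    unknown = set(overrides) - set(optional)
    if unknown:
        raise ValueError("unknown argument(s): " + ", ".join(sorted(unknown)))
    values = [str(overrides.get(name, "def")) for name in optional]
    # Missing trailing arguments assume their defaults, so they can be dropped.
    while values and values[-1] == "def":
        values.pop()
    return ["java", "-jar", "ModelTrainer.jar", input_file] + values
```

For example, model_trainer_command("seqs.fa", motifOrder=1) yields the arguments for java -jar ModelTrainer.jar seqs.fa def 1, which keeps the default motif width and all later defaults.&lt;br /&gt;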
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification, using positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain it. The tool can be used to perform a single step of a K-fold cross-validation experiment.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model, which is used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always runs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after having obtained the last optimal model structure before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;  &lt;br /&gt;
The tool returns (i) the model complexity, i.e., the number of leaves of all parsimonious context trees of the learned motif model, and (ii) the performance of the classifier, measured by the area under the ROC curve.&lt;br /&gt;
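&lt;br /&gt;
Since the tool performs only a single step of a K-fold cross-validation, the caller has to prepare the training and test files for each fold. The following minimal sketch shows one way to do this. Both functions and all file names are hypothetical and not part of PMMdeNovo.&lt;br /&gt;

```python
# Hypothetical helpers (not shipped with PMMdeNovo) for preparing one fold.
def fold_split(subsets, k):
    """Use subset k as test data and all remaining subsets as training data."""
    test = list(subsets[k])
    train = [seq for i, part in enumerate(subsets) if i != k for seq in part]
    return train, test


def classification_command(k, motif_width=20, motif_order=2):
    """Build the argument list for fold k; the file names are assumptions."""
    files = ["posTrain_%d.txt" % k, "negTrain_%d.txt" % k,
             "posTest_%d.txt" % k, "negTest_%d.txt" % k]
    # The remaining optional arguments are omitted and thus assume defaults.
    return (["java", "-jar", "Classification.jar"] + files
            + [str(motif_width), str(motif_order)])
```

Applying fold_split once to the positive and once to the negative subsets for each k, writing the four resulting lists to files, and running the corresponding command yields one complexity value and one area under the ROC curve per fold.&lt;br /&gt;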
&lt;br /&gt;
== Data ==&lt;br /&gt;
The example [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=863</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=863"/>
		<updated>2017-07-22T12:38:22Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI.jar] -- graphical user interface (version 1.1)&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI.jar] -- command line interface  (version 1.1)&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance (version 1.1)&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe (version 1.0), namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe (version 1.0) is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of the jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or your own Galaxy instance.&lt;br /&gt;
&lt;br /&gt;
== Version history ==&lt;br /&gt;
&lt;br /&gt;
=== Version 1.1 ===&lt;br /&gt;
&#039;&#039;Minor improvements ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/changelog-1.1.txt changelog]) for ISMB 2017 ([https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/poster.pdf poster])&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf InMoDe User Guide (version 1.1)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI-1.1.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy-1.1.jar]&lt;br /&gt;
&lt;br /&gt;
=== Version 1.0 ===&lt;br /&gt;
&#039;&#039;Initial release&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDe_userGuide-1.0.pdf InMoDe User Guide (version 1.0)]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGUI-1.0.jar InMoDeGUI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeCLI-1.0.jar InMoDeCLI-1.0.jar]&lt;br /&gt;
* [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.0/InMoDeGalaxy-1.0.jar InMoDeGalaxy-1.0.jar]&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=862</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=862"/>
		<updated>2017-07-22T12:17:30Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe, see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI-1.1.jar InMoDeGUI.jar] -- graphical user interface (version 1.1)&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI-1.1.jar InMoDeCLI.jar] -- command line interface (version 1.1)&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy-1.1.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance (version 1.1)&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe (version 1.0), namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe (version 1.0) is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or in your own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=861</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=861"/>
		<updated>2017-07-22T12:16:59Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: version 1.1 added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe, see the [https://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDe_userGuide-1.1.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface (version 1.1)&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface (version 1.1)&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/1.1/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance (version 1.1)&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe (version 1.0), namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe (version 1.0) is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or in your own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Main_Page&amp;diff=830</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Main_Page&amp;diff=830"/>
		<updated>2017-02-14T09:18:52Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== A &amp;lt;font color=FireBrick&amp;gt;J&amp;lt;/font&amp;gt;ava framework for &amp;lt;font color=FireBrick&amp;gt;st&amp;lt;/font&amp;gt;atistical &amp;lt;font color=FireBrick&amp;gt;a&amp;lt;/font&amp;gt;nalysis and &amp;lt;font color=FireBrick&amp;gt;c&amp;lt;/font&amp;gt;lassification of biological &amp;lt;font color=FireBrick&amp;gt;s&amp;lt;/font&amp;gt;equences ==&lt;br /&gt;
&lt;br /&gt;
Sequence analysis is one of the major subjects of&lt;br /&gt;
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].&lt;br /&gt;
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as&lt;br /&gt;
alignment algorithms.&lt;br /&gt;
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an&lt;br /&gt;
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches&lt;br /&gt;
for parameter learning. Using Jstacs, classifiers can be assessed and&lt;br /&gt;
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented&lt;br /&gt;
design, Jstacs is easy to use and readily extensible.&lt;br /&gt;
&lt;br /&gt;
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].&lt;br /&gt;
&lt;br /&gt;
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.&lt;br /&gt;
&lt;br /&gt;
== Licensing Information ==&lt;br /&gt;
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].&lt;br /&gt;
&lt;br /&gt;
== Current release ==&lt;br /&gt;
You can download Jstacs version 2.2 [[Downloads | here]].&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;You can find an overview of the new features in the [[Version history]].&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
We also provide an [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.&lt;br /&gt;
&lt;br /&gt;
== Getting started &amp;amp; Cookbook ==&lt;br /&gt;
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].&lt;br /&gt;
&lt;br /&gt;
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].&lt;br /&gt;
This cookbook comprises a general description of the structure of Jstacs including data handling, statistical models, classifiers, and assessments.&lt;br /&gt;
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point for your own applications.&lt;br /&gt;
&lt;br /&gt;
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].&lt;br /&gt;
&lt;br /&gt;
== Publication ==&lt;br /&gt;
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.&lt;br /&gt;
If you use Jstacs in your research, please cite&lt;br /&gt;
&lt;br /&gt;
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. &#039;&#039;Jstacs: A Java framework for statistical analysis and classification of biological sequences&#039;&#039;. Journal of Machine Learning Research, &#039;&#039;&#039;13&#039;&#039;&#039;(Jun):1967–1971, 2012.&lt;br /&gt;
&lt;br /&gt;
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]&lt;br /&gt;
== Applications ==&lt;br /&gt;
Applications currently using Jstacs:&lt;br /&gt;
* [[MotifAdjuster]]&lt;br /&gt;
* [[Dispom]]&lt;br /&gt;
* [[TALgetter]]&lt;br /&gt;
* [[TALENoffer]]&lt;br /&gt;
* [[Dimont]]&lt;br /&gt;
* [[GeMoMa]]&lt;br /&gt;
* [[AnnoTALE]]&lt;br /&gt;
&lt;br /&gt;
== Bug reports &amp;amp; Feature requests ==&lt;br /&gt;
You can submit bug reports and feature requests by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de].&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Latest Papers ==&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[InMoDe | InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]]&#039;&#039;&#039;&#039;&#039; has been published in [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AnnoTALE | AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/articles/srep21077 Scientific Reports].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[GeMoMa | Using intron position conservation for homology-based gene predictions]]&#039;&#039;&#039;&#039;&#039; has been published in [https://nar.oxfordjournals.org/content/early/2016/02/17/nar.gkw092 Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data ]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Slim | Varying levels of complexity in transcription factor binding motifs]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
Further papers and projects can be found under [[Projects]].&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=829</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=829"/>
		<updated>2017-02-14T09:16:59Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe, see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2017; 33(4): 580-582. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or in your own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=828</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=828"/>
		<updated>2017-02-01T11:22:33Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe, see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/2938076/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2016. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The web server limits the complexity of jobs that can be run with the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it on your local machine or in your own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Projects&amp;diff=815</id>
		<title>Projects</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Projects&amp;diff=815"/>
		<updated>2017-01-03T15:38:21Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page lists projects that use Jstacs.&lt;br /&gt;
* [[MotifAdjuster]]: a tool for computational reassessment of transcription factor binding site annotations&lt;br /&gt;
* [[Prior]]: apples and oranges: avoiding different priors in Bayesian DNA sequence analysis&lt;br /&gt;
* [[GenDisMix]]: unifying generative and discriminative learning principles&lt;br /&gt;
* [[Dispom]]: de-novo discovery of differentially abundant transcription factor binding sites including their positional preference&lt;br /&gt;
* [[MiMB]]: probabilistic approaches to transcription factor binding site prediction&lt;br /&gt;
* [[SHMM]]: utilizing gene-pair orientations for improved analysis of ChIP-chip promoter array data&lt;br /&gt;
* [[DSHMM]]: exploiting prior knowledge and gene distances in the analysis of tumor expression profiles&lt;br /&gt;
* [[PHHMM]]: improved analysis of Array-CGH data&lt;br /&gt;
* [[MeDIP-HMM]]: HMM-based analysis of DNA methylation profiles&lt;br /&gt;
* [[ARHMM]]: integrating local chromosomal dependencies into the analysis of tumor expression profiles&lt;br /&gt;
* [[FlowCap]]: molecular classification of acute myeloid leukaemia (AML) using flow cytometry data&lt;br /&gt;
* [[TALgetter]]: prediction of TAL effector target sites&lt;br /&gt;
* [[TALENoffer]]: genome-wide TALEN off-target prediction&lt;br /&gt;
* [[Dimont]]: general approach for discriminative de-novo motif discovery from high-throughput data&lt;br /&gt;
* [[AUC-PR]]: area under ROC and PR curves for weighted and unweighted data&lt;br /&gt;
* [[Slim]]: Sparse local inhomogeneous mixture (Slim) models and dependency logos&lt;br /&gt;
* [[PMMdeNovo]]: de novo motif discovery based on inhomogeneous parsimonious Markov models (PMMs) for exploiting intra-motif dependencies&lt;br /&gt;
* [[AnnoTALE]]: identifying and analysing TALEs in &#039;&#039;Xanthomonas&#039;&#039; genomes, for clustering TALEs, for assigning novel TALEs to existing classes, for proposing TALE names using a unified nomenclature, and for predicting TALE targets&lt;br /&gt;
* [[GeMoMa]]: Gene Model Mapper (GeMoMa) is a homology-based gene prediction program that uses the annotation of protein-coding genes in a reference genome to infer annotation of protein-coding genes in a target genome&lt;br /&gt;
* [[InMoDe]]: tools for learning and visualizing intra-motif dependencies of DNA binding sites&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Main_Page&amp;diff=814</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Main_Page&amp;diff=814"/>
		<updated>2017-01-03T15:37:21Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== A &amp;lt;font color=FireBrick&amp;gt;J&amp;lt;/font&amp;gt;ava framework for &amp;lt;font color=FireBrick&amp;gt;st&amp;lt;/font&amp;gt;atistical &amp;lt;font color=FireBrick&amp;gt;a&amp;lt;/font&amp;gt;nalysis and &amp;lt;font color=FireBrick&amp;gt;c&amp;lt;/font&amp;gt;lassification of biological &amp;lt;font color=FireBrick&amp;gt;s&amp;lt;/font&amp;gt;equences ==&lt;br /&gt;
&lt;br /&gt;
Sequence analysis is one of the major subjects of&lt;br /&gt;
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].&lt;br /&gt;
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as&lt;br /&gt;
alignment algorithms.&lt;br /&gt;
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an&lt;br /&gt;
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches&lt;br /&gt;
for parameter learning. Using Jstacs, classifiers can be assessed and&lt;br /&gt;
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented&lt;br /&gt;
design, Jstacs is easy to use and readily extensible.&lt;br /&gt;
&lt;br /&gt;
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].&lt;br /&gt;
&lt;br /&gt;
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.&lt;br /&gt;
&lt;br /&gt;
== Licensing Information ==&lt;br /&gt;
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].&lt;br /&gt;
&lt;br /&gt;
== Current release ==&lt;br /&gt;
You can download Jstacs version 2.2 [[Downloads | here]].&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;You can find an overview of the new features in the [[Version history]].&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
We also provide an [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.&lt;br /&gt;
&lt;br /&gt;
== Getting started &amp;amp; Cookbook ==&lt;br /&gt;
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].&lt;br /&gt;
&lt;br /&gt;
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].&lt;br /&gt;
This cookbook comprises a general description of the structure of Jstacs including data handling, statistical models, classifiers, and assessments.&lt;br /&gt;
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point for your own applications.&lt;br /&gt;
&lt;br /&gt;
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].&lt;br /&gt;
&lt;br /&gt;
== Publication ==&lt;br /&gt;
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.&lt;br /&gt;
If you use Jstacs in your research, please cite&lt;br /&gt;
&lt;br /&gt;
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. &#039;&#039;Jstacs: A Java framework for statistical analysis and classification of biological sequences&#039;&#039;. Journal of Machine Learning Research, &#039;&#039;&#039;13&#039;&#039;&#039;(Jun):1967–1971, 2012.&lt;br /&gt;
&lt;br /&gt;
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]&lt;br /&gt;
== Applications ==&lt;br /&gt;
Applications currently using Jstacs:&lt;br /&gt;
* [[MotifAdjuster]]&lt;br /&gt;
* [[Dispom]]&lt;br /&gt;
* [[TALgetter]]&lt;br /&gt;
* [[TALENoffer]]&lt;br /&gt;
* [[Dimont]]&lt;br /&gt;
* [[GeMoMa]]&lt;br /&gt;
* [[AnnoTALE]]&lt;br /&gt;
&lt;br /&gt;
== Bug reports &amp;amp; Feature requests ==&lt;br /&gt;
You can submit bug reports and feature requests by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de].&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Latest Papers ==&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[InMoDe | InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]]&#039;&#039;&#039;&#039;&#039; has been published in [http://bioinformatics.oxfordjournals.org/content/early/2016/12/28/bioinformatics.btw689 Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AnnoTALE | AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/articles/srep21077 Scientific Reports].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[GeMoMa | Using intron position conservation for homology-based gene predictions]]&#039;&#039;&#039;&#039;&#039; has been published in [https://nar.oxfordjournals.org/content/early/2016/02/17/nar.gkw092 Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data ]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Slim | Varying levels of complexity in transcription factor binding motifs]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
Further papers and projects can be found under [[Projects]].&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=813</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=813"/>
		<updated>2017-01-01T12:40:08Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe, see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [http://bioinformatics.oxfordjournals.org/content/early/2016/12/28/bioinformatics.btw689.full InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2016. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The provided web-server puts a certain limit on the complexity of runnable jobs for the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it to your local machine or own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=811</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=811"/>
		<updated>2016-12-13T15:51:12Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau. [https://academic.oup.com/bioinformatics/article/2666342/InMoDe-tools-for-learning-and-visualizing-intra InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]. &#039;&#039;Bioinformatics&#039;&#039;, 2016. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The provided web-server puts a certain limit on the complexity of runnable jobs for the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it to your local machine or own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=810</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=810"/>
		<updated>2016-12-13T15:37:51Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau; InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites. &#039;&#039;Bioinformatics&#039;&#039;, 2016, btw689. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDe/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The provided web-server puts a certain limit on the complexity of runnable jobs for the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it to your local machine or own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=809</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=809"/>
		<updated>2016-12-12T09:00:54Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
&lt;br /&gt;
If you use InMoDe, please cite&lt;br /&gt;
&lt;br /&gt;
R. Eggeling, I. Grosse, and J. Grau; InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites. &#039;&#039;Bioinformatics&#039;&#039;, 2016, btw689. doi: 10.1093/bioinformatics/btw689&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;br /&gt;
&lt;br /&gt;
== Webserver ==&lt;br /&gt;
&lt;br /&gt;
A server with all tools of InMoDe is available for public use at [http://galaxy.informatik.uni-halle.de].&lt;br /&gt;
The provided web-server puts a certain limit on the complexity of runnable jobs for the learning tools.&lt;br /&gt;
For unlimited use, please download InMoDe and install it to your local machine or own Galaxy instance.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=807</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=807"/>
		<updated>2016-12-08T14:56:17Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: links updated&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse.&lt;br /&gt;
== Description ==&lt;br /&gt;
=== Background ===&lt;br /&gt;
Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, used to be the standard model for this task for more than three decades but its simple assumptions are increasingly put into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and while model classes that dynamically adapt the model complexity to data have been developed, effective model selection is to date only possible for fully observable data, but not, e.g., within de novo motif discovery.&lt;br /&gt;
=== Results ===&lt;br /&gt;
To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter-tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice.&lt;br /&gt;
=== Conclusions ===&lt;br /&gt;
The traditional PWM model appears to be indeed insufficient to infer realistic sequence motifs, as it is on average outperformed by more complex models that take into account intra-motif dependencies. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. Hence, we find it worthwhile to recommend that any modern motif discovery algorithm should attempt to take into account intra-motif dependencies.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper &#039;&#039;&#039;Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools. All tools have mandatory (no default values) and optional arguments. &lt;br /&gt;
Default values can be used by assigning &amp;quot;def&amp;quot;. Alternatively, a shorter list of arguments can be provided, in which case all missing arguments are considered to assume default values.&lt;br /&gt;
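&lt;br /&gt;
The two conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name build_args, the use of None to mean &amp;quot;take the default&amp;quot;, and the file name sequences.fa are hypothetical and not part of PMMdeNovo.&lt;br /&gt;

```python
def build_args(values):
    """Turn positional values into command-line tokens.

    None stands for "use the default" and is written as "def".
    Trailing "def" tokens are dropped, because missing arguments
    are assumed to take their default values anyway.
    """
    tokens = ["def" if v is None else str(v) for v in values]
    while tokens and tokens[-1] == "def":
        tokens.pop()
    return tokens


# Order: inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output
# (sequences.fa is a hypothetical input file)
args = build_args(["sequences.fa", 12, None, None, 100, None, None, None])
command = ["java", "-jar", "ModelTrainer.jar"] + args
print(" ".join(command))
```

Here only motifWidth and initSteps deviate from their defaults, so the two arguments in between are written as &amp;quot;def&amp;quot; and everything after initSteps is left out.&lt;br /&gt;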
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of putative non-aligned sequences. It infers an inhomogeneous PMM of arbitrary order, where order 0 corresponds to a PWM model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a FASTA file. Otherwise it is interpreted as plain text, i.e., each line corresponds to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model used for modeling the flanking sequences, which do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been obtained before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned PCT structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
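&lt;br /&gt;
The input format rule given for inputFile (FASTA if the first character is &#039;&amp;gt;&#039;, otherwise one sequence per line) can be sketched as follows. This is an illustrative Python reimplementation, not the code used by ModelTrainer; the function name read_sequences and the decisions to join multi-line FASTA records and to skip empty lines are assumptions.&lt;br /&gt;

```python
def read_sequences(text):
    """Parse sequences from the content of an input file.

    If the first character is '>', the content is treated as FASTA:
    header lines start a new record and sequence lines are joined.
    Otherwise every non-empty line is taken as one sequence.
    """
    lines = [line.strip() for line in text.splitlines()]
    if not text.startswith(">"):
        return [line for line in lines if line]

    sequences, current = [], []
    for line in lines:
        if line.startswith(">"):
            # A new header closes the record collected so far.
            if current:
                sequences.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


print(read_sequences(">site1\nACGT\nAC\n>site2\nGG\n"))
print(read_sequences("ACGT\nGGCC\n"))
```

In this sketch, FASTA headers without any sequence lines are silently skipped; the actual tool may handle such records differently.&lt;br /&gt;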
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
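&lt;br /&gt;
How dataNeg and alpha could interact is sketched below. The page does not describe how BindingSitePrediction derives its threshold, so this is an assumption: a common choice is the score that is exceeded by at most a fraction alpha of the scores observed on the negative data. The function name and the example scores are hypothetical.&lt;br /&gt;

```python
def threshold_from_negatives(negative_scores, alpha):
    """Return a threshold such that at most a fraction alpha of the
    negative scores lies strictly above it.

    This is one common way to fix a significance level on negative
    data; it is not taken from the BindingSitePrediction sources.
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(alpha * len(ranked))  # tolerated number of false positives
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


negative_scores = [1, 2, 3, 4, 5, 6, 7, 8]
threshold = threshold_from_negatives(negative_scores, 0.25)
false_positives = [s for s in negative_scores if s > threshold]
print(threshold, false_positives)
```

With such a rule, a position in the positive data would be reported as a binding site only if its score is strictly greater than the threshold.&lt;br /&gt;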
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification, using positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain the motif. This tool can be used for performing a single step of a K-fold cross-validation experiment.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model used for modeling the flanking sequences, which do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been obtained before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;  &lt;br /&gt;
The tool returns (i) the model complexity, i.e., the number of leaves of all parsimonious context trees of the learned motif model, and (ii) performance of the classifier measured by the area under the ROC curve.&lt;br /&gt;
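&lt;br /&gt;
For readers unfamiliar with the second quantity, the area under the ROC curve can be computed directly from classifier scores. The following Python sketch uses the standard pairwise (Mann-Whitney) formulation; it illustrates the measure only and is not the code used by the Classification tool. The function name and example scores are hypothetical.&lt;br /&gt;

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    The AUC equals the probability that a randomly drawn positive
    sequence scores higher than a randomly drawn negative one,
    where ties count as one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Five of the six positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

A value of 1.0 means that every positive sequence scores above every negative one, while 0.5 corresponds to random guessing.&lt;br /&gt;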
&lt;br /&gt;
== Data ==&lt;br /&gt;
The example [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
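&lt;br /&gt;
A 10-fold cross-validation with the Classification tool amounts to one call per fold. The Python sketch below only assembles the ten command lines. The file naming scheme is hypothetical and does not reflect the layout of the data archive; it further assumes that, for each fold, the nine training subsets have already been concatenated into one training file. All arguments after motifWidth are omitted and therefore take their default values.&lt;br /&gt;

```python
def fold_commands(prefix, num_folds=10, motif_width=20):
    """Build one Classification call per cross-validation fold.

    File names follow a made-up pattern (prefix_pos_train_k.fa etc.);
    adapt them to the actual files on disk.
    """
    commands = []
    for k in range(num_folds):
        commands.append([
            "java", "-jar", "Classification.jar",
            f"{prefix}_pos_train_{k}.fa",  # filePosTrain
            f"{prefix}_neg_train_{k}.fa",  # fileNegTrain
            f"{prefix}_pos_test_{k}.fa",   # filePosTest
            f"{prefix}_neg_test_{k}.fa",   # fileNegTest
            str(motif_width),              # motifWidth
        ])
    return commands


for command in fold_commands("exampleTF"):
    print(" ".join(command))
```

Each printed line can be run in a shell, and the reported areas under the ROC curve can then be averaged over the ten folds.&lt;br /&gt;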
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [https://www.cs.helsinki.fi/u/eggeling/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=793</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=793"/>
		<updated>2016-08-23T15:51:54Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: coauthor added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling, Ivo Grosse, and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=792</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=792"/>
		<updated>2016-08-02T16:47:55Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: windows link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDe-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=791</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=791"/>
		<updated>2016-08-01T16:49:27Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: authors&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
by Ralf Eggeling and Jan Grau.&lt;br /&gt;
&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDeGUI-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=InMoDe&amp;diff=790</id>
		<title>InMoDe</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=InMoDe&amp;diff=790"/>
		<updated>2016-08-01T16:47:52Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: main content&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:InMoDe-test.png|100px|left]]&lt;br /&gt;
InMoDe is a collection of seven tools for learning, leveraging, and visualizing &#039;&#039;&#039;in&#039;&#039;&#039;tra-&#039;&#039;&#039;mo&#039;&#039;&#039;tif &#039;&#039;&#039;de&#039;&#039;&#039;pendencies within DNA binding sites and similar functional nucleotide sequences.&lt;br /&gt;
&lt;br /&gt;
For a detailed description of the functionality of InMoDe see the [http://www.cs.helsinki.fi/u/eggeling/InMoDe_userGuide.pdf user guide].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Download and installation ==&lt;br /&gt;
&lt;br /&gt;
InMoDe offers three user interfaces. &lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGUI.jar InMoDeGUI.jar] -- graphical user interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeCLI.jar InMoDeCLI.jar] -- command line interface&lt;br /&gt;
* [http://www.cs.helsinki.fi/u/eggeling/InMoDeGalaxy.jar InMoDeGalaxy.jar] -- for integration into your own Galaxy instance&lt;br /&gt;
&lt;br /&gt;
that can be started by&lt;br /&gt;
&lt;br /&gt;
 java -jar filename.jar&lt;br /&gt;
&lt;br /&gt;
and require an existing Java installation (8u74 or later).&lt;br /&gt;
&lt;br /&gt;
In addition, there are two user-friendly alternatives for installing the GUI variant of InMoDe, namely (i) a [http://www.jstacs.de/downloads/InMoDe-1.0.dmg DMG for installation under Mac OS X], and (ii) a [http://www.jstacs.de/downloads/InMoDeGUI-1.0.exe Windows installer]. &lt;br /&gt;
&lt;br /&gt;
Neither requires a recent Java installation, as both automatically install the required libraries on the local machine.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=File:InMoDe.png&amp;diff=788</id>
		<title>File:InMoDe.png</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=File:InMoDe.png&amp;diff=788"/>
		<updated>2016-08-01T15:57:06Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: uploaded a new version of &amp;amp;quot;File:InMoDe.png&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;InMoDe logo&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=File:InMoDe.png&amp;diff=787</id>
		<title>File:InMoDe.png</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=File:InMoDe.png&amp;diff=787"/>
		<updated>2016-08-01T15:52:14Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: InMoDe logo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;InMoDe logo&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=741</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=741"/>
		<updated>2015-11-10T11:14:18Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse.&lt;br /&gt;
== Description ==&lt;br /&gt;
=== Background ===&lt;br /&gt;
Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, used to be the standard model for this task for more than three decades but its simple assumptions are increasingly put into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and while model classes that dynamically adapt the model complexity to data have been developed, effective model selection is to date only possible for fully observable data, but not, e.g., within de novo motif discovery.&lt;br /&gt;
=== Results ===&lt;br /&gt;
To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter-tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice.&lt;br /&gt;
=== Conclusions ===&lt;br /&gt;
The traditional PWM model appears to be indeed insufficient to infer realistic sequence motifs, as it is on average outperformed by more complex models that take into account intra-motif dependencies. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. Hence, we find it worthwhile to recommend that any modern motif discovery algorithm should attempt to take into account intra-motif dependencies.&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper &#039;&#039;&#039;Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools. All tools have mandatory (no default values) and optional arguments. &lt;br /&gt;
Default values can be used by assigning &amp;quot;def&amp;quot;. Alternatively, a shorter list of arguments can be provided, in which case all missing arguments are considered to assume default values.&lt;br /&gt;
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of putative non-aligned sequences. It infers an inhomogeneous PMM of arbitrary order, where order 0 corresponds to a PWM model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a FASTA file. Otherwise it is interpreted as plain text, i.e., each line corresponds to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model, which is used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always runs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after having obtained the last optimal model structure before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned PCT structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
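&lt;br /&gt;
The two ways of requesting default values described above can be sketched as follows. This is only an illustration: the file names sequences.fa and results/ctcf are hypothetical, and it is an assumption that a shortened argument list simply drops trailing arguments. The script prints each call and executes it only if ModelTrainer.jar is present in the current directory.&lt;br /&gt;

```shell
# Dry-run sketch: print two ModelTrainer calls; run them only if the JAR exists.
# sequences.fa and results/ctcf are hypothetical names, not files shipped with the tool.

# Variant 1: full argument list, passing "def" wherever the default should apply.
# Order: inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output
set -- java -jar ModelTrainer.jar sequences.fa 15 def def def def 5 results/ctcf
full_call="$*"
echo "$full_call"
if [ -f ModelTrainer.jar ]; then "$@"; fi

# Variant 2: shorter list; the missing arguments are assumed to take their defaults.
set -- java -jar ModelTrainer.jar sequences.fa 15
short_call="$*"
echo "$short_call"
if [ -f ModelTrainer.jar ]; then "$@"; fi
```

Building the call with positional parameters keeps the snippet POSIX-compatible and makes it easy to inspect a command before committing to a long run.&lt;br /&gt;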
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
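&lt;br /&gt;
As a rough sketch, model training and binding site prediction could be chained as shown here. The input names positives.fa and negatives.fa are hypothetical, and the model file name model.xml is an assumption derived from the default output prefix of ModelTrainer. Each call is printed and executed only if the corresponding JAR is available.&lt;br /&gt;

```shell
# Dry-run sketch of a two-step pipeline; input file names are hypothetical.
prefix=model                # default output prefix of ModelTrainer
model_file="$prefix.xml"    # assumed name of the XML file written by ModelTrainer

# Step 1: learn the motif model, using "def" for all tunable parameters.
set -- java -jar ModelTrainer.jar positives.fa def def def def def def "$prefix"
train_call="$*"
echo "$train_call"
if [ -f ModelTrainer.jar ]; then "$@"; fi

# Step 2: predict binding sites in the positive data with the learned model.
# Order: modelFile dataPos dataNeg alpha output
set -- java -jar BindingSitePrediction.jar "$model_file" positives.fa negatives.fa 1E-4 bindingSites.txt
predict_call="$*"
echo "$predict_call"
if [ -f BindingSitePrediction.jar ]; then
  if [ -f "$model_file" ]; then "$@"; fi
fi
```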
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification, using positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain the motif. This tool can be used for performing a single step of a K-fold cross-validation experiment.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model, which is used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always runs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after having obtained the last optimal model structure before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
The tool returns (i) the model complexity, i.e., the number of leaves of all parsimonious context trees of the learned motif model, and (ii) performance of the classifier measured by the area under the ROC curve.&lt;br /&gt;
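&lt;br /&gt;
Since one call covers a single cross-validation step, a full 10-fold experiment could be driven by a loop such as the following sketch. The directory layout data/CTCF/fold_K and the four file names per fold are purely hypothetical and need to be adapted to the actual data. The numeric arguments are the default values listed in the table above. Each call is printed and executed only if Classification.jar is present.&lt;br /&gt;

```shell
# Dry-run sketch of a 10-fold cross-validation loop.
# Hypothetical layout: data/CTCF/fold_K/ holds pos_train.fa, neg_train.fa, pos_test.fa, neg_test.fa
tf=CTCF
count=0
fold=0
while [ "$fold" -lt 10 ]; do
  dir="data/$tf/fold_$fold"
  # Order: filePosTrain fileNegTrain filePosTest fileNegTest
  #        motifWidth motifOrder flankingOrder initSteps addSteps restarts
  set -- java -jar Classification.jar \
    "$dir/pos_train.fa" "$dir/neg_train.fa" "$dir/pos_test.fa" "$dir/neg_test.fa" \
    20 2 2 50 10 10
  echo "$*"
  if [ -f Classification.jar ]; then "$@"; fi
  count=$((count + 1))
  fold=$((fold + 1))
done
echo "prepared $count runs"
```

Averaging the reported area under the ROC curve over the ten runs then gives one summary value per transcription factor.&lt;br /&gt;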
&lt;br /&gt;
== Data ==&lt;br /&gt;
The example [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 different subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Main_Page&amp;diff=740</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Main_Page&amp;diff=740"/>
		<updated>2015-11-10T11:07:38Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== A &amp;lt;font color=FireBrick&amp;gt;J&amp;lt;/font&amp;gt;ava framework for &amp;lt;font color=FireBrick&amp;gt;st&amp;lt;/font&amp;gt;atistical &amp;lt;font color=FireBrick&amp;gt;a&amp;lt;/font&amp;gt;nalysis and &amp;lt;font color=FireBrick&amp;gt;c&amp;lt;/font&amp;gt;lassification of biological &amp;lt;font color=FireBrick&amp;gt;s&amp;lt;/font&amp;gt;equences ==&lt;br /&gt;
&lt;br /&gt;
Sequence analysis is one of the major subjects of&lt;br /&gt;
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].&lt;br /&gt;
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as&lt;br /&gt;
alignment algorithms.&lt;br /&gt;
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an&lt;br /&gt;
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches&lt;br /&gt;
for parameter learning. Using Jstacs, classifiers can be assessed and&lt;br /&gt;
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented&lt;br /&gt;
design Jstacs is easy to use and readily extensible.&lt;br /&gt;
&lt;br /&gt;
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].&lt;br /&gt;
&lt;br /&gt;
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.&lt;br /&gt;
&lt;br /&gt;
== Licensing Information ==&lt;br /&gt;
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].&lt;br /&gt;
&lt;br /&gt;
== Current release ==&lt;br /&gt;
You can download Jstacs version 2.1 [[Downloads | here]].&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;You can find an overview of the new features in the [[Recent changes]].&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
We also provide an [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.&lt;br /&gt;
&lt;br /&gt;
== Getting started &amp;amp; Cookbook ==&lt;br /&gt;
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].&lt;br /&gt;
&lt;br /&gt;
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].&lt;br /&gt;
This cookbook comprises a general description of the structure of Jstacs including data handling, statistical models, classifiers, and assessments.&lt;br /&gt;
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point of your own applications.&lt;br /&gt;
&lt;br /&gt;
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].&lt;br /&gt;
&lt;br /&gt;
== Publication ==&lt;br /&gt;
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.&lt;br /&gt;
If you use Jstacs in your research, please cite&lt;br /&gt;
&lt;br /&gt;
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. &#039;&#039;Jstacs: A java framework for statistical analysis and classification of biological sequences&#039;&#039;. Journal of Machine Learning Research, &#039;&#039;&#039;13&#039;&#039;&#039;(Jun):1967–1971, 2012.&lt;br /&gt;
&lt;br /&gt;
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]&lt;br /&gt;
== Applications ==&lt;br /&gt;
Applications currently using Jstacs:&lt;br /&gt;
* [[MotifAdjuster]]&lt;br /&gt;
* [[Dispom]]&lt;br /&gt;
* [[TALgetter]]&lt;br /&gt;
* [[TALENoffer]]&lt;br /&gt;
* [[Dimont]]&lt;br /&gt;
&lt;br /&gt;
== Bug reports &amp;amp; Feature requests ==&lt;br /&gt;
You can submit [https://trac.informatik.uni-halle.de/trac/jstacs/newticket bug reports and feature requests] via the Jstacs trac or by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de]. &#039;&#039;Before&#039;&#039; you open a new bug ticket, please check if that bug has already been submitted in the [https://trac.informatik.uni-halle.de/trac/jstacs/report list of existing tickets].&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Latest Papers ==&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data ]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Slim | Varying levels of complexity in transcription factor binding motifs]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]&#039;&#039;&#039;&#039;&#039; has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[TALENoffer | TALENoffer: genome-wide TALEN off-target prediction]]&#039;&#039;&#039;&#039;&#039; has been published in [http://bioinformatics.oxfordjournals.org/content/early/2013/08/30/bioinformatics.btt501 Bioinformatics].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[TALgetter | Computational predictions provide insights into the biology of TAL effector target sites]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002962 PLOS Computational Biology].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;Evaluation of methods for modeling transcription factor sequence specificity&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/nbt/journal/v31/n2/full/nbt.2486.html Nature Biotechnology].&lt;br /&gt;
&lt;br /&gt;
The paper &#039;&#039;&#039;&#039;&#039;[[FlowCap | Critical assessment of automated flow cytometry data analysis techniques]]&#039;&#039;&#039;&#039;&#039; has been published in [http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2365.html Nature Methods].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Further papers and projects can be found under [[Projects]].&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=Projects&amp;diff=739</id>
		<title>Projects</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=Projects&amp;diff=739"/>
		<updated>2015-11-10T11:02:54Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This site contains projects that use Jstacs.&lt;br /&gt;
* [[MotifAdjuster]]: a tool for computational reassessment of transcription factor binding site annotations&lt;br /&gt;
* [[Prior]]: apples and oranges: avoiding different priors in Bayesian DNA sequence analysis&lt;br /&gt;
* [[GenDisMix]]: unifying generative and discriminative learning principles&lt;br /&gt;
* [[Dispom]]: de-novo discovery of differentially abundant transcription factor binding sites including their positional preference&lt;br /&gt;
* [[MiMB]]: probabilistic approaches to transcription factor binding site prediction&lt;br /&gt;
* [[SHMM]]: utilizing gene-pair orientations for improved analysis of ChIP-chip promoter array data&lt;br /&gt;
* [[DSHMM]]: exploiting prior knowledge and gene distances in the analysis of tumor expression profiles&lt;br /&gt;
* [[PHHMM]]: improved analysis of Array-CGH data&lt;br /&gt;
* [[MeDIP-HMM]]: HMM-based analysis of DNA methylation profiles&lt;br /&gt;
* [[ARHMM]]: integrating local chromosomal dependencies into the analysis of tumor expression profiles&lt;br /&gt;
* [[FlowCap]]: molecular classification of acute myeloid leukaemia (AML) using flow cytometry data&lt;br /&gt;
* [[TALgetter]]: prediction of TAL effector target sites&lt;br /&gt;
* [[TALENoffer]]: genome-wide TALEN off-target prediction&lt;br /&gt;
* [[Dimont]]: general approach for discriminative de-novo motif discovery from high-throughput data&lt;br /&gt;
* [[AUC-PR]]: area under ROC and PR curves for weighted and unweighted data&lt;br /&gt;
* [[Slim]]: Sparse local inhomogeneous mixture (Slim) models and dependency logos&lt;br /&gt;
* [[PMMdeNovo]]: de novo motif discovery based on inhomogeneous parsimonious Markov models (PMMs) for exploiting intra-motif dependencies&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=738</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=738"/>
		<updated>2015-11-09T17:05:17Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: paper published&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper [http://www.biomedcentral.com/1471-2105/16/375 Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data] by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse has been published in BMC Bioinformatics.&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools. All tools have mandatory (no default values) and optional arguments. &lt;br /&gt;
An optional argument takes its default value when &amp;quot;def&amp;quot; is passed in its place. Alternatively, a shorter list of arguments can be provided, in which case all missing arguments assume their default values.&lt;br /&gt;
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of putative unaligned sequences. It infers an inhomogeneous PMM of arbitrary order, where order 0 corresponds to a PWM model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a FASTA file. Otherwise it is interpreted as plain text, with each line corresponding to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model that is used for modeling the flanking sequences, which do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been obtained before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned parsimonious context tree (PCT) structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
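As an illustration, the following sketch spells out every argument using the default values from the table above. The input file name sequences.fa is a hypothetical placeholder, and the command is only printed, so the sketch can be tried without the JAR at hand; remove the echo step to actually run it.&lt;br /&gt;

```shell
# Hypothetical input file; replace with your own FASTA or plain-text file.
input="sequences.fa"
# Argument order: motifWidth motifOrder flankingOrder initSteps addSteps restarts output
# (all set to the defaults listed in the table).
cmd="java -jar ModelTrainer.jar $input 20 2 2 50 10 10 model"
echo "$cmd"
```

Going by the description of default handling above, &amp;lt;code&amp;gt;java -jar ModelTrainer.jar sequences.fa&amp;lt;/code&amp;gt; should be equivalent, and single arguments can be left at their default with &amp;quot;def&amp;quot;, for example &amp;lt;code&amp;gt;java -jar ModelTrainer.jar sequences.fa 12 def def def def def model&amp;lt;/code&amp;gt;.&lt;br /&gt;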
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
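A minimal sketch of how this tool might be chained after ModelTrainer. It assumes ModelTrainer was run with its default output prefix, so that the model file is model.xml; the data file names positives.fa and negatives.fa are hypothetical. The command is only printed, not executed.&lt;br /&gt;

```shell
# Assumption: ModelTrainer was run with the default output prefix "model".
model="model.xml"
# Hypothetical data file names; replace with your own files.
pos="positives.fa"
neg="negatives.fa"
# alpha and output are set to their documented default values.
cmd="java -jar BindingSitePrediction.jar $model $pos $neg 1E-4 bindingSites.txt"
echo "$cmd"
```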
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification. It uses positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain the motif. This tool can be used for performing a single step of a K-fold cross-validation experiment.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model that is used for modeling the flanking sequences, which do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been obtained before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
The tool returns (i) the model complexity, i.e., the number of leaves of all parsimonious context trees of the learned motif model, and (ii) the performance of the classifier, measured by the area under the ROC curve.&lt;br /&gt;
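Since one call covers a single cross-validation step, a full experiment can be sketched as a loop over the folds. The file naming scheme below (pos_train_1.fa and so on) and the choice of ten folds are assumptions for illustration only; all optional arguments are set to their documented defaults, and the commands are printed rather than executed.&lt;br /&gt;

```shell
# Hypothetical naming scheme: one train/test file per fold and class.
count=0
for k in 1 2 3 4 5 6 7 8 9 10; do
  # Optional arguments: motifWidth motifOrder flankingOrder initSteps addSteps restarts
  cmd="java -jar Classification.jar pos_train_$k.fa neg_train_$k.fa pos_test_$k.fa neg_test_$k.fa 20 2 2 50 10 10"
  echo "$cmd"
  count=$((count + 1))
done
```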
&lt;br /&gt;
== Data ==&lt;br /&gt;
The example [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=703</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=703"/>
		<updated>2015-02-23T10:16:11Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools. All tools have mandatory (no default values) and optional arguments. &lt;br /&gt;
An optional argument takes its default value when &amp;quot;def&amp;quot; is passed in its place. Alternatively, a shorter list of arguments can be provided, in which case all missing arguments assume their default values.&lt;br /&gt;
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of putative unaligned sequences. It infers an inhomogeneous PMM of arbitrary order, where order 0 corresponds to a PWM model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a FASTA file. Otherwise it is interpreted as plain text, with each line corresponding to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model that is used for modeling the flanking sequences, which do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been obtained before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned parsimonious context tree (PCT) structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification. It uses positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain the motif. This tool can be used for performing a single step of a K-fold cross-validation experiment.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been found before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
The tool returns (i) the model complexity, i.e., the number of leaves of all parsimonious context trees of the learned motif model, and (ii) the performance of the classifier, measured by the area under the ROC curve.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
The exemplary [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=702</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=702"/>
		<updated>2015-02-21T13:28:14Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools. All tools have mandatory (no default values) and optional arguments. &lt;br /&gt;
To use the default value of an optional argument, pass &amp;quot;def&amp;quot; in its place. Alternatively, a shorter list of arguments can be provided, in which case all missing arguments assume their default values.&lt;br /&gt;
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of putative non-aligned sequences. It infers an inhomogeneous PMM of arbitrary order, where order 0 corresponds to a PWM.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a fasta file. Otherwise it is interpreted as plain text, i.e., each line corresponds to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been found before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned PCT structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
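&lt;br /&gt;
As a hedged illustration of the calling conventions described above, the following shell sketch assembles three ModelTrainer calls: a full argument list, a call that keeps most defaults via &amp;quot;def&amp;quot; while setting motifOrder to 0, and a shortened argument list. The file names sequences.fa and pwm_model are hypothetical, and the jar name is assumed to match the downloaded file. The sketch only prints the commands and does not run them.&lt;br /&gt;

```shell
# Hypothetical file names; replace them with your own data.
input="sequences.fa"
jar="ModelTrainer.jar"   # assumed name of the downloaded jar

# Full list: inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output
full_cmd="java -jar $jar $input 20 2 2 50 10 10 model"

# Keep defaults with "def", but set motifOrder to 0 (order 0 corresponds to a PWM).
pwm_cmd="java -jar $jar $input def 0 def def def def pwm_model"

# Shortened list: the omitted trailing arguments assume their default values.
short_cmd="java -jar $jar $input 15"

echo "$full_cmd"
echo "$pwm_cmd"
echo "$short_cmd"
# To actually run one of them: eval "$short_cmd"
```

Printing the commands first makes it easy to check the argument order before starting a potentially long training run.&lt;br /&gt;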
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
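&lt;br /&gt;
The two tools can be chained, since BindingSitePrediction reads the .xml file written by ModelTrainer. The following shell sketch shows one possible pipeline under stated assumptions: the file names positives.fa, negatives.fa, and sites.txt and the prefix mymotif are hypothetical, and the model file is assumed to be named after the output prefix with an .xml suffix. The commands are printed, not executed.&lt;br /&gt;

```shell
# Hypothetical input files and output prefix; replace them with your own.
pos="positives.fa"
neg="negatives.fa"
prefix="mymotif"

# Step 1: learn a model with default settings and a custom output prefix.
train_cmd="java -jar ModelTrainer.jar $pos def def def def def def $prefix"

# Step 2: predict binding sites with the learned model at significance level 1E-4.
# Assumption: ModelTrainer writes the model to PREFIX.xml.
predict_cmd="java -jar BindingSitePrediction.jar $prefix.xml $pos $neg 1E-4 sites.txt"

echo "$train_cmd"
echo "$predict_cmd"
```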
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification, using positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain the motif.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar Classification.jar filePosTrain fileNegTrain filePosTest fileNegTest motifWidth motifOrder flankingOrder initSteps addSteps restarts&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTrain&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative training sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;filePosTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the positive test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;fileNegTest&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the negative test sequences (fasta or plain text).&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been found before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
The tool writes the classification results to standard output.&lt;br /&gt;
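&lt;br /&gt;
Because Classification takes separate training and test files, a 10-fold cross-validation needs the nine non-held-out subsets merged into one training file per fold. The shell sketch below shows one way to do that. The file layout (pos_0.txt to pos_9.txt and neg_0.txt to neg_9.txt in one directory) and the function name build_fold are hypothetical and may not match the naming in the data archive.&lt;br /&gt;

```shell
# Hypothetical layout: DIR/pos_0.txt ... pos_9.txt and DIR/neg_0.txt ... neg_9.txt.
# The actual archive may name its subsets differently.
build_fold() {
  dir="$1"
  fold="$2"
  # Start with empty training files for this fold.
  : > "$dir/pos_train_$fold.txt"
  : > "$dir/neg_train_$fold.txt"
  i=0
  while [ "$i" -lt 10 ]; do
    if [ "$i" -ne "$fold" ]; then
      # The nine subsets other than the held-out one form the training data.
      cat "$dir/pos_$i.txt" >> "$dir/pos_train_$fold.txt"
      cat "$dir/neg_$i.txt" >> "$dir/neg_train_$fold.txt"
    fi
    i=$((i + 1))
  done
  # Print the Classification call for this fold; omitted arguments use defaults.
  echo "java -jar Classification.jar $dir/pos_train_$fold.txt $dir/neg_train_$fold.txt $dir/pos_$fold.txt $dir/neg_$fold.txt"
}

# Example (not executed here): build_fold data 0
```

Calling build_fold once for each fold index from 0 to 9 yields ten printed commands, one per held-out subset.&lt;br /&gt;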
&lt;br /&gt;
== Data ==&lt;br /&gt;
The exemplary [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
	<entry>
		<id>https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=701</id>
		<title>PMMdeNovo</title>
		<link rel="alternate" type="text/html" href="https://www.jstacs.de/index.php?title=PMMdeNovo&amp;diff=701"/>
		<updated>2015-02-21T13:05:54Z</updated>

		<summary type="html">&lt;p&gt;Eggeling: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
by Ralf Eggeling, Teemu Roos, Petri Myllymäki, and Ivo Grosse&lt;br /&gt;
&lt;br /&gt;
== Runnable JARs ==&lt;br /&gt;
The application consists of three independent tools.&lt;br /&gt;
&lt;br /&gt;
=== ModelTrainer ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/ModelTrainer.jar ModelTrainer] performs de novo motif discovery on a set of putative non-aligned sequences. It infers an inhomogeneous PMM of arbitrary order, where order 0 corresponds to a PWM.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar ModelTrainer.jar inputFile motifWidth motifOrder flankingOrder initSteps addSteps restarts output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;inputFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of a text file containing the input sequences. If the first character in the file is &#039;&amp;gt;&#039;, the content is interpreted as a fasta file. Otherwise it is interpreted as plain text, i.e., each line corresponds to one sequence.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifWidth&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;20&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The width of the motif to be inferred.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;motifOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The initial order of the inhomogeneous PMM, i.e., the number of context positions that can be taken into account for modeling intra-motif dependencies.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;flankingOrder&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The order of the homogeneous Markov model used for modeling the flanking sequences that do not belong to the motif.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;initSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;50&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of initial iteration steps that the algorithm always performs in each restart.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;addSteps&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of additional iteration steps, i.e., the number of iterations that have to be performed after the last optimal model structure has been found before termination is allowed.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;restarts&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Integer&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;10&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The number of restarts of the algorithm.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;model&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The path and file prefix for the output files. The tool produces two files, namely (i) output.xml containing the learned model and (ii) output.dot containing the Graphviz representation of the learned PCT structures.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== BindingSitePrediction ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/BindingSitePrediction.jar BindingSitePrediction] predicts instances of binding sites in a positive data set based on a previously learned model.&lt;br /&gt;
Run by calling&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;java -jar BindingSitePrediction.jar modelFile dataPos dataNeg alpha output&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the arguments have the following semantics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;table border=0 cellpadding=10 align=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;name&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;type&amp;lt;/td&amp;gt;&lt;br /&gt;
        &amp;lt;td&amp;gt;default&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;comment&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&amp;lt;td colspan=4&amp;gt;&amp;lt;hr&amp;gt;&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;modelFile&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the .xml representation (output of ModelTrainer) of the learned model.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataPos&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the positive data (fasta file or plain text) in which binding site locations are to be identified.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;dataNeg&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;--&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;The location of the negative data (fasta file or plain text) that is used for computing the prediction threshold.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;alpha&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Double&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;1E-4&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Significance level on negative data.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;tr&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;&amp;lt;font color=&amp;quot;green&amp;quot;&amp;gt;output&amp;lt;/font&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;String&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;bindingSites.txt&amp;lt;/td&amp;gt;&lt;br /&gt;
	&amp;lt;td&amp;gt;Location of output file for writing the predicted binding sites.&amp;lt;/td&amp;gt;&lt;br /&gt;
&amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Classification ===&lt;br /&gt;
The tool [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/Classification.jar Classification] first performs motif discovery and then a fragment-based classification, using positive data that is assumed to contain an instance of the motif and negative data that is assumed not to contain the motif. The tool writes the classification results to standard output.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
The exemplary [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/data.tar.gz data sets] contain extracted ChIP-seq sequences of 50 different human transcription factors from the [http://genome.ucsc.edu/ENCODE ENCODE project], as well as corresponding negative data. All data sets are split into 10 subsets to enable a reproducible 10-fold cross-validation.&lt;br /&gt;
&lt;br /&gt;
== Source code ==&lt;br /&gt;
Building the [http://www2.informatik.uni-halle.de/agbio/publications/PMMdenovo/PMMdenovo_sources.zip source code] requires Jstacs 2.1.&lt;/div&gt;</summary>
		<author><name>Eggeling</name></author>
	</entry>
</feed>