GenDisMix
Latest revision as of 06:32, 23 February 2010
by Jens Keilwagen, Jan Grau, Stefan Posch, Marc Strickert, and Ivo Grosse.
Description
Background
The recognition of functional binding sites in genomic DNA remains one of the fundamental challenges of genome research. During the last decades, a plethora of different and well-adapted models has been developed, but only little attention has been paid to the development of different and similarly well-adapted learning principles. Only recently has it been noticed that discriminative learning principles can be superior to generative ones in diverse bioinformatics applications, too.
Results
Here, we propose a generalization of generative and discriminative learning principles containing the maximum likelihood, maximum a-posteriori, maximum conditional likelihood, maximum supervised posterior, generative-discriminative trade-off, and penalized generative-discriminative trade-off learning principles as special cases, and we illustrate its efficacy for the recognition of vertebrate transcription factor binding sites.
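Written out, this generalization is a weighted sum of three log terms. The following is a sketch of the objective as we read it from the paper, with λ the model parameters, α the hyperparameters of the prior, D the training sequences and C their class labels (this notation is chosen here for illustration; consult the paper for its exact symbols):

```latex
f_{\beta}(\lambda \mid C, D, \alpha)
  = \beta_0 \log p(C \mid D, \lambda)
  + \beta_1 \log p(C, D \mid \lambda)
  + \beta_2 \log p(\lambda \mid \alpha),
\qquad \beta_0, \beta_1, \beta_2 \ge 0, \quad \beta_0 + \beta_1 + \beta_2 = 1
```

The named principles then correspond to particular weight vectors β = (β_0, β_1, β_2): maximum likelihood (0, 1, 0), maximum a-posteriori (0, 1/2, 1/2), maximum conditional likelihood (1, 0, 0), and maximum supervised posterior (1/2, 0, 1/2). The generative-discriminative trade-off sets β_2 = 0, and its penalized variant additionally allows β_2 > 0. Halving both terms in the MAP and MSP objectives does not change the maximizer, which is why the weights can be normalized to sum to 1.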
Conclusions
We find that the proposed learning principle helps to improve the recognition of transcription factor binding sites and enables better computational approaches for extracting as much information as possible from valuable wet-lab data. We make all implementations available in the open-source library Jstacs so that this learning principle can be easily applied to other classification problems in the field of genome and epigenome analysis.
Paper
The paper Unifying generative and discriminative learning principles has been published in BMC Bioinformatics (http://www.biomedcentral.com/1471-2105/11/98).
Binary
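- GenDisMix binary: http://www.jstacs.de/downloads/GenDisMixApp-bin.zip
- Sources of the binary: http://www.jstacs.de/downloads/GenDisMixApp.java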
- Requirements: Java v. 5 or later, Jstacs 1.3 or later
- Put jstacs-1.3.jar (contained in the Jstacs 1.3 binary) into the same directory as GenDisMixApp.java or GenDisMixApp.class, respectively
- Compile by calling
javac -cp jstacs-1.3.jar GenDisMixApp.java
- Run by calling
java -cp .:jstacs-1.3.jar GenDisMixApp (Unix, Linux)
or
java -cp .;jstacs-1.3.jar GenDisMixApp (Windows)
- Arguments:
home ... home directory (the path to the data directory, default = ./) = ./
fg ... foreground file (the file name of the foreground data file in FastA-format) = null
bg ... background file (the file name of the background data file in FastA-format) = null
gen ... generative weight (the weight of the generative component, valid range = [0.0, 1.0]) = null
dis ... discriminative weight (the weight of the discriminative component, valid range = [0.0, 1.0]) = null
eps ... epsilon (numerical optimization is stopped if the gain is less than epsilon, default = 1.0E-6) = 1.0E-6
threads ... threads (the number of threads used for the computation, default = 1) = 1
essFG ... essFG (the equivalent sample size used for the foreground class, default = 4.0) = 4.0
essBG ... essBG (the equivalent sample size used for the background class, default = 4.0) = 4.0
outfile ... outfile (the name of the file in which to store the classifier in XML-format, default = gendismix.xml) = gendismix.xml
uk ... unknown file (the file name of the data in FastA-format that shall be classified, OPTIONAL) = null
- Example:
java -cp .:jstacs-1.3.jar GenDisMixApp fg=fgfile.fasta bg=bgfile.fasta gen=0.4 dis=0.4 essBG=8 eps=1E-6 threads=2 outfile=classifier.xml
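The arguments are passed as key=value pairs. To illustrate how a call like the example above could map onto the weights of the unified learning principle, the Java class below parses such pairs on top of the listed defaults and derives three weights. It is a hypothetical sketch: GenDisMixArgsSketch, parse and weights are illustrative names, not part of GenDisMixApp or the Jstacs API. It assumes that dis weights the conditional likelihood, gen weights the likelihood, and the prior receives the remaining weight 1 - gen - dis, so gen + dis must not exceed 1.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, not the actual GenDisMixApp code. Assumes the weights
 * (conditional likelihood, likelihood, prior) = (dis, gen, 1 - gen - dis).
 */
final class GenDisMixArgsSketch {

    /** Parses key=value pairs, starting from the defaults listed for the app. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("home", "./");
        p.put("eps", "1.0E-6");
        p.put("threads", "1");
        p.put("essFG", "4.0");
        p.put("essBG", "4.0");
        p.put("outfile", "gendismix.xml");
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + arg);
            }
            // A later occurrence of a key overrides its default.
            p.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        // These arguments have no default (listed as "= null").
        for (String key : new String[] {"fg", "bg", "gen", "dis"}) {
            if (!p.containsKey(key)) {
                throw new IllegalArgumentException("missing required argument: " + key);
            }
        }
        return p;
    }

    /** Returns {conditional-likelihood weight, likelihood weight, prior weight}. */
    static double[] weights(Map<String, String> p) {
        double gen = Double.parseDouble(p.get("gen"));
        double dis = Double.parseDouble(p.get("dis"));
        double prior = 1.0 - gen - dis;
        // Small tolerance for floating-point round-off, e.g. gen=0.7, dis=0.3.
        if (gen < 0 || gen > 1 || dis < 0 || dis > 1 || prior < -1e-12) {
            throw new IllegalArgumentException("need gen, dis in [0,1] and gen + dis <= 1");
        }
        return new double[] {dis, gen, Math.max(0.0, prior)};
    }

    public static void main(String[] args) {
        String[] example = {"fg=fgfile.fasta", "bg=bgfile.fasta", "gen=0.4", "dis=0.4",
            "essBG=8", "eps=1E-6", "threads=2", "outfile=classifier.xml"};
        double[] beta = weights(parse(example));
        // Prints: beta = (0.40, 0.40, 0.20)
        System.out.printf(Locale.ROOT, "beta = (%.2f, %.2f, %.2f)%n", beta[0], beta[1], beta[2]);
    }
}
```

Under this assumption, the example's gen=0.4 dis=0.4 leaves a prior weight of 0.2, while gen=1 dis=0 would correspond to maximum likelihood and gen=0 dis=1 to maximum conditional likelihood.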