Main Page: Difference between revisions

(97 intermediate revisions by 5 users not shown)
= Jstacs =
__NOTOC__
== A <font color=FireBrick>J</font>ava framework for <font color=FireBrick>st</font>atistical <font color=FireBrick>a</font>nalysis and <font color=FireBrick>c</font>lassification of biological <font color=FireBrick>s</font>equences ==
 
Sequence analysis is one of the major subjects of
[http://en.wikipedia.org/wiki/Bioinformatics bioinformatics].
Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as
alignment algorithms.
We present Jstacs, an [http://en.wikipedia.org/wiki/Open_source open source] Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an
efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches
for parameter learning. Using Jstacs, classifiers can be assessed and
compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented
design, Jstacs is easy to use and readily extensible.
 
Jstacs is a joint project of the groups [http://www.informatik.uni-halle.de/arbeitsgruppen/bioinformatik/ Bioinformatics] and [http://www.informatik.uni-halle.de/arbeitsgruppen/mustererkennung/ Pattern Recognition and Bioinformatics] at the [http://www.informatik.uni-halle.de/ Institute of Computer Science] of [http://www.uni-halle.de/ Martin Luther University Halle-Wittenberg] and the Bioinformatics group of the [http://www.jki.bund.de/en/startseite/home.html Julius Kuehn Institute]. Initially, the project was also developed at the [http://www.ipk-gatersleben.de Leibniz Institute of Plant Genetics and Crop Plant Research].
 
Jstacs is listed in the [http://mloss.org/software/ machine learning open-source software (mloss)] repository.
 
== Licensing Information ==
Jstacs is free software: you can redistribute it and/or modify it under the terms of the [http://www.gnu.org/licenses/gpl-3.0.html GNU General Public License version 3] or (at your option) any later version as published by the [http://www.fsf.org/ Free Software Foundation].
 
== Current release ==
You can download Jstacs version 2.3 [[Downloads | here]].<br />
''You can find an overview of the new features in the [[Version history]].''<br />
We also provide [http://www.jstacs.de/api/index.html API documentation], a [[Cookbook]], and a [http://www.jstacs.de/downloads/refcard.pdf Reference card] for this release.
 
The current Jstacs code, including changes made since the last release, is available from [https://github.com/Jstacs GitHub].
 
== Getting started & Cookbook ==
For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see [[Getting started]].
 
Since version 2.0, we offer a [[Cookbook]] for Jstacs in addition to the [http://www.jstacs.de/api/index.html API documentation].
This cookbook comprises a general description of the structure of Jstacs, including data handling, statistical models, classifiers, and assessments.
The cookbook is accompanied by a number of [[Recipes]] or [[Code examples]] that can serve as a starting point for your own applications.
 
For a quick reference, we also provide a [http://www.jstacs.de/downloads/refcard.pdf Reference card].
 
== Publication ==
The [http://jmlr.csail.mit.edu/papers/v13/grau12a.html paper about Jstacs] has been published in the Journal of Machine Learning Research.
If you use Jstacs in your research, please cite
 
J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. ''Jstacs: A Java framework for statistical analysis and classification of biological sequences''. Journal of Machine Learning Research, '''13'''(Jun):1967–1971, 2012.
 
[http://www.jstacs.de/downloads/jstacs_citation.bib BibTeX entry]
 
== JstacsFX ==
JstacsFX is a library for building applications with a graphical user interface based on Jstacs classes and using JavaFX. JstacsFX builds upon the [http://www.jstacs.de/api/de/jstacs/tools/JstacsTool.html JstacsTool] interface that has also been used to create [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/cli/CLI.html command line] and [http://www.jstacs.de/api-2.3/de/jstacs/tools/ui/galaxy/Galaxy.html Galaxy] versions of tools with minimal effort. In addition, it makes use of the [http://www.jstacs.de/api-2.3/de/jstacs/parameters/Parameter.html Parameter], [http://www.jstacs.de/api-2.3/de/jstacs/results/Result.html Result], and [http://www.jstacs.de/api-2.3/de/jstacs/results/savers/ResultSaver.html ResultSaver] classes of Jstacs.
 
The current release of JstacsFX is available from [[Downloads]], and [http://www.jstacs.de/api-fx/index.html API documentation] is also provided.
 
Example applications using JstacsFX for their graphical user interface are [[InMoDe]] and [[AnnoTALE]].
 
== Applications ==
Applications currently using Jstacs:
* [[MotifAdjuster]]
* [[Dispom]]
* [[TALgetter]]
* [[TALENoffer]]
* [[Dimont]]
* [[GeMoMa]]
* [[AnnoTALE]]
* [[InMoDe]]
* [[Disentangler]]
 
== Bug reports & Feature requests ==
You can submit bug reports and feature requests by mail to [mailto:jstacs@informatik.uni-halle.de jstacs@informatik.uni-halle.de].<br />
<!-- In the Jstacs trac, we also provide a [https://trac.informatik.uni-halle.de/trac/jstacs/discussion forum] for discussions about Jstacs. -->
 
== Latest Papers ==
The paper '''''[[Catchitt | Accurate prediction of cell type-specific transcription factor binding]]''''' has been published in [https://doi.org/10.1186/s13059-018-1614-y Genome Biology].
 
The paper '''''[[PCTLearn | Algorithms for learning parsimonious context trees]]''''' has been published in [https://link.springer.com/article/10.1007/s10994-018-5770-9 Machine Learning].
 
The paper '''''[[Disentangler | Disentangling transcription factor binding site complexity]]''''' has been published in [https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky683/5063190 Nucleic Acids Research].
 
The paper '''''[[GeMoMa | Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi]]''''' has been published in [https://link.springer.com/article/10.1186%2Fs12859-018-2203-5 BMC Bioinformatics].
 
The paper '''''[[InMoDe | InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites]]''''' has been published in [https://academic.oup.com/bioinformatics/article/33/4/580/2666342/InMoDe-tools-for-learning-and-visualizing-intra Bioinformatics].
 
The paper '''''[[AnnoTALE | AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences]]''''' has been published in [http://www.nature.com/articles/srep21077 Scientific Reports].
 
The paper '''''[[PMMdeNovo | Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data]]''''' has been published in [http://www.biomedcentral.com/1471-2105/16/375 BMC Bioinformatics].
 
The paper '''''[[Slim | Varying levels of complexity in transcription factor binding motifs]]''''' has been published in [http://nar.oxfordjournals.org/content/early/2015/06/23/nar.gkv577.abstract Nucleic Acids Research].
 
The paper '''''[[AUC-PR | Area under Precision-Recall Curves for Weighted and Unweighted Data]]''''' has been published in [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0092209 PLOS ONE].
 
The paper '''''[[Dimont | A general approach for discriminative de-novo motif discovery from high-throughput data]]''''' has been published in [http://nar.oxfordjournals.org/content/41/21/e197.abstract.html?etoc Nucleic Acids Research].
 
Further papers and projects can be found under [[Projects]].

Latest revision as of 22:58, 23 February 2019

A Java framework for statistical analysis and classification of biological sequences

Sequence analysis is one of the major subjects of bioinformatics. Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as alignment algorithms. We present Jstacs, an open source Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches for parameter learning. Using Jstacs, classifiers can be assessed and compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented design, Jstacs is easy to use and readily extensible.
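To make the idea of assessing a sequence classifier by cross-validation more concrete, here is a minimal sketch in plain Java. It deliberately does not use any Jstacs classes; the class name CrossValidationSketch and all method names are hypothetical and chosen only for illustration. It assumes fixed-length DNA sequences over A, C, G, T, trains one position weight matrix per class by counting with a pseudocount (a simple generative approach), classifies by comparing log-likelihoods under equal class priors, and reports accuracy as the only performance measure.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain Java, independent of the Jstacs API. */
public class CrossValidationSketch {

    /** Estimates log-probabilities of A, C, G, T at each position from counts plus a pseudocount. */
    static double[][] trainPwm(List<String> seqs, int length, double pseudoCount) {
        double[][] logProb = new double[length][4];
        double total = seqs.size() + 4 * pseudoCount;
        for (int pos = 0; pos < length; pos++) {
            double[] counts = {pseudoCount, pseudoCount, pseudoCount, pseudoCount};
            for (String s : seqs) {
                counts["ACGT".indexOf(s.charAt(pos))]++;
            }
            for (int b = 0; b < 4; b++) {
                logProb[pos][b] = Math.log(counts[b] / total);
            }
        }
        return logProb;
    }

    /** Sums the per-position log-probabilities of the symbols in seq. */
    static double logLikelihood(double[][] pwm, String seq) {
        double sum = 0;
        for (int pos = 0; pos < pwm.length; pos++) {
            sum += pwm[pos]["ACGT".indexOf(seq.charAt(pos))];
        }
        return sum;
    }

    /** Puts element i into the test part if i % k == fold, otherwise into the training part. */
    static void split(List<String> data, int fold, int k, List<String> train, List<String> test) {
        for (int i = 0; i < data.size(); i++) {
            (i % k == fold ? test : train).add(data.get(i));
        }
    }

    /** k-fold cross-validation; returns the fraction of held-out sequences classified correctly. */
    static double crossValidate(List<String> fg, List<String> bg, int length, int k) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<String> fgTrain = new ArrayList<>(), fgTest = new ArrayList<>();
            List<String> bgTrain = new ArrayList<>(), bgTest = new ArrayList<>();
            split(fg, fold, k, fgTrain, fgTest);
            split(bg, fold, k, bgTrain, bgTest);
            double[][] fgModel = trainPwm(fgTrain, length, 1.0);
            double[][] bgModel = trainPwm(bgTrain, length, 1.0);
            for (String s : fgTest) {
                if (logLikelihood(fgModel, s) > logLikelihood(bgModel, s)) correct++;
            }
            for (String s : bgTest) {
                if (logLikelihood(fgModel, s) <= logLikelihood(bgModel, s)) correct++;
            }
        }
        return (double) correct / (fg.size() + bg.size());
    }

    public static void main(String[] args) {
        // Toy data, invented for this example.
        List<String> fg = List.of("ACGT", "ACGA", "ACGT", "ACGC", "ACGT", "ACGG");
        List<String> bg = List.of("TTAA", "TGAA", "TTAC", "TTAA", "CTAA", "TTAG");
        System.out.println("accuracy = " + crossValidate(fg, bg, 4, 3));
    }
}
```

In a real experiment one would typically shuffle the data before assigning folds and report further measures such as sensitivity or the area under the ROC or precision-recall curve; the deterministic fold assignment here only keeps the sketch short and reproducible.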

Jstacs is a joint project of the groups Bioinformatics and Pattern Recognition and Bioinformatics at the Institute of Computer Science of Martin Luther University Halle-Wittenberg and the Bioinformatics group of the Julius Kuehn Institute. Initially, the project was also developed at the Leibniz Institute of Plant Genetics and Crop Plant Research.

Jstacs is listed in the machine learning open-source software (mloss) repository.

Licensing Information

Jstacs is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License version 3 or (at your option) any later version as published by the Free Software Foundation.

Current release

You can download Jstacs version 2.3 here.
You can find an overview of the new features in the Version history.
We also provide API documentation, a Cookbook, and a Reference card for this release.

The current Jstacs code, including changes made since the last release, is available from GitHub.

Getting started & Cookbook

For set-up instructions, a list of basic requirements, and suggestions for your first steps with Jstacs, please see Getting started.

Since version 2.0, we offer a Cookbook for Jstacs in addition to the API documentation. This cookbook comprises a general description of the structure of Jstacs, including data handling, statistical models, classifiers, and assessments. The cookbook is accompanied by a number of Recipes or Code examples that can serve as a starting point for your own applications.

For a quick reference, we also provide a Reference card.

Publication

The paper about Jstacs has been published in the Journal of Machine Learning Research. If you use Jstacs in your research, please cite

J. Grau, J. Keilwagen, A. Gohr, B. Haldemann, S. Posch, and I. Grosse. Jstacs: A Java framework for statistical analysis and classification of biological sequences. Journal of Machine Learning Research, 13(Jun):1967–1971, 2012.

BibTeX entry

JstacsFX

JstacsFX is a library for building applications with a graphical user interface based on Jstacs classes and using JavaFX. JstacsFX builds upon the JstacsTool interface that has also been used to create command line and Galaxy versions of tools with minimal effort. In addition, it makes use of the Parameter, Result, and ResultSaver classes of Jstacs.

The current release of JstacsFX is available from Downloads, and API documentation is also provided.

Example applications using JstacsFX for their graphical user interface are InMoDe and AnnoTALE.

Applications

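Applications currently using Jstacs:
* MotifAdjuster
* Dispom
* TALgetter
* TALENoffer
* Dimont
* GeMoMa
* AnnoTALE
* InMoDe
* Disentangler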

Bug reports & Feature requests

You can submit bug reports and feature requests by mail to jstacs@informatik.uni-halle.de.

Latest Papers

The paper Accurate prediction of cell type-specific transcription factor binding has been published in Genome Biology.

The paper Algorithms for learning parsimonious context trees has been published in Machine Learning.

The paper Disentangling transcription factor binding site complexity has been published in Nucleic Acids Research.

The paper Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi has been published in BMC Bioinformatics.

The paper InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites has been published in Bioinformatics.

The paper AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences has been published in Scientific Reports.

The paper Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data has been published in BMC Bioinformatics.

The paper Varying levels of complexity in transcription factor binding motifs has been published in Nucleic Acids Research.

The paper Area under Precision-Recall Curves for Weighted and Unweighted Data has been published in PLOS ONE.

The paper A general approach for discriminative de-novo motif discovery from high-throughput data has been published in Nucleic Acids Research.

Further papers and projects can be found under Projects.