From Jstacs
Revision as of 17:04, 2 February 2012 by Grau (talk | contribs)

Handling data

Also have a look at the code example Loading data.

Q: How do I create an AlphabetContainer instance for DNA sequences?
A: AlphabetContainer container = new AlphabetContainer( new DNAAlphabet() );

Q: Why should I use the AlphabetContainer and not just a simple Alphabet instance?
A: Because for some data, e.g. phenotypic data, the alphabet is not the same at every position of the sequence. Hence, we strongly recommend always using getAlphabetLengthAt(int), e.g. when setting the size of an array.
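The recommendation above can be illustrated with a minimal sketch. PositionAlphabetsSketch is a hypothetical stand-in written only for this example; it is not the Jstacs AlphabetContainer and only mimics a getAlphabetLengthAt(int) method. The alphabet sizes 2, 3 and 5 are made up.

```java
// Hypothetical stand-in (NOT the Jstacs AlphabetContainer): it only stores
// one alphabet size per sequence position to illustrate the idea.
class PositionAlphabetsSketch {
    private final int[] alphabetLengths;

    PositionAlphabetsSketch(int... alphabetLengths) {
        this.alphabetLengths = alphabetLengths.clone();
    }

    // Number of positions described by this stand-in.
    int getLength() {
        return alphabetLengths.length;
    }

    // Alphabet size at the given position (name mirrors the FAQ text).
    int getAlphabetLengthAt(int pos) {
        return alphabetLengths[pos];
    }

    // Allocates one counter array per position, each sized by the alphabet
    // at that position instead of by one global alphabet size.
    int[][] createCountTable() {
        int[][] counts = new int[getLength()][];
        for (int pos = 0; pos < counts.length; pos++) {
            counts[pos] = new int[getAlphabetLengthAt(pos)];
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up example: three positions with 2, 3 and 5 symbols.
        PositionAlphabetsSketch container = new PositionAlphabetsSketch(2, 3, 5);
        int[][] counts = container.createCountTable();
        for (int pos = 0; pos < counts.length; pos++) {
            System.out.println("position " + pos + ": " + counts[pos].length + " symbols");
        }
    }
}
```

Sizing every array with a single global alphabet length would either waste memory or, worse, produce arrays that are too short at positions with a larger alphabet.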

Q: How can I create a simple sequence?
A: Use the create method of Sequence, e.g. Sequence.create( new AlphabetContainer( new DNAAlphabet() ), "ACGTACGT" );

Q: How can I load my own data?
A: If your sequences are stored either in plain text or in FastA format, you can directly create a new DataSet from the file.

Q: I wrote some sophisticated method using BioJava to load my data from a Genbank file/a database/somewhere else. How can I do something similar in Jstacs?
A: You can still use your existing method. Jstacs has an adapter for BioJava SequenceIterators.

Using existing models

Also have a look at the Code examples.

Q: Where do I find a list of the models currently implemented in Jstacs?
A: All generative models in Jstacs implement the TrainableStatisticalModel interface.
All discriminative models in Jstacs implement the DifferentiableStatisticalModel interface. You can find all existing implementations in the list of implementing classes of these two interfaces.

Q: I have decided on two TrainableStatisticalModels. How do I train them and classify new data?
A: You can create a new TrainSMBasedClassifier from your models and use its train and classify methods. If you only want to train a model on data, e.g. to sample new sequences, you can also use the train method of the TrainableStatisticalModel directly.

Q: I have decided on two DifferentiableStatisticalModels. How do I train them and classify new data?
A: You can create a new GenDisMixClassifier from your models and use its train and classify methods.

Q: I have to decide which model is best for my classification task. How do I assess different model combinations or classifiers?
A: You can use the subclasses of ClassifierAssessment, e.g. KFoldCrossValidation. All ClassifierAssessments have a constructor that accepts an array of classifiers (or TrainableStatisticalModels). You can then use the assess method to assess these classifiers on the same data using a number of pre-defined performance measures.

Q: How can I store and load my model, classifier, ...?
A: All classes that implement Storable have a method toXML() that returns a StringBuffer containing the instance as XML. Such classes should also have a constructor with a single StringBuffer argument, which creates a new instance from a StringBuffer that contains an instance as XML. In addition, the class FileManager lets you read and write StringBuffers from and to the hard drive.
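The round trip described above might look roughly like the following sketch. StorableSketch is a hypothetical stand-in, not a Jstacs class: it only imitates the pattern of a toXML() method plus a StringBuffer constructor. The tag name threshold is made up, and the file access uses plain java.nio instead of the Jstacs FileManager, whose method signatures are not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in that imitates the Storable pattern from the FAQ:
// toXML() writes the state, a StringBuffer constructor restores it.
class StorableSketch {
    private static final String OPEN = "<threshold>";   // made-up tag
    private static final String CLOSE = "</threshold>";

    private final double threshold;

    StorableSketch(double threshold) {
        this.threshold = threshold;
    }

    // Restores an instance from the XML produced by toXML().
    StorableSketch(StringBuffer xml) {
        int start = xml.indexOf(OPEN);
        int end = xml.indexOf(CLOSE);
        if (start < 0 || end < start + OPEN.length()) {
            throw new IllegalArgumentException("not a valid representation: " + xml);
        }
        this.threshold = Double.parseDouble(xml.substring(start + OPEN.length(), end));
    }

    // Returns the instance as XML in a StringBuffer.
    StringBuffer toXML() {
        return new StringBuffer(OPEN).append(threshold).append(CLOSE);
    }

    double getThreshold() {
        return threshold;
    }

    // Plain java.nio helpers standing in for a file manager.
    static void write(Path file, StringBuffer xml) throws IOException {
        Files.write(file, xml.toString().getBytes(StandardCharsets.UTF_8));
    }

    static StringBuffer read(Path file) throws IOException {
        return new StringBuffer(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("storable-sketch", ".xml");
        try {
            write(file, new StorableSketch(0.25).toXML());
            StorableSketch restored = new StorableSketch(read(file));
            System.out.println("restored threshold: " + restored.getThreshold());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With real Jstacs classes, the same two steps apply: call toXML() on the instance you want to keep, and later pass the loaded StringBuffer to the constructor of the same class.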

Q: Why does Jstacs use XML to save instances?
A: Because it is human-readable.

Q: I use Gibbs sampling in a class extending AbstractMixtureTrainSM.

  • Q1: Why does the sampling create files in the Java temporary directory?
  • Q2: Will these files be deleted automatically once they are no longer used?


  • A1: These files temporarily store the sampled parameters. The Java temporary directory is used to minimize network load if you work on a cluster.
  • A2: These files are deleted once no reference to the mixture instance remains and the garbage collector runs. We therefore recommend calling the garbage collector explicitly at the end of any application.
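One possible way to follow the advice in A2 is sketched below. ExplicitGcSketch and runThenCollect are hypothetical names chosen for this example, and the plain Object is only a stand-in for a mixture instance. Note that System.gc() is a request to the JVM rather than a guarantee that a collection happens immediately.

```java
// Hypothetical sketch: run the application logic in its own scope so that
// its local references are gone, then ask the JVM for a garbage collection.
class ExplicitGcSketch {

    // Runs the given work and requests a collection afterwards,
    // even if the work fails with an exception.
    static void runThenCollect(Runnable work) {
        try {
            work.run();
        } finally {
            // Only a hint to the JVM; it may or may not collect right away.
            System.gc();
        }
    }

    public static void main(String[] args) {
        runThenCollect(() -> {
            // Stand-in for creating and using a mixture instance.
            Object mixture = new Object();
            System.out.println("working with " + mixture.getClass().getSimpleName());
            // The local reference disappears when this block returns.
        });
    }
}
```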

Implementing new models

Q: How do I implement a new generative TrainableStatisticalModel?
A: Write an implementation of the TrainableStatisticalModel interface. For convenience, you can use the abstract AbstractTrainableStatisticalModel class with default implementations for many methods.

Q: How do I implement a new discriminative model?
A: Write an implementation of the DifferentiableStatisticalModel interface. For convenience, you can use the abstract AbstractDifferentiableStatisticalModel class with default implementations for many methods.

Q: How do I implement a model that can be trained generatively and discriminatively?
A: You can either extend AbstractTrainableStatisticalModel and additionally implement the DifferentiableStatisticalModel interface, or extend AbstractDifferentiableStatisticalModel and additionally implement the TrainableStatisticalModel interface.

Reporting bugs and requesting new features

Q: How do I report bugs I found in Jstacs?
A: Before reporting bugs in Jstacs, you should be sure it's not a feature ;-) You can discuss potential issues in the Jstacs forum. You can also have a look at the bugs that have already been reported. If you are sure that you found a new bug, please submit a new ticket to the Jstacs bug tracking system.

Q: How can I request new features?
A: You may use the Jstacs forum to discuss your request with other users. Most likely, we will join the discussion, too (we are somewhere out there!).
If you are convinced that the feature you request will be useful for all users of Jstacs, you are invited to submit a new ticket with your request.


Q: The class UserTime does not work! Why?
A: The class UserTime uses native code, so there are at least two possible causes:

  • A1: You have forgotten to set the Java library path: -Djava.library.path=...
  • A2: The native code has not yet been compiled on your system.
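To tell the two cases above apart, a small check along the following lines may help. It is a sketch under assumptions: LibraryPathCheck is a hypothetical helper, and the library name "UserTime" is only a guess for illustration, since the actual name of the native library is not given here. The helper lists the files on java.library.path that match the platform-specific file name of a library.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: looks for a native library file in the directories
// listed in a library path such as the value of java.library.path.
class LibraryPathCheck {

    static List<File> findCandidates(String libraryPath, String libraryName) {
        // Platform-specific file name, e.g. with a "lib" prefix and ".so" suffix on Linux.
        String fileName = System.mapLibraryName(libraryName);
        List<File> hits = new ArrayList<>();
        if (libraryPath == null || libraryPath.isEmpty()) {
            return hits;
        }
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "UserTime" is an assumed library name; pass the real one as an argument.
        String name = args.length > 0 ? args[0] : "UserTime";
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + path);
        List<File> hits = findCandidates(path, name);
        if (hits.isEmpty()) {
            System.out.println("No file named " + System.mapLibraryName(name) + " found on the library path.");
        } else {
            hits.forEach(f -> System.out.println("Found " + f.getAbsolutePath()));
        }
    }
}
```

If no file is found, either the library path is not set as described in A1 or the compiled library does not exist yet as described in A2.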