Intermediate course: XMLParser, Parameters, and Results
In the early days of Jstacs, we stored models, classifiers, and other Jstacs objects using the standard serialization of Java. However, this mechanism made it impossible to load objects of earlier versions of a class and the files where not human-readable. Hence, we started to create a facility for storing objects to XML representations. In the current version of Jstacs, this is accomplished by an interface Storable for objects that can be converted to and from their XML representation, and a class XMLParser that can handle such Storable s, Singleton s, Strings, Classes, primitives, and arrays thereof. In the first sub-section, we give examples how to use the XMLParser.
Another problem we wanted to handle has been the documentation of (external) parameters of models, classifiers, or other classes. Although documentation exists in the Javadocs, these are inaccessible from the code. Hence, we created classes for the documentation of parameters and sets of parameters, namely the subclasses of Parameter and ParameterSet. A Parameter at least provides the name of and a comment on the parameter that is described. In sub-classes, other values are also available like, for instance, the set or a range of allowed values. Such a description of parameters allows for manifold generic convenience applications. Current examples are the ParameterSetTagger, which facilitates the documentation of command line arguments on basis of a ParameterSet, or the GalaxyAdaptor, which allows for an easy integration of Jstacs applications into the Galaxy webserver. We give examples for the use and creation of Parameter s and ParameterSet s in the second sub-section.
Finally, the same problem also occurrs for the results of computations. With a generic documentation, these results can be displayed together with some annotation in a way that is appropriate for the current application. In Jstacs, we use Result s and ResultSet s for this purpose, and we show how to use these in the third sub-section.
In the following examples, let
buffer be some
All kinds of primitives or Storable s are appended to an existing
StringBuffer surrounded by the specified XML tags by the static method
appendObjectWithTags of XMLParser. For example, the following two lines append an integer with the value 5 using the tag
integer, and a
String with the tag
XMLParser.appendObjectWithTags( buffer, integer, "integer" );
String bar = "hello world";
XMLParser.appendObjectWithTags( buffer, bar, "foo" );
If we assume that
buffer was an empty
StringBuffer before appending these two elements, the resulting XML text will be
In exactly the same manner, we can append XML representations of arrays of primitives, for example a two-dimensional array of
XMLParser.appendObjectWithTags( buffer, da, "da" );
or complete Jstacs models that implement the Storable interface
XMLParser.appendObjectWithTags( buffer, hMM, "hMM" );
or even arrays of Storable s:
XMLParser.appendObjectWithTags( buffer, storAr, "storAr" );
The interface Storable only defines two things: first, an implementing class must provide a public method
toXML() that returns the XML representation of this class as a
StringBuffer, and second, it must provide a constructor that takes a single
StringBuffer as its argument and re-creates an object out of this representation. The only exception from this rule are singletons, i.e., classes that implement the Singleton interface.
Of course, you can use the
appendObjectWithTags method of the XMLParser inside the
toXML method. By this means, it is possible to break down the conversion of complex models into smaller pieces if the building-blocks of a model are also Storable s.
In analogy to storing objects, the XMLParser also provides facilities for loading primitives and Storable s from their XML representation. These can also be used in the constructor according to the Storable interface. For example, we can load the value of the integer, we stored a few lines ago by calling
where the second argument of
extractObjectForTags is the tag surrounding the value and, of course, must be identical to the tag we specified when storing the value. Since
extractObjectForTags is a generic method, we must explicitly cast the returned value to an
Integer. As an alternative, we can also specify the class of the return type as a third argument like in the following example
Here, we load the two-dimensional array of
doubles that we stored a few lines ago.
In perfect analogy, we can also load a single instance of a class implementing Storable
where in this case we again specify the class of the return type in the third argument, or arrays of Storable
Of course, we can also specify the concrete sub-class of Storable for an array, if all instances are of the same class like in the following example:
XMLParser.appendObjectWithTags( buffer, hmAr, "hmAr" );
hmAr = (HomogeneousMM) XMLParser.extractObjectForTags( buffer, "hmAr" );
Parameters & ParameterSets
Parameters in Jstacs are represented by different sub-classes of Parameter, which define different types of parameters. Parameters that take primitives or strings as values are defined by the class SimpleParameter, parameters that accept values from some
enum type are defined by EnumParameter, parameters where the user can select from a number of predefined values are defined by SelectionParameter, parameters that represent a file argument are defined by FileParameter, and parameters that represent a range of values are represented by RangeParameter.
In the following, we give some examples for the creation of parameter objects. Let us assume, we want to define a parameter for the length of the sequences accepted by some model. The maximum sequence length this model can handle is 100 and, of course, lengths cannot be negative. We create such a parameter object by the following lines of code:
The first argument of the constructor defines the data type of the accepted values, which is an
int in the example. The next two arguments are the name of and the comment for the parameter. The following boolean specifies if this parameter is required (
true) or optional (
false). The NumberValidator in the fifth argument allows for specifying the range of allowed values, which is 0 to 100 (inclusive) in the example. Finally, we define a default value for this parameter, which is 10 in the example.
Similarly, we can define a SimpleParameter †for some optional parameter that takes strings as values by the following line:
Again, the second and third arguments are the name and the comment, respectively.
We can define an EnumParameter, which accept values from some
enum type as follows
where the first argument defines the class of the
enum type, the second is the name of that collection of values, and the third argument again specifies if this parameter is required.
A SelectionParameter accepts values from a pre-defined collection of values. For instance, if we want the user to select from two double values 5.0 and 5E6, which are named "small" and "large", we can do so as follows:
For the special case, where the user shall select the concrete implementation of an abstract class of interface, Jstacs provides a static convenience method
getSelectionParameter in the class SubclassFinder. This method requires the specification of the super-class of the ParameterSet that can be used to instantiate the implementations, the root package in which sub-classes or implementations shall be found, and, again, a name, a comment, and if this parameter is required.
For example, we can find all classes that can be instantiated by a sub-class of SequenceScoringParameterSet the package
de and its sub-packages by calling
The method returns a SelectionParameter from which a user can select the appropriate implementation.
Classes that can be found in this manner must implement an additional interface called InstantiableFromParameterSet. The main purpose of this interface is that implementing classes must provide a constructor that takes a InstanceParameterSet as its only argument in analogy to the constructor of Storable working on a
StringBuffer. InstanceParameterSet s will be explained a few lines below.
As the name suggests, ParameterSet s represent sets of such parameters. The most simple implementation of a ParameterSet is the SimpleParameterSet, which can be created just from a number of Parameter s like in the following example:
One special case of ParameterSet s is the InstanceParameterSet, which has several sub-classes that can be used to instantiate new Jstacs objects like statistical models or classifiers. If a new model, say an implementation of the TrainableStatisticalModel interface, shall be found via the SubclassFinder, or its parameters shall be set in a command line program using the ParameterSetTagger or in Galaxy, we need to create a new sub-class of InstanceParameterSet that represents all (external) parameters of this model. In this sub-class we must basically implement two methods:
getInstanceComment return the name of and a comment on the model class (i.e., in the example, the model we just implemented) that may be of help for a potential user. The constructor does the main work. By a call to the super-constructor, it initializes the list of parameters in this set and then adds the parameters of the model.
Not always do we have flat hierarchies of parameters. For instance, the choice of subsequent parameters may depend on the selection from some SelectionParameter. For this purpose, Jstacs provides a sub-class of Parameter that only serves as a container for a ParameterSet and is called ParameterSetContainer. Like other parameters, this container takes a name and a comment in its constructor, whereas the third argument is a ParameterSet:
Since such a ParameterSetContainer can itself be part of another ParameterSet, we can build hierarchies or trees of Parameter s and ParameterSet s. ParameterSetContainers are also used internally to create SelectionParameter s from an array of ParameterSet s, e.g., for
getSelectionParameter in the SubclassFinder.
Results & ResultSets
In Jstacs, several types of Result s are implemented. The two basic Result types are NumericalResult s and CategoricalResult s. The first are results containing numerical values, which can be aggregated, for instance averaged, while the latter are results of categorical values like strings or booleans.
For example, we can create a NumericalResult containing a single
double value by the following line
where, in analogy to Parameter s, the first and the second argument are the name of and a comment on the result, respectively.
Similarly, we create a CategoricalResult by the following line
for a result that is a single
where we may also combine NumericalResult s and CategoricalResult s into a single set. Besides simple ResultSet s, Jstacs comprises NumericalResultSet s for combining only NumericalResult s, which can be created in complete analogy to ResultSet s.
Jstacs also provides a special class for averaging NumericalResult. This class is called MeanResultSet, and computes the average and standard error of the corresponding values of a number of NumericalResultSet s. The corresponding NumericalResult s in the NumericalResultSet are identified by their name as specified upon creation.
We first create an empty MeanResultSet by calling its default constructor
mrs.addResults( new NumericalResultSet( new NumericalResult( "Single", "A single result to be aggregated", r.nextDouble() ) ) );
In the example, these are just 10 uniformly distributed random numbers.
Finally, we call the method
getStatistics of MeanResultSet to obtain the mean and standard error of the previously added values.
the result of this method is again returned as a NumericalResultSet. In the example, it is just printed to standard out.