Intermediate course: XMLParser, Parameters, and Results

From Jstacs
Revision as of 14:00, 2 February 2012 by Grau (talk | contribs)
Jump to navigationJump to search

Intermediate course: XMLParser, Parameters, and Results

In the early days of Jstacs, we stored models, classifiers, and other Jstacs objects using the standard serialization of Java. However, this mechanism made it impossible to load objects of earlier versions of a class and the files where not human-readable. Hence, we started to create a facility for storing objects to XML representations. In the current version of Jstacs, this is accomplished by an interface Storable for objects that can be converted to and from their XML representation, and a class XMLParser that can handle such Storable s, Singleton s, Strings, Classes, primitives, and arrays thereof. In the first sub-section, we give examples how to use the XMLParser.

Another problem we wanted to handle has been the documentation of (external) parameters of models, classifiers, or other classes. Although documentation exists in the Javadocs, these are inaccessible from the code. Hence, we created classes for the documentation of parameters and sets of parameters, namely the subclasses of Parameter and ParameterSet. A Parameter at least provides the name of and a comment on the parameter that is described. In sub-classes, other values are also available like, for instance, the set or a range of allowed values. Such a description of parameters allows for manifold generic convenience applications. Current examples are the ParameterSetTagger, which facilitates the documentation of command line arguments on basis of a ParameterSet, or the GalaxyAdaptor, which allows for an easy integration of Jstacs applications into the Galaxy webserver. We give examples for the use and creation of Parameter s and ParameterSet s in the second sub-section.

Finally, the same problem also occurrs for the results of computations. With a generic documentation, these results can be displayed together with some annotation in a way that is appropriate for the current application. In Jstacs, we use Result s and ResultSet s for this purpose, and we show how to use these in the third sub-section.

XMLParser

In the following examples, let buffer be some StringBuffer. All kinds of primitives or Storable s are appended to an existing StringBuffer surrounded by the specified XML tags by the static method appendObjectWithTags of XMLParser. For example, the following two lines append an integer with the value [math]5[/math] using the tag integer, and a String with the tag foo:

int integer = 5;
XMLParser.appendObjectWithTags( buffer, integer, "integer" );
String bar = "hello world";
XMLParser.appendObjectWithTags( buffer, bar, "foo" );

If we assume that buffer was an empty StringBuffer before appending these two elements, the resulting XML text will be

<integer><className>java.lang.Integer</className>5</integer>
<foo><className>java.lang.String</className>hello world</foo>

In exactly the same manner, we can append XML representations of arrays of primitives, for example a two-dimensional array of double s

double[][] da = new double[4][6];
XMLParser.appendObjectWithTags( buffer, da, "da" );

or complete Jstacs models that implement the Storable interface

HomogeneousMM hMM = new HomogeneousMM( new HomMMParameterSet( DNAAlphabetContainer.SINGLETON, 4, "hmm(0)", (byte) 0 ) );
XMLParser.appendObjectWithTags( buffer, hMM, "hMM" );

or even arrays of Storable s:

Storable[] storAr = ArrayHandler.createArrayOf( hMM, 5 );
XMLParser.appendObjectWithTags( buffer, storAr, "storAr" );

The interface Storable only defines two things: first, an implementing class must provide a public method toXML() that returns the XML representation of this class as a StringBuffer, and second, it must provide a constructor that takes a single StringBuffer as its argument and re-creates an object out of this representation. The only exception from this rule are singletons, i.e., classes that implement the Singleton interface.

Of course, you can use the appendObjectWithTags method of the XMLParser inside the toXML method. By this means, it is possible to break down the conversion of complex models into smaller pieces if the building-blocks of a model are also Storable s.

In analogy to storing objects, the XMLParser also provides facilities for loading primitives and Storable s from their XML representation. These can also be used in the constructor according to the Storable interface. For example, we can load the value of the integer, we stored a few lines ago by calling

integer = (Integer) XMLParser.extractObjectForTags( buffer, "integer" );

where the second argument of extractObjectForTags is the tag surrounding the value and, of course, must be identical to the tag we specified when storing the value. Since extractObjectForTags is a generic method, we must explicitly cast the returned value to an Integer. As an alternative, we can also specify the class of the return type as a third argument like in the following example

da = XMLParser.extractObjectForTags( buffer, "da", double[][].class );

Here, we load the two-dimensional array of doubles that we stored a few lines ago. In perfect analogy, we can also load a single instance of a class implementing Storable

hMM = XMLParser.extractObjectForTags( buffer, "hMM", HomogeneousMM.class );

where in this case we again specify the class of the return type in the third argument, or arrays of Storable

storAr = (Storable[]) XMLParser.extractObjectForTags( buffer, "storAr" );


Of course, we can also specify the concrete sub-class of Storable for an array, if all instances are of the same class like in the following example:

HomogeneousMM[] hmAr = ArrayHandler.createArrayOf( hMM, 5 );
XMLParser.appendObjectWithTags( buffer, hmAr, "hmAr" );
hmAr = (HomogeneousMM[]) XMLParser.extractObjectForTags( buffer, "hmAr" );


Parameters & ParameterSets

Parameters in Jstacs are represented by different sub-classes of Parameter, which define different types of parameters. Parameters that take primitives or strings as values are defined by the class SimpleParameter, parameters that accept values from some enum type are defined by EnumParameter, parameters where the user can select from a number of predefined values are defined by SelectionParameter, parameters that represent a file argument are defined by FileParameter, and parameters that represent a range of values are represented by RangeParameter. In the following, we give some examples for the creation of parameter objects. Let us assume, we want to define a parameter for the length of the sequences accepted by some model. The maximum sequence length this model can handle is [math]100[/math] and, of course, lengths cannot be negative. We create such a parameter object by the following lines of code:

SimpleParameter simplePar = new SimpleParameter( DataType.INT, "Sequence length", "The required length of a sequence", true, new NumberValidator<Integer>( 1, 100 ), 10 );

The first argument of the constructor defines the data type of the accepted values, which is an int in the example. The next two arguments are the name of and the comment for the parameter. The following boolean specifies if this parameter is required (true) or optional (false). The NumberValidator in the fifth argument allows for specifying the range of allowed values, which is [math]0[/math] to [math]100[/math] (inclusive) in the example. Finally, we define a default value for this parameter, which is [math]10[/math] in the example. Similarly, we can define a SimpleParameter †for some optional parameter that takes strings as values by the following line:

SimpleParameter simplePar2 = new SimpleParameter( DataType.STRING, "Name", "The name of the game", false );

Again, the second and third arguments are the name and the comment, respectively.

We can define an EnumParameter, which accept values from some enum type as follows

EnumParameter enumpar = new EnumParameter( DataType.class, "Data types", true );

where the first argument defines the class of the enum type, the second is the name of that collection of values, and the third argument again specifies if this parameter is required.

A SelectionParameter accepts values from a pre-defined collection of values. For instance, if we want the user to select from two double values [math]5.0[/math] and [math]5E6[/math], which are named "small" and "large", we can do so as follows:

SelectionParameter collPar = new SelectionParameter( DataType.DOUBLE, new String[]{"small", "large"}, new Double[]{5.0,5E6}, "Numbers", "A selection of numbers", true );


For the special case, where the user shall select the concrete implementation of an abstract class of interface, Jstacs provides a static convenience method getSelectionParameter in the class SubclassFinder. This method requires the specification of the super-class of the ParameterSet that can be used to instantiate the implementations, the root package in which sub-classes or implementations shall be found, and, again, a name, a comment, and if this parameter is required. For example, we can find all classes that can be instantiated by a sub-class of SequenceScoringParameterSet the package de and its sub-packages by calling

collPar = SubclassFinder.getSelectionParameter( SequenceScoringParameterSet.class, "de", "Sequence scores", "All Sequence scores in Jstacs that can be created from parameter sets", true );

The method returns a SelectionParameter from which a user can select the appropriate implementation. Classes that can be found in this manner must implement an additional interface called InstantiableFromParameterSet. The main purpose of this interface is that implementing classes must provide a constructor that takes a InstanceParameterSet as its only argument in analogy to the constructor of Storable working on a StringBuffer. InstanceParameterSet s will be explained a few lines below.

As the name suggests, ParameterSet s represent sets of such parameters. The most simple implementation of a ParameterSet is the SimpleParameterSet, which can be created just from a number of Parameter s like in the following example:

SimpleParameterSet parSet = new SimpleParameterSet( simplePar,collPar );

Other ParameterSet s are the ExpandableParameterSet and ArrayParameterSet, which can handle series of identical parameter types.

One special case of ParameterSet s is the InstanceParameterSet, which has several sub-classes that can be used to instantiate new Jstacs objects like statistical models or classifiers. If a new model, say an implementation of the TrainableStatisticalModel interface, shall be found via the SubclassFinder, or its parameters shall be set in a command line program using the ParameterSetTagger or in Galaxy, we need to create a new sub-class of InstanceParameterSet that represents all (external) parameters of this model. In this sub-class we must basically implement two methods: getInstanceName and getInstanceComment return the name of and a comment on the model class (i.e., in the example, the model we just implemented) that may be of help for a potential user. The constructor does the main work. By a call to the super-constructor, it initializes the list of parameters in this set and then adds the parameters of the model.


For implementations of the TrainableStatisticalModel interface we may also extend SequenceScoringParameterSet, which already handles the AlphabetContainer and length of this model.

Not always do we have flat hierarchies of parameters. For instance, the choice of subsequent parameters may depend on the selection from some SelectionParameter. For this purpose, Jstacs provides a sub-class of Parameter that only serves as a container for a ParameterSet and is called ParameterSetContainer. Like other parameters, this container takes a name and a comment in its constructor, whereas the third argument is a ParameterSet:

ParameterSetContainer container = new ParameterSetContainer( "Set", "A set of parameters", parSet );

Since such a ParameterSetContainer can itself be part of another ParameterSet, we can build hierarchies or trees of Parameter s and ParameterSet s. ParameterSetContainers are also used internally to create SelectionParameter s from an array of ParameterSet s, e.g., for getSelectionParameter in the SubclassFinder.

Results & ResultSets

In Jstacs, several types of Result s are implemented. The two basic Result types are NumericalResult s and CategoricalResult s. The first are results containing numerical values, which can be aggregated, for instance averaged, while the latter are results of categorical values like strings or booleans. For example, we can create a NumericalResult containing a single double value by the following line

NumericalResult res = new NumericalResult( "A double result", "This result contains some double value", 5.0 );

where, in analogy to Parameter s, the first and the second argument are the name of and a comment on the result, respectively.

Similarly, we create a CategoricalResult by the following line

CategoricalResult catRes = new CategoricalResult( "A boolean result", "This result contains some boolean", true );

for a result that is a single boolean value.

As for ParameterSet s, we can create sets of results using the class ResultSet

ResultSet resSet = new ResultSet( new Result[]{res,catRes} );

where we may also combine NumericalResult s and CategoricalResult s into a single set. Besides simple ResultSet s, Jstacs comprises NumericalResultSet s for combining only NumericalResult s, which can be created in complete analogy to ResultSet s.

Jstacs also provides a special class for averaging NumericalResult. This class is called MeanResultSet, and computes the average and standard error of the corresponding values of a number of NumericalResultSet s. The corresponding NumericalResult s in the NumericalResultSet are identified by their name as specified upon creation.

We first create an empty MeanResultSet by calling its default constructor

MeanResultSet mrs = new MeanResultSet();

and subsequently add a number of NumericalResultSet s to this MeanResultSet.

Random r = new Random();
for(int i=0;i<10;i++){
	mrs.addResults( new NumericalResultSet( new NumericalResult( "Single", "A single result to be aggregated", r.nextDouble() ) ) );
}

In the example, these are just 10 uniformly distributed random numbers.

Finally, we call the method getStatistics of MeanResultSet to obtain the mean and standard error of the previously added values.

System.out.println( mrs.getStatistics() );

the result of this method is again returned as a NumericalResultSet. In the example, it is just printed to standard out.