Dessert: Alignments, Utils, and goodies

In this section, we present a motley composition of interesting classes of Jstacs.

Alignments

In this subsection, we show how to compute alignments using Jstacs.

To compute an alignment, we first have to define the costs for match, mismatch, and gaps. In Jstacs, the interface Costs declares all methods needed during the alignment. In this example, we restrict ourselves to simple costs: 0 for a match, 1 for a mismatch, and 0.5 for a gap.


Costs costs = new SimpleCosts( 0, 1, 0.5 );


Second, we have to provide an instance of Alignment. This instance contains all information needed for an alignment and stores, for instance, the matrices used for dynamic programming. When creating an instance, we have to specify which kind of alignment we want. Jstacs supports local, global, and semi-global alignments (cf. Alignment.AlignmentType).


Alignment align = new Alignment( AlignmentType.GLOBAL, costs );


A second constructor additionally allows us to specify the number of off-diagonals to be used in the alignment, which can lead to a speedup.

Finally, we can compute the optimal alignment between two Sequences and write the result to standard output.


System.out.println( align.getAlignment( seq1, seq2 ) );


The alignment instance can be reused for aligning further sequences.
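
To make the role of the costs concrete, the following is a minimal sketch of the dynamic programming recursion behind a global alignment with the costs used above (0 for a match, 1 for a mismatch, 0.5 per gap position). It does not use the Jstacs classes: the class GlobalAlignmentSketch and its method are hypothetical, work on plain strings, and only compute the optimal cost, not the alignment itself.

```java
final class GlobalAlignmentSketch {

    /** Minimal cost of globally aligning a and b with linear gap costs. */
    static double cost(String a, String b, double match, double mismatch, double gap) {
        int n = a.length(), m = b.length();
        // d[i][j] = minimal cost of aligning the first i symbols of a
        // with the first j symbols of b
        double[][] d = new double[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = i * gap;
        for (int j = 1; j <= m; j++) d[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double pair = d[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                double gapInB = d[i - 1][j] + gap; // symbol of a aligned to a gap
                double gapInA = d[i][j - 1] + gap; // symbol of b aligned to a gap
                d[i][j] = Math.min(pair, Math.min(gapInB, gapInA));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        // ACGT vs. A-GT: three matches and one gap position
        System.out.println(cost("ACGT", "AGT", 0, 1, 0.5)); // prints 0.5
    }
}
```

Restricting the inner loop to a band around the main diagonal is the idea behind limiting the number of off-diagonals mentioned above.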

In Jstacs, we also provide the possibility of computing optimal alignments with affine gap costs. For this purpose, we implement the class AffineCosts, which is used to specify the cost of opening a gap. The costs for gap elongation are given by the gap costs of the internally used instance of Costs.


costs = new AffineCosts( 1, costs );
align = new Alignment( AlignmentType.GLOBAL, costs );
System.out.println( align.getAlignment( seq1, seq2 ) );
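
The effect of affine gap costs can be illustrated with a small, self-contained sketch of the classic three-matrix recursion (often attributed to Gotoh). This is not the Jstacs implementation. The class AffineGapSketch is hypothetical, and we assume that a gap of length k costs gapOpen + k * gapExtend; whether AffineCosts uses exactly this parameterisation should be checked against the Jstacs API documentation.

```java
final class AffineGapSketch {

    private static double min3(double x, double y, double z) {
        return Math.min(x, Math.min(y, z));
    }

    /** Minimal global alignment cost; a gap of length k costs gapOpen + k * gapExtend (assumption). */
    static double cost(String a, String b, double match, double mismatch,
                       double gapOpen, double gapExtend) {
        final double INF = Double.POSITIVE_INFINITY;
        int n = a.length(), m = b.length();
        double[][] pair = new double[n + 1][m + 1];   // alignment ends with a (mis)match
        double[][] gapInB = new double[n + 1][m + 1]; // ends with a symbol of a against a gap
        double[][] gapInA = new double[n + 1][m + 1]; // ends with a symbol of b against a gap
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) {
                    pair[0][0] = 0;
                    gapInB[0][0] = gapInA[0][0] = INF;
                    continue;
                }
                pair[i][j] = (i > 0 && j > 0)
                        ? min3(pair[i - 1][j - 1], gapInB[i - 1][j - 1], gapInA[i - 1][j - 1])
                          + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch)
                        : INF;
                // either open a new gap or elongate the current one
                gapInB[i][j] = (i > 0)
                        ? Math.min(Math.min(pair[i - 1][j], gapInA[i - 1][j]) + gapOpen + gapExtend,
                                   gapInB[i - 1][j] + gapExtend)
                        : INF;
                gapInA[i][j] = (j > 0)
                        ? Math.min(Math.min(pair[i][j - 1], gapInB[i][j - 1]) + gapOpen + gapExtend,
                                   gapInA[i][j - 1] + gapExtend)
                        : INF;
            }
        }
        return min3(pair[n][m], gapInB[n][m], gapInA[n][m]);
    }

    public static void main(String[] args) {
        // one gap of length 2 (1 + 2 * 0.5) is cheaper than two gaps of length 1
        System.out.println(cost("AAAA", "AA", 0, 1, 1, 0.5)); // prints 2.0
    }
}
```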


REnvironment: Connection to R

In this subsection, we show how to access R (cf. http://www.r-project.org/) from Jstacs. R is a project for statistical computing that allows for performing complex computations and creating nice plots.

In some cases, it is reasonable to use R from within Jstacs. To do so, we have to create a connection to R. We utilize the R package Rserve (cf. http://www.rforge.net/Rserve/), which allows Java and R to communicate. Given a running instance of Rserve, we can create a connection via


REnvironment re = new REnvironment();


However, in some cases we have to specify the login name, the password, and the port for the communication, which is possible via alternative constructors.

Now, we are able to do diverse things in R. Here, we only present three methods, but REnvironment provides more functionality. First, we copy an array of doubles from Java to R


re.createVector( "values", values );


and second, we modify it


re.voidEval( "values=rnorm(length(values));" );


Finally, REnvironment allows us to create plots as PDF, TeX, or BufferedImage.


re.plotToPDF( "plot(values,t=\"l\");", "values.pdf", true );


ArrayHandler: Handling arrays

In this subsection, we present a way to easily handle arrays in Java, i.e., to cast, clone, and create arrays with elements of generic type. To this end, we implement the class ArrayHandler in Jstacs.

Let's assume we have a two-dimensional array of either primitives or instances of some Java class, and we want to create a deep clone, as is necessary for member fields in clone methods.


double[][] twodim = new double[5][5];


Traditionally, we would have to implement for-loops to do so. However, the ArrayHandler implements this functionality in a generic manner providing one method for this purpose.


double[][] clone = ArrayHandler.clone( twodim );
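
For comparison, this is roughly what the traditional, hand-written variant looks like for the special case of a double[][]. The class DeepCloneSketch is hypothetical and only illustrates why a plain twodim.clone() is not sufficient: it copies the outer array but shares the inner rows.

```java
final class DeepCloneSketch {

    /** Deep clone of a two-dimensional double array (rows may be null). */
    static double[][] deepClone(double[][] source) {
        double[][] copy = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            // clone each row separately so that no row is shared
            copy[i] = source[i] == null ? null : source[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        double[][] twodim = new double[5][5];
        double[][] shallow = twodim.clone();
        double[][] deep = deepClone(twodim);
        shallow[0][0] = 1.0; // also changes twodim[0][0]
        deep[1][1] = 2.0;    // leaves twodim untouched
        System.out.println(twodim[0][0] + " " + twodim[1][1]); // prints 1.0 0.0
    }
}
```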


A second use case is the creation of arrays in which every entry is a clone of some instance.


TrainableStatisticalModel pwm = TrainableStatisticalModelFactory.createPWM( DNAAlphabetContainer.SINGLETON, 10, 4.0 );
TrainableStatisticalModel[] models = ArrayHandler.createArrayOf( pwm, 10 );
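
The idea behind such a method can be sketched in plain Java. The class ArrayOfClonesSketch, its method arrayOfCopies, and the explicit copier argument are hypothetical; the Jstacs method needs no copier, and how it clones the template internally is not shown here. The sketch uses the runtime class of the template as the component type of the new array.

```java
import java.lang.reflect.Array;
import java.util.function.UnaryOperator;

final class ArrayOfClonesSketch {

    /** Creates an array of length n whose entries are independent copies of template. */
    static <T> T[] arrayOfCopies(T template, int n, UnaryOperator<T> copier) {
        // the component type is taken from the runtime class of the template
        @SuppressWarnings("unchecked")
        T[] result = (T[]) Array.newInstance(template.getClass(), n);
        for (int i = 0; i < n; i++) {
            result[i] = copier.apply(template);
        }
        return result;
    }

    public static void main(String[] args) {
        StringBuilder template = new StringBuilder("pwm");
        StringBuilder[] copies = arrayOfCopies(template, 3, sb -> new StringBuilder(sb));
        copies[0].append("-modified");
        System.out.println(copies[0] + " " + copies[1]); // prints pwm-modified pwm
    }
}
```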


The third use case is to cast an array. Even if all elements of the array are from the same class, the component type of the array might be different (some super class). A simple cast will fail in those cases. However, the ArrayHandler provides two methods for casting arrays. Here, we present the more important method, which allows us to specify the array component type and performs the cast operation.


Object[] m = new Object[]{
    TrainableStatisticalModelFactory.createPWM( DNAAlphabetContainer.SINGLETON, 10, 4.0 ),
    TrainableStatisticalModelFactory.createHomogeneousMarkovModel( DNAAlphabetContainer.SINGLETON, 40.0, (byte)0 )
};
TrainableStatisticalModel[] sms = ArrayHandler.cast( TrainableStatisticalModel.class, m );
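
Why a simple cast fails, and how a cast with an explicit component type can be emulated, can be shown with the standard library alone. The class ArrayCastSketch is hypothetical and uses Number in place of TrainableStatisticalModel. Note that Arrays.copyOf creates a new array of the requested type rather than converting the existing one; we make no claim about how ArrayHandler does this internally.

```java
import java.util.Arrays;

final class ArrayCastSketch {

    /** Copies the elements of array into a new array of the given array type. */
    static <T> T[] castArray(Class<T[]> arrayType, Object[] array) {
        // throws ArrayStoreException if an element does not fit the component type
        return Arrays.copyOf(array, array.length, arrayType);
    }

    public static void main(String[] args) {
        Object[] m = new Object[]{ 1, 2.5, 3L }; // all elements are Numbers
        try {
            Number[] direct = (Number[]) m;      // the array itself is an Object[]
            System.out.println(direct.length);
        } catch (ClassCastException e) {
            System.out.println("simple cast failed");
        }
        Number[] numbers = castArray(Number[].class, m);
        System.out.println(numbers[1].doubleValue());
    }
}
```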


ToolBox

The class ToolBox contains several static methods for recurring tasks. For example, you can compute the maximum of an array of doubles

double max = ToolBox.max( values );

or the sum of the values

double sum = ToolBox.sum( values );

or you can obtain the index of the first maximum value in the provided array

int maxIndex = ToolBox.getMaxIndex( values );
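
The semantics described above, in particular "first maximum" for ties, can be sketched as follows. The class ArrayStatsSketch is hypothetical and not the ToolBox source; it assumes a non-empty array.

```java
final class ArrayStatsSketch {

    /** Index of the first maximal entry; ties are resolved in favour of the smaller index. */
    static int maxIndex(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) { // strict comparison keeps the first maximum
                best = i;
            }
        }
        return best;
    }

    /** Sum of all entries. */
    static double sum(double[] values) {
        double s = 0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] values = { 1, 3, 3, 2 };
        System.out.println(maxIndex(values) + " " + sum(values)); // prints 1 9.0
    }
}
```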


Normalisation

Another frequently needed functionality is the handling of log-values. Assume that we have an array values containing a number of log-probabilities [math]l_i[/math]. What we want to compute is the logarithm of the sum of the original probabilities, i.e., [math]\log(\sum_i \exp(l_i))[/math]. The naive computation of this sum often results in numerical problems, especially if the original probabilities differ greatly in magnitude. A more exact solution is provided by the static method getLogSum of the class Normalisation, which can be accessed by calling

double logSum = Normalisation.getLogSum( values );

Of course, this method does not only work for probabilities, but for general log-values.
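
A common way to avoid these numerical problems is to shift all values by their maximum [math]m = \max_i l_i[/math] and to use [math]\log(\sum_i \exp(l_i)) = m + \log(\sum_i \exp(l_i - m))[/math]. The following sketch implements this identity; the class LogSumSketch is hypothetical, and we do not claim that getLogSum is implemented in exactly this way.

```java
final class LogSumSketch {

    /** Computes log(sum_i exp(logValues[i])) by shifting with the maximum. */
    static double logSum(double[] logValues) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logValues) {
            max = Math.max(max, v);
        }
        if (Double.isInfinite(max)) {
            return max; // empty array or only -infinity entries (or a +infinity entry)
        }
        double sum = 0;
        for (double v : logValues) {
            sum += Math.exp(v - max); // every exponent is <= 0, so no overflow
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        double[] values = { -1000, -1000 };
        // the naive computation underflows: exp(-1000) is 0 in double precision
        System.out.println(Math.log(Math.exp(values[0]) + Math.exp(values[1]))); // prints -Infinity
        // the shifted computation yields a value close to -1000 + log(2)
        System.out.println(logSum(values));
    }
}
```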

Sometimes, we also want to normalize the given probabilities. That means, given the log-probabilities [math]l_i[/math], we want to obtain normalized probabilities [math]p_i = \frac{\exp(l_i)}{\sum_j \exp(l_j)}[/math]. This normalization is performed by calling

Normalisation.logSumNormalisation( values );

and after the call, the array values contains the normalized probabilities (not log-probabilities!).
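
The in-place behaviour can be sketched with the same shift by the maximum. The class LogNormaliseSketch is hypothetical; returning the log-sum is a choice made for this sketch, and the array is assumed to contain at least one finite entry.

```java
final class LogNormaliseSketch {

    /**
     * Replaces the log-values in the array by normalized probabilities
     * and returns the logarithm of the original sum.
     */
    static double normaliseLogInPlace(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        double sum = 0;
        for (int i = 0; i < values.length; i++) {
            values[i] = Math.exp(values[i] - max); // shifted, unnormalized probability
            sum += values[i];
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= sum;
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        // log-values of two tiny numbers with ratio 1:3
        double[] values = { -1000 + Math.log(1), -1000 + Math.log(3) };
        normaliseLogInPlace(values);
        // values now holds probabilities close to 0.25 and 0.75
        System.out.println(values[0] + " " + values[1]);
    }
}
```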

Finally, we might want to do the same for probabilities, i.e. given probabilities [math]q_i[/math] in an array values, we want to compute [math]p_i = \frac{q_i}{\sum_j q_j}[/math] using

Normalisation.sumNormalisation( values );

A typical application of the last two methods is the computation of conditional probabilities from (log) joint probabilities by dividing by a marginal probability.
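
As a small worked example of this application, assume joint probabilities [math]P(x, c)[/math] of an observation [math]x[/math] and three classes [math]c[/math]. Their sum is the marginal [math]P(x)[/math], and dividing by it yields the conditional probabilities [math]P(c|x)[/math]. The class SumNormaliseSketch and the numbers are purely illustrative and not taken from Jstacs.

```java
final class SumNormaliseSketch {

    /** Divides every entry by the sum of all entries and returns that sum. */
    static double normaliseInPlace(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        for (int i = 0; i < values.length; i++) {
            values[i] /= sum;
        }
        return sum;
    }

    public static void main(String[] args) {
        // illustrative joint probabilities P(x, c) for three classes c
        double[] joint = { 0.02, 0.06, 0.12 };
        double marginal = normaliseInPlace(joint); // P(x), approximately 0.2
        // joint now holds P(c | x), approximately 0.1, 0.3, and 0.6
        System.out.println(marginal + ": " + joint[0] + " " + joint[1] + " " + joint[2]);
    }
}
```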

Goodies

The class SafeOutputStream is a simple way to switch between writing outputs of a program to standard out, to a file, or to completely suppress output. This class is basically a wrapper for other output streams that can handle null values. You can create a SafeOutputStream writing to standard out with

OutputStream stream = SafeOutputStream.getSafeOutputStream( System.out );

If you provided null to the factory method instead, output would be suppressed without any modification of the code that uses this SafeOutputStream.
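
The wrapper idea can be sketched in a few lines of plain Java. The class NullSafeOutputStreamSketch is hypothetical and not the Jstacs implementation; it only shows how a stream can silently discard output when the wrapped stream is null.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

final class NullSafeOutputStreamSketch extends OutputStream {

    private final OutputStream target; // may be null, in which case output is dropped

    NullSafeOutputStreamSketch(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        if (target != null) target.write(b);
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        if (target != null) target.write(buffer, offset, length);
    }

    @Override
    public void flush() throws IOException {
        if (target != null) target.flush();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // the calling code is identical in both cases
        PrintStream verbose = new PrintStream(new NullSafeOutputStreamSketch(buffer), true);
        PrintStream silent = new PrintStream(new NullSafeOutputStreamSketch(null), true);
        verbose.print("progress");
        silent.print("progress");
        System.out.println(buffer.toString()); // prints progress
    }
}
```

In this sketch, close() is deliberately not forwarded, so that wrapping System.out does not close it by accident.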

Finally, the class SubclassFinder can be used to search for all subclasses of a given class in a specified package and its sub-packages. For example, if we want to find all concrete sub-classes of TrainableStatisticalModel, i.e., classes that are not abstract and can be instantiated, in all sub-packages of de.jstacs, we call

LinkedList<Class<? extends TrainableStatisticalModel>> list = SubclassFinder.findInstantiableSubclasses( TrainableStatisticalModel.class, "de.jstacs" );

and obtain a linked list containing all such classes. Other methods in SubclassFinder allow for searching for general sub-types including interfaces and abstract classes, or for filtering the results by further required interfaces.
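
The criterion "concrete sub-class" can be expressed with standard reflection. The following sketch filters a given list of candidate classes; the class InstantiableFilterSketch is hypothetical, and the actual search through packages performed by SubclassFinder is deliberately left out. It also does not check for accessible constructors.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InstantiableFilterSketch {

    /** Keeps those candidates that are non-abstract classes assignable to parent. */
    static <T> List<Class<? extends T>> instantiableSubclasses(
            Class<T> parent, List<Class<?>> candidates) {
        List<Class<? extends T>> result = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            boolean concrete = !candidate.isInterface()
                    && !Modifier.isAbstract(candidate.getModifiers());
            if (concrete && parent.isAssignableFrom(candidate)) {
                result.add(candidate.asSubclass(parent));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = Arrays.asList(
                Number.class,      // abstract
                Integer.class,
                Double.class,
                String.class,      // not a Number
                Comparable.class); // interface
        // only Integer and Double remain
        System.out.println(instantiableSubclasses(Number.class, candidates));
    }
}
```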