java.lang.Object
  de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
    de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM
      de.jstacs.sequenceScores.statisticalModels.trainable.hmm.models.HigherOrderHMM
public class HigherOrderHMM
This class implements a higher order hidden Markov model.
Currently, only the transitions are modeled with higher order, but this can easily be extended to the emissions.
This implementation allows a set of final states.
A state is called a final state if it is allowed at the end of a path. Hence, any valid path always ends with a final state.
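The two ideas above, higher-order transitions and final states, can be illustrated with a small self-contained sketch. It is not the Jstacs implementation: the class `PathScoreSketch`, the table layout (`logInit`, `logFirst`, `logTrans`) and the restriction to order 2 are assumptions made for illustration, and emissions are left out entirely.

```java
/** Toy scoring of a state path under a second-order transition; hypothetical, not Jstacs code. */
final class PathScoreSketch {

    /**
     * Returns the log probability of the state path (transitions only).
     *
     * logInit[s]        log probability of starting in state s
     * logFirst[p][s]    log P(s | p), used for the second position only
     * logTrans[pp][p][s] log P(s | previous two states pp, p)
     * isFinal[s]        true if state s is allowed at the end of a path
     */
    static double logProbForPath(int[] path, double[] logInit, double[][] logFirst,
                                 double[][][] logTrans, boolean[] isFinal) {
        // A valid path must be non-empty and must end with a final state.
        if (path.length == 0 || !isFinal[path[path.length - 1]]) {
            return Double.NEGATIVE_INFINITY;
        }
        double logProb = logInit[path[0]];
        for (int t = 1; t < path.length; t++) {
            if (t == 1) {
                // Only one predecessor is available at the second position.
                logProb += logFirst[path[0]][path[1]];
            } else {
                // Higher-order transition: condition on the two preceding states.
                logProb += logTrans[path[t - 2]][path[t - 1]][path[t]];
            }
        }
        return logProb;
    }
}
```

A path that ends in a state that is not final receives `Double.NEGATIVE_INFINITY`, i.e. probability zero, which mirrors the statement that any valid path ends with a final state.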
Using the method
AbstractTrainableStatisticalModel.getLogProbFor(Sequence)
for sequence returns the value
Nested Class Summary | |
---|---|
protected static class | HigherOrderHMM.Type: This enum defines different types of computations that will be done using the backward algorithm. |
Field Summary | |
---|---|
protected double[] | backwardIntermediate: Helper variable, only for internal use. |
protected int[] | container: Helper variable, only for internal use. |
protected double[] | logEmission: Helper variable, only for internal use. |
protected int[][] | numberOfSummands: Helper variable, only for internal use. |
protected boolean | skipInit: Indicates if the model should be initialized (randomly) before optimization. |
protected IntList | stateList: Helper variable, only for internal use. |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM |
---|
bwdMatrix, emission, emissionIdx, finalState, forward, fwdMatrix, name, sostream, START_NODE, states, threads, trainingParameter, transition |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
alphabets, length |
Constructor Summary | |
---|---|
HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) | This is a convenience constructor. |
HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) | This is the main constructor. |
HigherOrderHMM(StringBuffer xml) | The standard constructor for the interface Storable. |
Method Summary | |
---|---|
protected void | appendFurtherInformation(StringBuffer xml): This method appends further information to the XML representation. |
protected double | baumWelch(int startPos, int endPos, double weight, Sequence seq): This method computes the likelihood and modifies the sufficient statistics according to the Baum-Welch algorithm. |
HigherOrderHMM | clone(): Follows the conventions of Object's clone()-method. |
protected void | createHelperVariables(): This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices, ... |
protected void | createStates(): This method creates states for the internal usage. |
protected void | estimateFromStatistics(): This method estimates the parameters of all emissions and the transition using their sufficient statistics. |
protected void | extractFurtherInformation(StringBuffer xml): This method extracts further information from the XML representation. |
protected void | fillBwdMatrix(int startPos, int endPos, Sequence seq): This method fills the backward-matrix for a given sequence. |
protected void | fillBwdOrViterbiMatrix(HigherOrderHMM.Type t, int startPos, int endPos, double weight, Sequence seq): This method computes the entries of the backward or the Viterbi matrix. |
protected void | fillFwdMatrix(int startPos, int endPos, Sequence seq): This method fills the forward-matrix for a given sequence. |
protected void | fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero): This method fills the log state posterior of Sequence seq in a given matrix. |
protected void | finalize() |
ResultSet | getCharacteristics(): Returns some information characterizing or describing the current instance. |
String | getInstanceName(): Should return a short instance name such as iMM(0), BN(2), ... |
double | getLogPriorTerm(): Returns a value that is proportional to the log of the prior. |
double | getLogProbForPath(IntList path, int startPos, Sequence seq) |
double[] | getLogScoreFor(DataSet data): This method computes the logarithm of the scores of all sequences in the given sample. |
void | getLogScoreFor(DataSet data, double[] res): This method computes and stores the logarithm of the scores for any sequence in the sample in the given double-array. |
byte | getMaximalMarkovOrder(): This method returns the maximal used Markov order, if possible. |
NumericalResultSet | getNumericalCharacteristics(): Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics(). |
Pair<IntList,Double> | getViterbiPathFor(int startPos, int endPos, Sequence seq) |
protected String | getXMLTag(): Returns the tag for the XML representation. |
protected void | initialize(DataSet data, double[] weight): This method initializes all emissions and the transition. |
protected void | initializeRandomly(): This method initializes all emissions and the transition randomly. |
boolean | isInitialized(): This method can be used to determine whether the instance is initialized. |
protected void | resetStatistics(): This method resets all sufficient statistics of all emissions and the transition. |
void | samplePath(IntList path, int startPos, int endPos, Sequence seq): This method samples a valid path for the given sequence seq using the internal parameters. |
void | train(DataSet data, double[] weights): Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. |
protected double | viterbi(IntList path, int startPos, int endPos, double weight, Sequence seq): This method computes the Viterbi score of a given sequence seq. |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM |
---|
createMatrixForStatePosterior, decodePath, decodeStatePosterior, determineFinalStates, fromXML, getFinalStatePosterioriMatrix, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getLogProbFor, getLogStatePosteriorMatrixFor, getLogStatePosteriorMatrixFor, getNumberOfStates, getNumberOfThreads, getRunTimeException, getStatePosteriorMatrixFor, getStatePosteriorMatrixFor, getViterbiPathFor, getViterbiPathsFor, initTransition, logProb, provideMatrix, setOutputStream, toString, toXML, train |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
check, emitDataSet, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
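protected int[] container
Helper variable, only for internal use. See Transition.fillTransitionInformation(int, int, int, int[]).

protected double[] logEmission
Helper variable, only for internal use. See AbstractHMM.emission.

protected double[] backwardIntermediate
Helper variable, only for internal use. See numberOfSummands.

protected int[][] numberOfSummands
Helper variable, only for internal use.

protected IntList stateList
Helper variable, only for internal use. See samplePath(IntList, int, int, Sequence).

protected boolean skipInit
Indicates if the model should be initialized (randomly) before optimization.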
Constructor Detail |
---|
public HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This is a convenience constructor. State i uses emission i on the forward strand.
Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emission - the emissions
te - the BasicHigherOrderTransition.AbstractTransitionElements building a transition
Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states
See Also:
AlphabetContainer, HigherOrderHMM(HMMTrainingParameterSet, String[], int[], boolean[], Emission[], de.jstacs.sequenceScores.statisticalModels.trainable.hmm.transitions.BasicHigherOrderTransition.AbstractTransitionElement...)
public HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This is the main constructor.
Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
te - the BasicHigherOrderTransition.AbstractTransitionElements building a transition
Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states
See Also:
AlphabetContainer
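The defaults described for emissionIdx and forward (null means "state i uses emission i" and "all states use the forward strand") can be sketched as follows. This is a minimal illustration under assumptions, not the constructor's actual code: the class `StateDefaultsSketch` and both method names are hypothetical, and `IllegalArgumentException` stands in for the unspecified `Exception` of the constructor.

```java
import java.util.Arrays;

/** Hypothetical helper that resolves the documented null defaults. */
final class StateDefaultsSketch {

    /** Returns a copy of emissionIdx, or the identity mapping 0..n-1 if it is null. */
    static int[] resolveEmissionIdx(int[] emissionIdx, int numberOfStates) {
        if (emissionIdx == null) {
            int[] identity = new int[numberOfStates];
            for (int i = 0; i < numberOfStates; i++) {
                identity[i] = i; // state i uses emission i
            }
            return identity;
        }
        if (emissionIdx.length != numberOfStates) {
            throw new IllegalArgumentException("emissionIdx must have one entry per state");
        }
        return emissionIdx.clone();
    }

    /** Returns a copy of forward, or an all-true array if it is null. */
    static boolean[] resolveForward(boolean[] forward, int numberOfStates) {
        if (forward == null) {
            boolean[] allForward = new boolean[numberOfStates];
            Arrays.fill(allForward, true); // every state reads the forward strand
            return allForward;
        }
        if (forward.length != numberOfStates) {
            throw new IllegalArgumentException("forward must have one entry per state");
        }
        return forward.clone();
    }
}
```

Under this reading, the convenience constructor corresponds to calling the main constructor with both arrays set to null.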
public HigherOrderHMM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Constructs a HigherOrderHMM out of an XML representation.
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the HigherOrderHMM could not be reconstructed out of the StringBuffer xml
Method Detail |
---|
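protected void createHelperVariables()
Description copied from class: AbstractHMM
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices, ...
createHelperVariables in class AbstractHMM
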
protected String getXMLTag()
Description copied from class: AbstractHMM
Returns the tag for the XML representation.
getXMLTag in class AbstractHMM
See Also:
AbstractHMM.fromXML(StringBuffer), AbstractHMM.toXML()

protected void appendFurtherInformation(StringBuffer xml)
Description copied from class: AbstractHMM
This method appends further information to the XML representation.
appendFurtherInformation in class AbstractHMM
Parameters:
xml - the XML representation

protected void extractFurtherInformation(StringBuffer xml) throws NonParsableException
This method extracts further information from the XML representation.
extractFurtherInformation in class AbstractHMM
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml

public HigherOrderHMM clone() throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone()-method.
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class AbstractHMM
Returns:
a copy of the current AbstractTrainableStatisticalModel (the member-AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone()-method should work as:
1. Object o = (X)super.clone();
2. all member variables of o defined by X that are not of simple data-types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning

protected void createStates()
Description copied from class: AbstractHMM
This method creates states for the internal usage.
createStates in class AbstractHMM
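
The three-step clone() convention described for clone() above can be sketched with a small stand-alone class. The class `CloneSketch` and its fields are hypothetical and only illustrate the pattern; they are not part of Jstacs.

```java
/** Hypothetical class X that follows the three-step clone() convention. */
final class CloneSketch implements Cloneable {

    private int order = 1;                 // simple data type, copied by super.clone()
    private double[] params = {0.1, 0.9};  // array, has to be copied deeply

    @Override
    public CloneSketch clone() throws CloneNotSupportedException {
        // Step 1: let Object.clone() create a field-by-field copy.
        CloneSketch o = (CloneSketch) super.clone();
        // Step 2: deeply copy every member that is not of a simple data type.
        o.params = params.clone();
        // Step 3: return the copy.
        return o;
    }

    /** Exposes the internal array so that the deep copy can be checked. */
    double[] params() {
        return params;
    }
}
```

Without step 2 the original and the clone would share the same array, so a change to the parameters of one instance would silently change the other.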

public double getLogPriorTerm()
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior.

public double getLogProbForPath(IntList path, int startPos, Sequence seq) throws Exception
getLogProbForPath in class AbstractHMM
Parameters:
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Throws:
Exception - if the probability for the sequence given path could not be computed, for instance if the model is not trained, ...

protected void fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero) throws Exception
Description copied from class: AbstractHMM
This method fills the log state posterior of Sequence seq in a given matrix.
fillLogStatePosteriorMatrix in class AbstractHMM
Parameters:
statePosterior - the matrix for the log state posterior
startPos - the start position
endPos - the end position
seq - the sequence
silentZero - true if the state posterior for silent states is defined to be zero, otherwise false
Throws:
Exception - if an error occurs during the computation
See Also:
AbstractHMM.getLogStatePosteriorMatrixFor(int, int, Sequence), AbstractHMM.createMatrixForStatePosterior(int, int)
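
For orientation, the textbook way to obtain a log state posterior from forward and backward values is sketched below. This is an assumption about the general technique, not a description of what fillLogStatePosteriorMatrix does internally: the class `PosteriorSketch`, the index order [state][position], and the omission of silent states (the silentZero switch) are all choices made for this example.

```java
/** Textbook log state posterior from log forward and log backward values; hypothetical. */
final class PosteriorSketch {

    /**
     * Fills post[s][t] = fwd[s][t] + bwd[s][t] - logProb, i.e. the log of
     * P(state s at position t | sequence).
     *
     * fwd[s][t]  log forward value of state s at position t
     * bwd[s][t]  log backward value of state s at position t
     * logProb    log probability of the whole sequence
     */
    static void fillLogStatePosterior(double[][] post, double[][] fwd,
                                      double[][] bwd, double logProb) {
        for (int s = 0; s < post.length; s++) {
            for (int t = 0; t < post[s].length; t++) {
                post[s][t] = fwd[s][t] + bwd[s][t] - logProb;
            }
        }
    }
}
```

Exponentiating a column of the filled matrix yields probabilities that sum to one over the states, provided forward, backward and logProb come from the same model and sequence.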

protected void fillFwdMatrix(int startPos, int endPos, Sequence seq) throws OperationNotSupportedException, WrongLengthException
Description copied from class: AbstractHMM
This method fills the forward-matrix for a given sequence.
fillFwdMatrix in class AbstractHMM
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
OperationNotSupportedException
WrongLengthException
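
As an illustration of what filling a forward matrix involves, the following sketch runs the forward recursion in log space for a plain first-order HMM and sums only over final states at the end. It is a simplified stand-in, not the Jstacs code: the class `ForwardSketch`, the parameter tables, and the first-order restriction are assumptions, and only the last column of the matrix is kept.

```java
/** Log-space forward recursion for a toy first-order HMM with final states; hypothetical. */
final class ForwardSketch {

    /** Numerically stable log(exp(a) + exp(b)). */
    static double logSumExp(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double m = Math.max(a, b);
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    /**
     * Returns the log probability of a non-empty sequence, summed over all
     * paths that end in a final state.
     *
     * logInit[s]       log probability of starting in state s
     * logTrans[p][s]   log P(s | p)
     * logEmit[s][x]    log probability that state s emits symbol x
     */
    static double logProb(int[] seq, double[] logInit, double[][] logTrans,
                          double[][] logEmit, boolean[] isFinal) {
        int n = logInit.length;
        double[] fwd = new double[n];
        for (int s = 0; s < n; s++) {
            fwd[s] = logInit[s] + logEmit[s][seq[0]];
        }
        for (int t = 1; t < seq.length; t++) {
            double[] next = new double[n];
            for (int s = 0; s < n; s++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    acc = logSumExp(acc, fwd[p] + logTrans[p][s]);
                }
                next[s] = acc + logEmit[s][seq[t]];
            }
            fwd = next;
        }
        // Only paths that end in a final state contribute.
        double total = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < n; s++) {
            if (isFinal[s]) {
                total = logSumExp(total, fwd[s]);
            }
        }
        return total;
    }
}
```

Working in log space avoids numerical underflow for long sequences, which is why the recursion uses logSumExp instead of a plain sum.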

protected void fillBwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
Description copied from class: AbstractHMM
This method fills the backward-matrix for a given sequence.
fillBwdMatrix in class AbstractHMM
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation

protected void fillBwdOrViterbiMatrix(HigherOrderHMM.Type t, int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the entries of the backward or the Viterbi matrix.
Parameters:
t - a switch that selects the computation mode
startPos - start position of the sequence
endPos - end position of the sequence
weight - the given external weight of the sequence (only used for Baum-Welch)
seq - the sequence
Throws:
Exception - forwarded from TrainableState.addToStatistic(int, int, double, de.jstacs.data.sequences.Sequence) and State.getLogScoreFor(int, int, Sequence)
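
The reason one method can serve both the backward and the Viterbi computation is that the two recursions differ only in how the continuations are combined: a log-sum for backward, a maximum for Viterbi. The sketch below shows this for a toy first-order HMM. The enum `Mode` is a hypothetical stand-in for HigherOrderHMM.Type (whose constants are not listed here), and the class `BackwardOrViterbiSketch` and its parameter tables are assumptions as well.

```java
/** One backward-style recursion with a mode switch; hypothetical, first-order only. */
final class BackwardOrViterbiSketch {

    /** Hypothetical stand-in for HigherOrderHMM.Type. */
    enum Mode { BACKWARD, VITERBI }

    /** Combines two log values: maximum for Viterbi, log-sum-exp for backward. */
    static double combine(Mode mode, double a, double b) {
        if (mode == Mode.VITERBI) return Math.max(a, b);
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double m = Math.max(a, b);
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    /**
     * Fills matrix[t][s] from the last position to the first.
     * logTrans[s][q] is log P(q | s), logEmit[q][x] is the log emission of symbol x in state q.
     */
    static double[][] fill(Mode mode, int[] seq, double[][] logTrans,
                           double[][] logEmit, boolean[] isFinal) {
        int n = logTrans.length;
        int len = seq.length;
        double[][] matrix = new double[len][n];
        // At the last position only final states are allowed.
        for (int s = 0; s < n; s++) {
            matrix[len - 1][s] = isFinal[s] ? 0.0 : Double.NEGATIVE_INFINITY;
        }
        for (int t = len - 2; t >= 0; t--) {
            for (int s = 0; s < n; s++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int q = 0; q < n; q++) {
                    double step = logTrans[s][q] + logEmit[q][seq[t + 1]] + matrix[t + 1][q];
                    acc = combine(mode, acc, step);
                }
                matrix[t][s] = acc;
            }
        }
        return matrix;
    }
}
```

In BACKWARD mode an entry is the log probability of the remaining symbols given the current state; in VITERBI mode it is the log score of the best continuation.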

public Pair<IntList,Double> getViterbiPathFor(int startPos, int endPos, Sequence seq) throws Exception
getViterbiPathFor in class AbstractHMM
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...

protected double viterbi(IntList path, int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the Viterbi score of a given sequence seq. Furthermore, it allows either to modify the sufficient statistics according to the Viterbi training algorithm or to compute the Viterbi path, which will in this case be returned in path.
Parameters:
path - if null, Viterbi training is performed; otherwise the Viterbi path is computed
startPos - the start position
endPos - the end position
weight - the sequence weight, in most cases this is 1
seq - the sequence
Throws:
Exception - if an error occurs during the computation

protected double baumWelch(int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the likelihood and modifies the sufficient statistics according to the Baum-Welch algorithm.
Parameters:
startPos - the start position
endPos - the end position
weight - the sequence weight, in most cases this is 1
seq - the sequence
Throws:
Exception - if an error occurs during the computation

public void train(DataSet data, double[] weights) throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i. So the array weights should have the number of sequences in the sample as dimension. (Optionally it is possible to use weights == null if all weights have the value one.)
The result of the series train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.
Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the sample do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
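
The contract for the weights array of train (one non-negative weight per sequence, null meaning that all weights are one) can be sketched as a small validation helper. The class `WeightSketch` and both methods are hypothetical, `IllegalArgumentException` stands in for the unspecified `Exception`, and the weighted sum of log probabilities is shown only as one common way such weights are used, not as the training objective of this class.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the documented weight contract. */
final class WeightSketch {

    /** Returns one weight per sequence; null is expanded to all ones. */
    static double[] resolveWeights(double[] weights, int numberOfSequences) {
        if (weights == null) {
            double[] ones = new double[numberOfSequences];
            Arrays.fill(ones, 1.0);
            return ones;
        }
        if (weights.length != numberOfSequences) {
            throw new IllegalArgumentException("number of weights and sequences differ");
        }
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
        }
        return weights.clone();
    }

    /** Weighted sum of per-sequence log probabilities; weight i belongs to sequence i. */
    static double weightedLogLikelihood(double[] logProbs, double[] weights) {
        double[] w = resolveWeights(weights, logProbs.length);
        double sum = 0.0;
        for (int i = 0; i < logProbs.length; i++) {
            sum += w[i] * logProbs[i];
        }
        return sum;
    }
}
```

Passing null and passing an array of ones give the same result in this sketch, which matches the optional use of null described above.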
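
protected void initialize(DataSet data, double[] weight) throws Exception
This method initializes all emissions and the transition. See initializeRandomly().
Parameters:
data - the data set
weight - the weights for each sequence of the data set
Throws:
Exception - if an error occurs during the initialization

protected void initializeRandomly()
This method initializes all emissions and the transition randomly.

protected void resetStatistics()
This method resets all sufficient statistics of all emissions and the transition.

protected void estimateFromStatistics()
This method estimates the parameters of all emissions and the transition using their sufficient statistics.
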
public final byte getMaximalMarkovOrder() throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
getMaximalMarkovOrder in interface StatisticalModel
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel
Throws:
UnsupportedOperationException - if the model can't give a proper answer

public ResultSet getCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns some information characterizing or describing the current instance. See StorableResult.
getCharacteristics in interface SequenceScore
getCharacteristics in class AbstractTrainableStatisticalModel
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

public double[] getLogScoreFor(DataSet data) throws Exception
Description copied from interface: SequenceScore
This method computes the logarithm of the scores of all sequences in the given sample. See SequenceScore.getLogScoreFor(Sequence).
getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel
Parameters:
data - the sample of sequences
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence)

public void getLogScoreFor(DataSet data, double[] res) throws Exception
Description copied from interface: SequenceScore
This method computes and stores the logarithm of the scores for any sequence in the sample in the given double-array. See SequenceScore.getLogScoreFor(Sequence).
getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel
Parameters:
data - the sample of sequences
res - the array for the results, has to have length data.getNumberOfElements() (which returns the number of sequences in the sample)
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence), SequenceScore.getLogScoreFor(DataSet)

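public NumericalResultSet getNumericalCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
Throws:
Exception - if some of the characteristics could not be defined

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. See SequenceScore.getLogScoreFor(Sequence).
Returns:
true if the instance is initialized, false otherwise

protected void finalize() throws Throwable
finalize in class AbstractHMM
Throws:
Throwable
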
public void samplePath(IntList path, int startPos, int endPos, Sequence seq) throws Exception
This method samples a valid path for the given sequence seq using the internal parameters.
Parameters:
path - an IntList containing the path after using this method
startPos - the start position
endPos - the end position
seq - the sequence
Throws:
Exception - if an error occurs during computation