public class HigherOrderHMM extends AbstractHMM
AbstractTrainableStatisticalModel.getLogProbFor(Sequence)
for sequence

Modifier and Type | Class and Description |
---|---|
protected static class |
HigherOrderHMM.Type
This enum defines the different types of computations that are performed using the backward algorithm.
|
Modifier and Type | Field and Description |
---|---|
protected double[] |
backwardIntermediate
Helper variable, only for internal use.
|
protected int[] |
container
Helper variable, only for internal use.
|
protected double[] |
logEmission
Helper variable, only for internal use.
|
protected int[][] |
numberOfSummands
Helper variable, only for internal use.
|
protected boolean |
skipInit
Indicates whether the model should be initialized (randomly) before optimization.
|
protected IntList |
stateList
Helper variable, only for internal use.
|
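Fields inherited from class AbstractHMM: bwdMatrix, emission, emissionIdx, finalState, forward, fwdMatrix, name, sostream, START_NODE, states, threads, trainingParameter, transition
Fields inherited from class AbstractTrainableStatisticalModel: alphabets, length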
Constructor and Description |
---|
HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet,
String[] name,
Emission[] emission,
BasicHigherOrderTransition.AbstractTransitionElement... te)
This is a convenience constructor.
|
HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet,
String[] name,
int[] emissionIdx,
boolean[] forward,
Emission[] emission,
BasicHigherOrderTransition.AbstractTransitionElement... te)
This is the main constructor.
|
HigherOrderHMM(StringBuffer xml)
The standard constructor for the interface Storable. |
Modifier and Type | Method and Description |
---|---|
protected void |
appendFurtherInformation(StringBuffer xml)
This method appends further information to the XML representation.
|
protected double |
baumWelch(int startPos,
int endPos,
double weight,
Sequence seq)
This method computes the likelihood and modifies the sufficient statistics according to the Baum-Welch algorithm.
|
HigherOrderHMM |
clone()
Follows the conventions of Object's clone() method. |
protected void |
createHelperVariables()
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices.
|
protected void |
createStates()
This method creates states for the internal usage.
|
protected void |
estimateFromStatistics()
This method estimates the parameters of all emissions and the transition using their sufficient statistics.
|
protected void |
extractFurtherInformation(StringBuffer xml)
This method extracts further information from the XML representation.
|
protected void |
fillBwdMatrix(int startPos,
int endPos,
Sequence seq)
This method fills the backward-matrix for a given sequence.
|
protected void |
fillBwdOrViterbiMatrix(HigherOrderHMM.Type t,
int startPos,
int endPos,
double weight,
Sequence seq)
This method computes the entries of the backward or the Viterbi matrix.
|
protected void |
fillFwdMatrix(int startPos,
int endPos,
Sequence seq)
This method fills the forward-matrix for a given sequence.
|
protected void |
fillLogStatePosteriorMatrix(double[][] statePosterior,
int startPos,
int endPos,
Sequence seq,
boolean silentZero)
This method fills the log state posterior of Sequence seq in a given matrix. |
protected void |
finalize() |
ResultSet |
getCharacteristics()
Returns some information characterizing or describing the current
instance.
|
int[] |
getEmissionIndexes()
Returns a clone of the internal array of emission indexes that represent which emission is used in which state.
|
Emission[] |
getEmissions()
Returns a clone of the internal emissions.
|
String |
getInstanceName()
Should return a short instance name such as iMM(0), BN(2), ...
|
double |
getLogPriorTerm()
Returns a value that is proportional to the log of the prior.
|
double |
getLogProbForPath(IntList path,
int startPos,
Sequence seq) |
double[] |
getLogScoreFor(DataSet data)
This method computes the logarithm of the scores of all sequences
in the given data set.
|
void |
getLogScoreFor(DataSet data,
double[] res)
This method computes and stores the logarithm of the scores for any sequence in the data set in the given double array. |
byte |
getMaximalMarkovOrder()
This method returns the maximal used Markov order, if possible.
|
String[] |
getNames()
Returns a clone of the state names.
|
NumericalResultSet |
getNumericalCharacteristics()
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics(). |
HMMTrainingParameterSet |
getTrainingParams()
Returns a clone of the training parameters
|
TransitionElement[] |
getTransisionElements()
Returns the transition elements of the internal Transition. |
Pair<IntList,Double> |
getViterbiPathFor(int startPos,
int endPos,
Sequence seq) |
protected String |
getXMLTag()
Returns the tag for the XML representation.
|
protected void |
initialize(DataSet data,
double[] weight)
This method initializes all emissions and the transition.
|
void |
initializeRandomly()
This method initializes all emissions and the transition randomly.
|
boolean |
isInitialized()
This method can be used to determine whether the instance is initialized.
|
protected void |
resetStatistics()
This method resets all sufficient statistics of all emissions and the transition.
|
void |
samplePath(IntList path,
int startPos,
int endPos,
Sequence seq)
This method samples a valid path for the given sequence seq using the internal parameters. |
void |
setSkiptInit(boolean skip)
Sets whether the model should be initialized (randomly) before optimization.
|
void |
train(DataSet data,
double[] weights)
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. |
protected double |
viterbi(IntList path,
int startPos,
int endPos,
double weight,
Sequence seq)
This method computes the Viterbi score of a given sequence seq. |
Methods inherited from class AbstractHMM: createMatrixForStatePosterior, decodePath, decodeStatePosterior, determineFinalStates, fromXML, getFinalStatePosterioriMatrix, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getGraphvizRepresentation, getLogProbFor, getLogStatePosteriorMatrixFor, getLogStatePosteriorMatrixFor, getNumberOfStates, getNumberOfThreads, getRunTimeException, getStatePosteriorMatrixFor, getStatePosteriorMatrixFor, getViterbiPathFor, getViterbiPathsFor, initTransition, logProb, provideMatrix, setOutputStream, toString, toXML, train
Methods inherited from class AbstractTrainableStatisticalModel: check, emitDataSet, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, toString
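protected int[] container
Helper variable, only for internal use; see Transition.fillTransitionInformation(int, int, int, int[]).

protected double[] logEmission
Helper variable, only for internal use; see AbstractHMM.emission.

protected double[] backwardIntermediate
Helper variable, only for internal use; see numberOfSummands.

protected int[][] numberOfSummands
Helper variable, only for internal use.

protected IntList stateList
Helper variable, only for internal use; see samplePath(IntList, int, int, Sequence).

protected boolean skipInit
Indicates whether the model should be initialized (randomly) before optimization.
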
public HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This is a convenience constructor. It assumes that state i uses emission i on the forward strand.
Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emission - the emissions
te - the BasicHigherOrderTransition.AbstractTransitionElements building a transition
Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states
AlphabetContainer
See Also:
HigherOrderHMM(HMMTrainingParameterSet, String[], int[], boolean[], Emission[], BasicHigherOrderTransition.AbstractTransitionElement...)

public HigherOrderHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission, BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
This is the main constructor.
Parameters:
trainingParameterSet - the ParameterSet that determines the training algorithm and contains the necessary Parameters
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
te - the BasicHigherOrderTransition.AbstractTransitionElements building a transition
Throws:
Exception - if the length of name, emissionIdx, or forward is not equal to the number of states
AlphabetContainer
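
The null defaults documented for emissionIdx and forward can be illustrated with a small stand-alone sketch. It does not use Jstacs: the class StateArrayDefaults, its methods, and the choice of IllegalArgumentException are hypothetical (the real constructor declares a plain Exception), and the number of states is assumed to be the length of name.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Jstacs: mirrors the documented null defaults. */
final class StateArrayDefaults {

    /** Returns a copy of emissionIdx, or the identity mapping (state i uses emission i) if it is null. */
    static int[] resolveEmissionIdx(int[] emissionIdx, int numStates) {
        if (emissionIdx == null) {
            int[] identity = new int[numStates];
            for (int i = 0; i < numStates; i++) {
                identity[i] = i;
            }
            return identity;
        }
        checkLength("emissionIdx", emissionIdx.length, numStates);
        return emissionIdx.clone();
    }

    /** Returns a copy of forward, or an all-true array (every state on the forward strand) if it is null. */
    static boolean[] resolveForward(boolean[] forward, int numStates) {
        if (forward == null) {
            boolean[] all = new boolean[numStates];
            Arrays.fill(all, true);
            return all;
        }
        checkLength("forward", forward.length, numStates);
        return forward.clone();
    }

    // Assumption: a length mismatch is reported as IllegalArgumentException in this sketch.
    private static void checkLength(String what, int actual, int expected) {
        if (actual != expected) {
            throw new IllegalArgumentException(
                what + " has length " + actual + " but there are " + expected + " states");
        }
    }
}
```

With these defaults, passing null for both arrays reproduces the behavior described for the convenience constructor.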

public HigherOrderHMM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Constructs a HigherOrderHMM out of an XML representation.
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the HigherOrderHMM could not be reconstructed out of the StringBuffer xml

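protected void createHelperVariables()
Description copied from class: AbstractHMM
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices.
createHelperVariables in class AbstractHMM
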
protected String getXMLTag()
Description copied from class: AbstractHMM
Returns the tag for the XML representation.
getXMLTag in class AbstractHMM
See Also:
AbstractHMM.fromXML(StringBuffer), AbstractHMM.toXML()

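protected void appendFurtherInformation(StringBuffer xml)
Description copied from class: AbstractHMM
This method appends further information to the XML representation.
appendFurtherInformation in class AbstractHMM
Parameters:
xml - the XML representation

protected void extractFurtherInformation(StringBuffer xml) throws NonParsableException
This method extracts further information from the XML representation.
extractFurtherInformation in class AbstractHMM
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml
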
public HigherOrderHMM clone() throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone() method.
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class AbstractHMM
Returns:
a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone() method should work as:
1. Object o = (X)super.clone();
2. the member variables of o defined by X that are not of simple data types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning

protected void createStates()
Description copied from class: AbstractHMM
This method creates states for the internal usage.
createStates in class AbstractHMM

public double getLogPriorTerm()
Description copied from interface: StatisticalModel
Returns a value that is proportional to the log of the prior.

public double getLogProbForPath(IntList path, int startPos, Sequence seq) throws Exception
getLogProbForPath in class AbstractHMM
Parameters:
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Throws:
Exception - if the probability for the sequence given the path could not be computed, for instance if the model is not trained, ...

protected void fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero) throws Exception
Description copied from class: AbstractHMM
This method fills the log state posterior of Sequence seq in a given matrix.
fillLogStatePosteriorMatrix in class AbstractHMM
Parameters:
statePosterior - the matrix for the log state posterior
startPos - the start position
endPos - the end position
seq - the sequence
silentZero - true if the state posterior for silent states is defined to be zero, otherwise false
Throws:
Exception - if an error occurs during the computation
See Also:
AbstractHMM.getLogStatePosteriorMatrixFor(int, int, Sequence), AbstractHMM.createMatrixForStatePosterior(int, int)

protected void fillFwdMatrix(int startPos, int endPos, Sequence seq) throws OperationNotSupportedException, WrongLengthException
Description copied from class: AbstractHMM
This method fills the forward-matrix for a given sequence.
fillFwdMatrix in class AbstractHMM
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
OperationNotSupportedException
WrongLengthException
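
As an illustration of what filling a forward matrix involves, here is a minimal sketch for a plain first-order HMM in log space. It is not the Jstacs implementation: the class LogForwardSketch, its parameter layout (logInit, logTrans, logEmit, and an int[] of observed symbols) are assumptions made for this example, and higher-order contexts, silent states, and strand handling are left out. As in the method above, startPos and endPos are both inclusive.

```java
/** Hypothetical first-order sketch of a log-space forward recursion; not Jstacs code. */
final class LogForwardSketch {

    /** Numerically stable log(sum_i exp(v[i])). */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // all terms have probability zero
        }
        double sum = 0;
        for (double x : v) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /**
     * Fills fwd[t][s] = log P(obs[startPos..startPos+t], state at that position = s)
     * for the positions startPos..endPos (both inclusive).
     */
    static double[][] fillForward(double[] logInit, double[][] logTrans, double[][] logEmit,
                                  int[] obs, int startPos, int endPos) {
        if (startPos < 0 || endPos >= obs.length || startPos > endPos) {
            throw new IllegalArgumentException("invalid range " + startPos + ".." + endPos);
        }
        int k = logInit.length;
        int len = endPos - startPos + 1;
        double[][] fwd = new double[len][k];
        for (int s = 0; s < k; s++) {
            fwd[0][s] = logInit[s] + logEmit[s][obs[startPos]];
        }
        double[] tmp = new double[k];
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < k; s++) {
                // Sum over all predecessor states p in log space.
                for (int p = 0; p < k; p++) {
                    tmp[p] = fwd[t - 1][p] + logTrans[p][s];
                }
                fwd[t][s] = logSumExp(tmp) + logEmit[s][obs[startPos + t]];
            }
        }
        return fwd;
    }

    /** Log-likelihood of the scored range: log-sum over the last row of the forward matrix. */
    static double logLikelihood(double[][] fwd) {
        return logSumExp(fwd[fwd.length - 1]);
    }
}
```

Working in log space with logSumExp avoids the numerical underflow that plain probability products run into on long sequences.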

protected void fillBwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
Description copied from class: AbstractHMM
This method fills the backward-matrix for a given sequence.
fillBwdMatrix in class AbstractHMM
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation

protected void fillBwdOrViterbiMatrix(HigherOrderHMM.Type t, int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the entries of the backward or the Viterbi matrix.
Parameters:
t - a switch that selects the computation mode
startPos - the start position of the sequence
endPos - the end position of the sequence
weight - the given external weight of the sequence (only used for Baum-Welch)
seq - the sequence
Throws:
Exception - forwarded from TrainableState.addToStatistic(int, int, double, Sequence) and State.getLogScoreFor(int, int, Sequence)
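
To make the Viterbi computation concrete, the following is a minimal first-order sketch in log space with backtracking. It is not how this class is implemented: the method above fills the matrix alongside the backward computation, whereas this sketch uses the common forward-direction recursion. The class LogViterbiSketch, its Result holder (standing in for the Pair of path and score), and the parameter layout are all hypothetical.

```java
/** Hypothetical first-order Viterbi sketch in log space; not Jstacs code. */
final class LogViterbiSketch {

    /** Simple holder for the best state path and its log score. */
    static final class Result {
        final int[] path;
        final double logScore;

        Result(int[] path, double logScore) {
            this.path = path;
            this.logScore = logScore;
        }
    }

    static Result viterbi(double[] logInit, double[][] logTrans, double[][] logEmit, int[] obs) {
        if (obs.length == 0) {
            throw new IllegalArgumentException("empty observation sequence");
        }
        int k = logInit.length;
        int len = obs.length;
        double[][] score = new double[len][k]; // best log score of any path ending in state s at position t
        int[][] back = new int[len][k];        // predecessor state that achieves score[t][s]

        for (int s = 0; s < k; s++) {
            score[0][s] = logInit[s] + logEmit[s][obs[0]];
        }
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double candidate = score[t - 1][p] + logTrans[p][s];
                    if (candidate > best) {
                        best = candidate;
                        arg = p;
                    }
                }
                score[t][s] = best + logEmit[s][obs[t]];
                back[t][s] = arg;
            }
        }

        // Pick the best final state, then follow the back pointers.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[len - 1][s] > score[len - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int t = len - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return new Result(path, score[len - 1][last]);
    }
}
```

The only difference from a forward recursion is that the sum over predecessor states is replaced by a maximum, plus the back pointers needed to recover the path.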

public Pair<IntList,Double> getViterbiPathFor(int startPos, int endPos, Sequence seq) throws Exception
getViterbiPathFor in class AbstractHMM
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...

protected double viterbi(IntList path, int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the Viterbi score of a given sequence seq. Furthermore, it allows either modifying the sufficient statistics according to the Viterbi training algorithm or computing the Viterbi path, which in this case is returned in path.
Parameters:
path - if null, Viterbi training is performed; otherwise the Viterbi path is computed
startPos - the start position
endPos - the end position
weight - the sequence weight, in most cases this is 1
seq - the sequence
Throws:
Exception - if an error occurs during the computation

protected double baumWelch(int startPos, int endPos, double weight, Sequence seq) throws Exception
This method computes the likelihood and modifies the sufficient statistics according to the Baum-Welch algorithm.
Parameters:
startPos - the start position
endPos - the end position
weight - the sequence weight, in most cases this is 1
seq - the sequence
Throws:
Exception - if an error occurs during the computation

public void train(DataSet data, double[] weights) throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet using the specified weights. The weight at position i belongs to the element at position i, so the array weights should have the number of sequences in the data set as its dimension. (Optionally, weights == null can be used if all weights have the value one.)
The result of calling train(data1); train(data2) should be a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.
Parameters:
data - the given sequences as DataSet
weights - the weights of the elements, each weight should be non-negative
Throws:
Exception - if the training did not succeed (e.g. the dimension of weights and the number of sequences in the data set do not match)
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
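
The weight conventions described for train (null means all ones, one non-negative weight per sequence) can be sketched without Jstacs as follows. The class SequenceWeights and its methods are hypothetical, and the use of IllegalArgumentException is an assumption of this sketch; the weighted sum of per-sequence log-likelihoods is shown only as one typical place where such weights enter.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the documented weight conventions; not Jstacs code. */
final class SequenceWeights {

    /** Returns one weight per sequence: all ones if weights is null, otherwise a validated copy. */
    static double[] resolve(double[] weights, int numSequences) {
        if (weights == null) {
            double[] ones = new double[numSequences];
            Arrays.fill(ones, 1.0);
            return ones;
        }
        if (weights.length != numSequences) {
            throw new IllegalArgumentException(
                "expected " + numSequences + " weights but got " + weights.length);
        }
        for (double w : weights) {
            // The negated comparison also rejects NaN.
            if (!(w >= 0)) {
                throw new IllegalArgumentException("weights have to be non-negative");
            }
        }
        return weights.clone();
    }

    /** Weighted sum of per-sequence log-likelihoods; weight i belongs to sequence i. */
    static double weightedLogLikelihood(double[] logLikelihoods, double[] weights) {
        double[] w = resolve(weights, logLikelihoods.length);
        double sum = 0;
        for (int i = 0; i < logLikelihoods.length; i++) {
            sum += w[i] * logLikelihoods[i];
        }
        return sum;
    }
}
```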
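
protected void initialize(DataSet data, double[] weight) throws Exception
This method initializes all emissions and the transition; see initializeRandomly().
Parameters:
data - the data set
weight - the weights for each sequence of the data set
Throws:
Exception - if an error occurs during the initialization

public void setSkiptInit(boolean skip)
Sets whether the model should be initialized (randomly) before optimization.
Parameters:
skip - if the model should be initialized

public void initializeRandomly()
This method initializes all emissions and the transition randomly.

protected void resetStatistics()
This method resets all sufficient statistics of all emissions and the transition.

protected void estimateFromStatistics()
This method estimates the parameters of all emissions and the transition using their sufficient statistics.
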
public final byte getMaximalMarkovOrder() throws UnsupportedOperationException
Description copied from interface: StatisticalModel
This method returns the maximal used Markov order, if possible.
getMaximalMarkovOrder in interface StatisticalModel
getMaximalMarkovOrder in class AbstractTrainableStatisticalModel
Throws:
UnsupportedOperationException - if the model can't give a proper answer

public ResultSet getCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns some information characterizing or describing the current instance; see StorableResult.
getCharacteristics in interface SequenceScore
getCharacteristics in class AbstractTrainableStatisticalModel
Throws:
Exception - if some of the characteristics could not be defined
See Also:
StorableResult

public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...

public double[] getLogScoreFor(DataSet data) throws Exception
Description copied from interface: SequenceScore
This method computes the logarithm of the scores of all sequences in the given data set; see SequenceScore.getLogScoreFor(Sequence).
getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel
Parameters:
data - the data set of sequences
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence)

public void getLogScoreFor(DataSet data, double[] res) throws Exception
Description copied from interface: SequenceScore
This method computes and stores the logarithm of the scores for any sequence in the data set in the given double array; see SequenceScore.getLogScoreFor(Sequence).
getLogScoreFor in interface SequenceScore
getLogScoreFor in class AbstractTrainableStatisticalModel
Parameters:
data - the data set of sequences
res - the array for the results, has to have length data.getNumberOfElements() (which returns the number of sequences in the data set)
Throws:
Exception - if something went wrong
See Also:
SequenceScore.getLogScoreFor(Sequence), SequenceScore.getLogScoreFor(DataSet)

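public NumericalResultSet getNumericalCharacteristics() throws Exception
Description copied from interface: SequenceScore
Returns the subset of numerical values that are also returned by SequenceScore.getCharacteristics().
Throws:
Exception - if some of the characteristics could not be defined

public boolean isInitialized()
Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized; see SequenceScore.getLogScoreFor(Sequence).
Returns:
true if the instance is initialized, false otherwise

protected void finalize() throws Throwable
finalize in class AbstractHMM
Throws:
Throwable
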
public void samplePath(IntList path, int startPos, int endPos, Sequence seq) throws Exception
This method samples a valid path for the given sequence seq using the internal parameters.

public Emission[] getEmissions() throws CloneNotSupportedException
Returns a clone of the internal emissions.
Throws:
CloneNotSupportedException - if the emissions could not be cloned

public TransitionElement[] getTransisionElements() throws CloneNotSupportedException
Returns the transition elements of the internal Transition.
Throws:
CloneNotSupportedException - if the transition elements could not be cloned
See Also:
HigherOrderTransition.getTransisionElements()

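public int[] getEmissionIndexes()
Returns a clone of the internal array of emission indexes that represent which emission is used in which state.

public String[] getNames()
Returns a clone of the state names.

public HMMTrainingParameterSet getTrainingParams() throws CloneNotSupportedException
Returns a clone of the training parameters.
Throws:
CloneNotSupportedException - if the parameters could not be cloned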