java.lang.Object
de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
de.jstacs.sequenceScores.statisticalModels.trainable.hmm.AbstractHMM
public abstract class AbstractHMM
This class is the superclass of all implementations of hidden Markov models (HMMs) in Jstacs.
The training algorithm of the HMM is determined by a specialized ParameterSet denoted as HMMTrainingParameterSet.
See Also:
State, Transition
Field Summary | |
---|---|
protected double[][] |
bwdMatrix
matrix for all backward-computed variables; bwdMatrix[l][c] = log P(x_{l+1},... |
protected Emission[] |
emission
The emissions used in the states. |
protected int[] |
emissionIdx
The index of the used emission of each state. |
protected boolean[] |
finalState
An array of switches that contains for each state whether it is a final state or not (cf. |
protected boolean[] |
forward
An array of switches that contains for each state whether the emission is on the forward or the reverse strand. |
protected double[][] |
fwdMatrix
matrix for all forward-computed variables; fwdMatrix[l][c] = log P(x_1,... |
protected String[] |
name
The names of the states. |
protected SafeOutputStream |
sostream
This is the stream for writing information while training. |
static String |
START_NODE
The String for the start node used in Graphviz annotation. |
protected State[] |
states
The (hidden) states of the HMM. |
protected int |
threads
The number of threads that is internally used. |
protected HMMTrainingParameterSet |
trainingParameter
The ParameterSet containing all Parameters for the training of the HMM. |
protected Transition |
transition
The transitions between all (hidden) states of the HMM. |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
alphabets, length |
Constructor Summary | |
---|---|
protected |
AbstractHMM(HMMTrainingParameterSet trainingParameterSet,
String[] name,
int[] emissionIdx,
boolean[] forward,
Emission[] emission)
This is the main constructor for an HMM. |
protected |
AbstractHMM(StringBuffer xml)
The standard constructor for the interface Storable. |
Method Summary | |
---|---|
protected abstract void |
appendFurtherInformation(StringBuffer xml)
This method appends further information to the XML representation. |
AbstractHMM |
clone()
Follows the conventions of Object's clone() method. |
protected abstract void |
createHelperVariables()
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices, ... |
protected double[][] |
createMatrixForStatePosterior(int startPos,
int endPos)
This method creates an empty matrix for the log state posterior. |
protected abstract void |
createStates()
This method creates states for the internal usage. |
String[] |
decodePath(IntList path)
This method decodes any path of the HMM, i.e. it converts the integer representation of the path in a String representation. |
static int[][] |
decodeStatePosterior(double[][]... statePosterior)
The method returns the decoded state posterior, i.e. a sequence of states. |
protected void |
determineFinalStates()
This method determines the final states of the HMM. |
protected abstract void |
extractFurtherInformation(StringBuffer xml)
This method extracts further information from the XML representation. |
protected abstract void |
fillBwdMatrix(int startPos,
int endPos,
Sequence seq)
This method fills the backward-matrix for a given sequence. |
protected abstract void |
fillFwdMatrix(int startPos,
int endPos,
Sequence seq)
This method fills the forward-matrix for a given sequence. |
protected abstract void |
fillLogStatePosteriorMatrix(double[][] statePosterior,
int startPos,
int endPos,
Sequence seq,
boolean silentZero)
This method fills the log state posterior of Sequence seq in a given matrix. |
protected void |
finalize()
|
protected void |
fromXML(StringBuffer xml)
This method is used by the AbstractHMM(StringBuffer) constructor for creating an instance from an XML representation. |
protected double[][] |
getFinalStatePosterioriMatrix(double[][] intermediate)
This method is used if fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean) is used with silentZero==true to eliminate the first row. |
String |
getGraphvizRepresentation(NumberFormat nf)
This method returns a String representation of the structure that
can be used in Graphviz to create an image. |
String |
getGraphvizRepresentation(NumberFormat nf,
boolean sameTypeSameRank)
This method returns a String representation of the structure that
can be used in Graphviz to create an image. |
String |
getGraphvizRepresentation(NumberFormat nf,
DataSet data,
double[] weight,
boolean sameTypeSameRank)
This method returns a String representation of the structure that
can be used in Graphviz to create an image. |
String |
getGraphvizRepresentation(NumberFormat nf,
DataSet data,
double[] weight,
HashMap<String,String> rankPatterns)
This method returns a String representation of the structure that
can be used in Graphviz to create an image. |
double |
getLogProbFor(Sequence sequence,
int startpos,
int endpos)
Returns the logarithm of the probability of (a part of) the given sequence given the model. |
abstract double |
getLogProbForPath(IntList path,
int startPos,
Sequence seq)
|
double[][][] |
getLogStatePosteriorMatrixFor(DataSet data)
This method returns the log state posteriors for all sequences of the sample data. |
double[][] |
getLogStatePosteriorMatrixFor(int startPos,
int endPos,
Sequence seq)
This method returns the log state posterior of all states for a sequence. |
int |
getNumberOfStates()
This method returns the number of the (hidden) states |
int |
getNumberOfThreads()
This method returns the number of threads that is internally used. |
protected static RuntimeException |
getRunTimeException(Exception e)
This method creates a RuntimeException from any other Exception. |
double[][][] |
getStatePosteriorMatrixFor(DataSet data)
This method returns the state posteriors for all sequences of the sample data. |
double[][] |
getStatePosteriorMatrixFor(Sequence seq)
This method returns the state posterior of all states for a sequence. |
abstract Pair<IntList,Double> |
getViterbiPathFor(int startPos,
int endPos,
Sequence seq)
|
Pair<IntList,Double> |
getViterbiPathFor(Sequence seq)
|
Pair<IntList,Double>[] |
getViterbiPathsFor(DataSet data)
This method returns the Viterbi paths and scores for all sequences of the sample data. |
protected abstract String |
getXMLTag()
Returns the tag for the XML representation. |
protected void |
initTransition(BasicHigherOrderTransition.AbstractTransitionElement... te)
This method creates the internal transition. |
protected double |
logProb(int startpos,
int endpos,
Sequence sequence)
This method computes the logarithm of the probability of the corresponding subsequences. |
protected void |
provideMatrix(int type,
int length)
This method invokes the method createHelperVariables() and provides the matrix with given type. |
void |
setOutputStream(OutputStream o)
Sets the OutputStream that is used e.g. for writing information
while training. |
String |
toString()
Should give a simple representation (text) of the model as String. |
StringBuffer |
toXML()
This method returns an XML representation as StringBuffer of an
instance of the implementing class. |
void |
train(DataSet data)
Trains the TrainableStatisticalModel object given the data as DataSet. |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
check, emitDataSet, getAlphabetContainer, getCharacteristics, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel |
---|
train |
Methods inherited from interface de.jstacs.sequenceScores.statisticalModels.StatisticalModel |
---|
getLogPriorTerm |
Methods inherited from interface de.jstacs.sequenceScores.SequenceScore |
---|
getInstanceName, getNumericalCharacteristics, isInitialized |
Field Detail |
---|
protected State[] states
protected String[] name
protected int[] emissionIdx
protected boolean[] forward
See Also:
ComplementableDiscreteAlphabet
protected Emission[] emission
protected Transition transition
protected double[][] fwdMatrix
protected double[][] bwdMatrix
protected HMMTrainingParameterSet trainingParameter
The ParameterSet containing all Parameters for the training of the HMM.
protected SafeOutputStream sostream
protected boolean[] finalState
protected int threads
public static final String START_NODE
The String for the start node used in Graphviz annotation.
See Also:
getGraphvizRepresentation(NumberFormat), Constant Field Values
Constructor Detail |
---|
protected AbstractHMM(HMMTrainingParameterSet trainingParameterSet, String[] name, int[] emissionIdx, boolean[] forward, Emission[] emission) throws CloneNotSupportedException, WrongAlphabetException
This is the main constructor for an HMM.
Parameters:
trainingParameterSet - a ParameterSet containing all Parameters for the training of the HMM
name - the names of the states
emissionIdx - the indices of the emissions that should be used for each state; if null, state i will use emission i
forward - a boolean array that indicates whether the symbol on the forward or the reverse complementary strand should be used; if null, all states use the forward strand
emission - the emissions
Throws:
CloneNotSupportedException - if trainingParameterSet cannot be cloned
WrongAlphabetException - if not all (non-silent) emissions use the same AlphabetContainer
protected AbstractHMM(StringBuffer xml) throws NonParsableException
The standard constructor for the interface Storable. Constructs an AbstractHMM out of an XML representation.
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the AbstractHMM could not be reconstructed out of the StringBuffer xml
Method Detail |
---|
protected void initTransition(BasicHigherOrderTransition.AbstractTransitionElement... te) throws Exception
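This method creates the internal transition.
Parameters:
te - the individual transition elements
Throws:
Exception - if the transition cannot handle the current states
protected abstract String getXMLTag()
Returns the tag for the XML representation.
See Also:
fromXML(StringBuffer), toXML()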
public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.
toXML in interface Storable
protected void fromXML(StringBuffer xml) throws NonParsableException
This method is used by the AbstractHMM(StringBuffer) constructor for creating an instance from an XML representation. This method should never be made public.
fromXML in class AbstractTrainableStatisticalModel
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the XML representation cannot be parsed properly
See Also:
AbstractTrainableStatisticalModel.AbstractTrainableStatisticalModel(StringBuffer)
protected abstract void appendFurtherInformation(StringBuffer xml)
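This method appends further information to the XML representation.
Parameters:
xml - the XML representation
protected abstract void extractFurtherInformation(StringBuffer xml) throws NonParsableException
This method extracts further information from the XML representation.
Parameters:
xml - the XML representation
Throws:
NonParsableException - if the information could not be reconstructed out of the StringBuffer xml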
public AbstractHMM clone() throws CloneNotSupportedException
Description copied from class: AbstractTrainableStatisticalModel
Follows the conventions of Object's clone() method.
clone in interface SequenceScore
clone in interface TrainableStatisticalModel
clone in class AbstractTrainableStatisticalModel
Returns:
a copy of the current AbstractTrainableStatisticalModel (the member AlphabetContainer isn't deeply cloned since it is assumed to be immutable). The type of the returned object is defined by the class X directly inherited from AbstractTrainableStatisticalModel. Hence X's clone() method should work as:
1. Object o = (X)super.clone();
2. all member variables of o defined by X that are not of simple data types like int, double, ... have to be deeply copied
3. return o
Throws:
CloneNotSupportedException - if something went wrong while cloning
protected abstract void createStates()
This method creates states for the internal usage.
protected abstract void fillFwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
This method fills the forward-matrix for a given sequence.
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation
protected abstract void fillBwdMatrix(int startPos, int endPos, Sequence seq) throws Exception
This method fills the backward-matrix for a given sequence.
Parameters:
startPos - the start position (inclusive) in the sequence
endPos - the end position (inclusive) in the sequence
seq - the sequence
Throws:
Exception - if some error occurs during the computation
public int getNumberOfThreads()
This method returns the number of threads that is internally used.
public String getGraphvizRepresentation(NumberFormat nf)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
Returns:
a String representation of the structure
See Also:
getGraphvizRepresentation(NumberFormat, DataSet, double[], boolean)
public String getGraphvizRepresentation(NumberFormat nf, boolean sameTypeSameRank)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
sameTypeSameRank - if true, states of the same type, i.e., having the same type of emission, are displayed on the same rank
Returns:
a String representation of the structure
See Also:
getGraphvizRepresentation(NumberFormat, DataSet, double[], boolean)
public String getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, boolean sameTypeSameRank)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
data - the data to determine the state posterior; can be null
weight - the weights to weight the determined state posterior; can be null
sameTypeSameRank - if true, states of the same type, i.e., having the same type of emission, are displayed on the same rank
Returns:
a String representation of the structure
public String getGraphvizRepresentation(NumberFormat nf, DataSet data, double[] weight, HashMap<String,String> rankPatterns)
This method returns a String representation of the structure that can be used in Graphviz to create an image.
Parameters:
nf - an instance of NumberFormat for formatting the probabilities of the transition
data - the data to determine the state posterior; can be null
weight - the weights to weight the determined state posterior; can be null
rankPatterns - a HashMap containing regular expressions and their corresponding value for the option rank in Graphviz
Returns:
a String representation of the structure
See Also:
HMMFactory.getHashMap()
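To illustrate what a Graphviz representation of an HMM structure can look like, the following minimal sketch builds a DOT digraph from state names and a transition matrix. It does not use Jstacs and does not reproduce the exact output of getGraphvizRepresentation; the class DotSketch, the method toDot, and the plain double[][] transition matrix are assumptions made for illustration only.

```java
import java.text.NumberFormat;

final class DotSketch {
    /**
     * Builds a DOT digraph with one edge per non-zero transition.
     * names[i] is the (hypothetical) name of state i, trans[i][j] the
     * probability of moving from state i to state j.
     */
    static String toDot(String[] names, double[][] trans, NumberFormat nf) {
        StringBuilder sb = new StringBuilder("digraph {\n");
        for (int i = 0; i < names.length; i++) {
            for (int j = 0; j < names.length; j++) {
                if (trans[i][j] > 0) {
                    // Edge label carries the formatted transition probability.
                    sb.append("\t\"").append(names[i]).append("\" -> \"")
                      .append(names[j]).append("\" [label=\"")
                      .append(nf.format(trans[i][j])).append("\"];\n");
                }
            }
        }
        return sb.append("}").toString();
    }
}
```

The returned String can be written to a file and rendered with the Graphviz dot tool.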
protected double[][] createMatrixForStatePosterior(int startPos, int endPos)
This method creates an empty matrix for the log state posterior.
Parameters:
startPos - the start position
endPos - the end position
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence), fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean)
protected abstract void fillLogStatePosteriorMatrix(double[][] statePosterior, int startPos, int endPos, Sequence seq, boolean silentZero) throws Exception
This method fills the log state posterior of Sequence seq in a given matrix.
Parameters:
statePosterior - the matrix for the log state posterior
startPos - the start position
endPos - the end position
seq - the sequence
silentZero - true if the state posterior for silent states is defined to be zero, otherwise false
Throws:
Exception - if an error occurs during the computation
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence), createMatrixForStatePosterior(int, int)
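As a rough illustration of how a log state posterior can be obtained from forward and backward variables, the sketch below runs the textbook forward-backward recursion in log space for a first-order HMM without silent states. It is not the Jstacs implementation: the class ForwardBackwardSketch, its parameters (logInit, logTrans, logEmit), the integer-encoded sequence, and the [state][position] layout of the result are all assumptions.

```java
final class ForwardBackwardSketch {
    /** Numerically stable log(exp(a) + exp(b)). */
    static double logSum(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double m = Math.max(a, b);
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    /** Returns post[c][l] = log P(state at position l is c | seq); seq must be non-empty. */
    static double[][] logStatePosterior(double[] logInit, double[][] logTrans,
                                        double[][] logEmit, int[] seq) {
        int n = logInit.length, len = seq.length;
        double[][] fwd = new double[len][n]; // log P(seq[0..l], state at l is c)
        double[][] bwd = new double[len][n]; // log P(seq[l+1..] | state at l is c)
        for (int c = 0; c < n; c++) {
            fwd[0][c] = logInit[c] + logEmit[c][seq[0]];
        }
        for (int l = 1; l < len; l++) {
            for (int c = 0; c < n; c++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    acc = logSum(acc, fwd[l - 1][p] + logTrans[p][c]);
                }
                fwd[l][c] = acc + logEmit[c][seq[l]];
            }
        }
        // bwd[len-1][c] stays 0 = log 1 from array initialization.
        for (int l = len - 2; l >= 0; l--) {
            for (int c = 0; c < n; c++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int q = 0; q < n; q++) {
                    acc = logSum(acc, logTrans[c][q] + logEmit[q][seq[l + 1]] + bwd[l + 1][q]);
                }
                bwd[l][c] = acc;
            }
        }
        double logP = Double.NEGATIVE_INFINITY; // log P(seq)
        for (int c = 0; c < n; c++) {
            logP = logSum(logP, fwd[len - 1][c]);
        }
        double[][] post = new double[n][len];
        for (int c = 0; c < n; c++) {
            for (int l = 0; l < len; l++) {
                post[c][l] = fwd[l][c] + bwd[l][c] - logP;
            }
        }
        return post;
    }
}
```

Exponentiating the entries of one position across all states should give values that sum to 1, which is a convenient sanity check for any implementation.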
public double[][] getLogStatePosteriorMatrixFor(int startPos, int endPos, Sequence seq) throws Exception
This method returns the log state posterior of all states for a sequence.
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
protected double[][] getFinalStatePosterioriMatrix(double[][] intermediate)
This method is used if fillLogStatePosteriorMatrix(double[][], int, int, Sequence, boolean) is used with silentZero==true to eliminate the first row.
Parameters:
intermediate - the intermediate (log) state posterior matrix containing one additional row for silent states before the first emission
public double[][] getStatePosteriorMatrixFor(Sequence seq) throws Exception
Parameters:
seq - the sequence
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence)
public double[][][] getLogStatePosteriorMatrixFor(DataSet data) throws Exception
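This method returns the log state posteriors for all sequences of the sample data.
Parameters:
data - the sequences
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence)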
public double[][][] getStatePosteriorMatrixFor(DataSet data) throws Exception
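This method returns the state posteriors for all sequences of the sample data.
Parameters:
data - the sequences
Throws:
Exception - if the state posterior could not be computed, for instance if the model is not trained, ...
See Also:
getStatePosteriorMatrixFor(Sequence)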
public abstract Pair<IntList,Double> getViterbiPathFor(int startPos, int endPos, Sequence seq) throws Exception
Parameters:
startPos - the start position within the sequence
endPos - the end position within the sequence
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
public Pair<IntList,Double> getViterbiPathFor(Sequence seq) throws Exception
Parameters:
seq - the sequence
Returns:
a Pair containing the Viterbi state path and the corresponding score
Throws:
Exception - if the Viterbi path could not be computed, for instance if the model is not trained, ...
See Also:
getViterbiPathFor(int, int, Sequence)
public Pair<IntList,Double>[] getViterbiPathsFor(DataSet data) throws Exception
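This method returns the Viterbi paths and scores for all sequences of the sample data.
Parameters:
data - the sequences
Throws:
Exception - if the Viterbi paths and scores could not be computed, for instance if the model is not trained, ...
See Also:
getViterbiPathFor(Sequence)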
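For readers unfamiliar with Viterbi decoding, the following sketch computes the most probable state path and its log score for a first-order HMM without silent states. It is a textbook version and not the Jstacs implementation; the class ViterbiSketch, the nested Result type (standing in for Pair<IntList,Double>), and the log-space parameter arrays are assumptions.

```java
final class ViterbiSketch {
    /** Holds the best state path and its log score. */
    static final class Result {
        final int[] path;
        final double logScore;
        Result(int[] path, double logScore) {
            this.path = path;
            this.logScore = logScore;
        }
    }

    /** Viterbi recursion in log space; seq must be non-empty. */
    static Result viterbi(double[] logInit, double[][] logTrans,
                          double[][] logEmit, int[] seq) {
        int n = logInit.length, len = seq.length;
        double[][] v = new double[len][n]; // best log score of a path ending in state s at position t
        int[][] back = new int[len][n];    // predecessor state on that best path
        for (int s = 0; s < n; s++) {
            v[0][s] = logInit[s] + logEmit[s][seq[0]];
        }
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                double bestVal = v[t - 1][0] + logTrans[0][s];
                for (int p = 1; p < n; p++) {
                    double val = v[t - 1][p] + logTrans[p][s];
                    if (val > bestVal) {
                        bestVal = val;
                        best = p;
                    }
                }
                v[t][s] = bestVal + logEmit[s][seq[t]];
                back[t][s] = best;
            }
        }
        // Pick the best final state, then follow the back pointers.
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (v[len - 1][s] > v[len - 1][last]) last = s;
        }
        int[] path = new int[len];
        path[len - 1] = last;
        for (int t = len - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return new Result(path, v[len - 1][last]);
    }
}
```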
public final String[] decodePath(IntList path)
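This method decodes any path of the HMM, i.e. it converts the integer representation of the path into a String representation.
Parameters:
path - the path in integer representation
See Also:
getViterbiPathFor(Sequence), getViterbiPathFor(int, int, Sequence)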
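Conceptually, decoding a path amounts to a lookup of each state index in the array of state names. A minimal sketch, assuming a plain int[] in place of IntList and a hypothetical helper class PathDecoder:

```java
final class PathDecoder {
    /** Maps each state index of the path to its name; names[i] is the name of state i. */
    static String[] decodePath(String[] names, int[] path) {
        String[] decoded = new String[path.length];
        for (int i = 0; i < path.length; i++) {
            decoded[i] = names[path[i]];
        }
        return decoded;
    }
}
```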
public abstract double getLogProbForPath(IntList path, int startPos, Sequence seq) throws Exception
Parameters:
path - the given state path
startPos - the start position within the sequence(s) (inclusive)
seq - the sequence(s)
Throws:
Exception - if the probability for the sequence given path could not be computed, for instance if the model is not trained, ...
protected abstract void createHelperVariables()
This method instantiates all helper variables that are needed inside the model, for instance for filling the forward and backward matrices, ...
protected void provideMatrix(int type, int length)
This method invokes the method createHelperVariables() and provides the matrix with given type. Type 0 stands for fwdMatrix, and type 1 stands for bwdMatrix.
Parameters:
type - the type of the matrix
length - the maximal sequence length
public int getNumberOfStates()
This method returns the number of the (hidden) states.
public double getLogProbFor(Sequence sequence, int startpos, int endpos) throws Exception
Description copied from interface: StatisticalModel
Returns the logarithm of the probability of (a part of) the given sequence given the model. This method extends StatisticalModel.getLogProbFor(Sequence, int) by the fact that the model could be e.g. homogeneous and therefore the length of the sequences, whose probability should be returned, is not fixed. Additionally, the end position of the part of the given sequence is given and the probability of the part from position startpos to endpos (inclusive) should be returned.
The length and the alphabets define the type of data that can be modeled and therefore both have to be checked.
getLogProbFor in interface StatisticalModel
Parameters:
sequence - the given sequence
startpos - the start position within the given sequence
endpos - the last position to be taken into account
Throws:
Exception - if the sequence could not be handled (e.g. startpos > , endpos > sequence.length, ...) by the model
NotTrainedException - if the model is not trained yet
protected static RuntimeException getRunTimeException(Exception e)
This method creates a RuntimeException from any other Exception.
Parameters:
e - the Exception
Returns:
a RuntimeException
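Converting a checked Exception into a RuntimeException is a common pattern when code runs in places that cannot declare checked exceptions, such as worker threads. The sketch below shows one plausible way to do it; the class Exceptions and the method asRuntimeException are hypothetical names, and whether Jstacs passes through existing RuntimeExceptions or keeps the cause is an assumption, not something this page states.

```java
final class Exceptions {
    /**
     * Returns e itself if it already is a RuntimeException; otherwise wraps it,
     * keeping the original message and the original exception as cause.
     */
    static RuntimeException asRuntimeException(Exception e) {
        if (e instanceof RuntimeException) {
            return (RuntimeException) e;
        }
        return new RuntimeException(e.getMessage(), e);
    }
}
```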
protected double logProb(int startpos, int endpos, Sequence sequence) throws Exception
This method computes the logarithm of the probability of the corresponding subsequences.
AlphabetContainer and possible further features before starting the computation.
Parameters:
startpos - the start position (inclusive)
endpos - the end position (inclusive)
sequence - the Sequence(s)
Throws:
Exception - if the model has no parameters (for instance if it is not trained)
public void train(DataSet data) throws Exception
Description copied from interface: TrainableStatisticalModel
Trains the TrainableStatisticalModel object given the data as DataSet. Calling train(data1); train(data2) should yield a fully trained model over data2 and not over data1+data2. All parameters of the model were given by the call of the constructor.
train in interface TrainableStatisticalModel
train in class AbstractTrainableStatisticalModel
Parameters:
data - the given sequences as DataSet
Throws:
Exception - if the training did not succeed
See Also:
DataSet.getElementAt(int), DataSet.ElementEnumerator
public final void setOutputStream(OutputStream o)
Sets the OutputStream that is used e.g. for writing information while training. It is possible to set o=null; then nothing will be written.
Parameters:
o - the OutputStream
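The "null means silent" behaviour described above can be pictured as a thin wrapper that checks for null before every write. The following sketch shows the idea only; the class NullSafeWriter and its method writeln are hypothetical and are not the Jstacs SafeOutputStream API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class NullSafeWriter {
    private final OutputStream out; // may be null, in which case nothing is written

    NullSafeWriter(OutputStream out) {
        this.out = out;
    }

    /** Writes the message followed by a newline, or does nothing if no stream is set. */
    void writeln(String message) {
        if (out == null) {
            return;
        }
        try {
            out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this pattern, training code can log progress unconditionally and callers decide whether any output appears.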
protected void finalize() throws Throwable
finalize in class Object
Throws:
Throwable
protected void determineFinalStates()
This method determines the final states of the HMM.
See Also:
finalState
public static int[][] decodeStatePosterior(double[][]... statePosterior)
The method returns the decoded state posterior, i.e. a sequence of states.
Parameters:
statePosterior - the (log) state posterior(s)
See Also:
getLogStatePosteriorMatrixFor(int, int, Sequence)
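Posterior decoding is commonly done by picking, for every position, the state with the highest (log) posterior. The sketch below shows that idea under the assumption that each matrix is laid out as [state][position]; the class PosteriorDecoder and this layout are assumptions and may differ from what decodeStatePosterior actually does.

```java
final class PosteriorDecoder {
    /**
     * For each posterior matrix (assumed layout: [state][position]) returns the
     * index of the state with the largest value at every position.
     */
    static int[][] decode(double[][]... statePosterior) {
        int[][] result = new int[statePosterior.length][];
        for (int n = 0; n < statePosterior.length; n++) {
            double[][] m = statePosterior[n];
            int positions = m[0].length;
            result[n] = new int[positions];
            for (int l = 0; l < positions; l++) {
                int best = 0;
                for (int s = 1; s < m.length; s++) {
                    if (m[s][l] > m[best][l]) {
                        best = s;
                    }
                }
                result[n][l] = best;
            }
        }
        return result;
    }
}
```

Because the logarithm is monotone, the same code works for posteriors and log posteriors alike.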
public String toString()
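Description copied from interface: TrainableStatisticalModel
Should give a simple representation (text) of the model as String.
toString in interface TrainableStatisticalModel
toString in class Object
Returns:
the representation as String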