public class HomogeneousMMDiffSM extends HomogeneousDiffSM
Constructor and Description |
---|
HomogeneousMMDiffSM(AlphabetContainer alphabets, int order, double classEss, double[][] hyperParams, boolean plugIn, boolean optimize, int starts) This is the main constructor that creates an instance of a homogeneous Markov model of arbitrary order with given hyper-parameters for the prior. |
HomogeneousMMDiffSM(AlphabetContainer alphabets, int order, double classEss, double[] sumOfHyperParams, boolean plugIn, boolean optimize, int starts) This is the main constructor that creates an instance of a homogeneous Markov model of arbitrary order. |
HomogeneousMMDiffSM(AlphabetContainer alphabets, int order, double classEss, int length) This is a convenience constructor for creating an instance of a homogeneous Markov model of arbitrary order. |
HomogeneousMMDiffSM(StringBuffer xml) This is the constructor for Storable. |
Modifier and Type | Method and Description |
---|---|
void | addGradientOfLogPriorTerm(double[] grad, int start) This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. |
HomogeneousMMDiffSM | clone() Creates a clone (deep copy) of the current DifferentiableSequenceScore instance. |
DataSet | emitDataSet(int numberOfSequences, int... seqLength) This method returns a DataSet object containing artificial sequence(s). |
protected void | fromXML(StringBuffer xml) This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer. |
double[][][] | getAllConditionalStationaryDistributions() This method returns the stationary conditional distributions. |
double[] | getCurrentParameterValues() Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. |
double | getESS() Returns the equivalent sample size (ess) of this model. |
String | getInstanceName() Should return a short instance name such as iMM(0), BN(2), ... |
double | getLogNormalizationConstant(int length) This method returns the logarithm of the normalization constant for a given sequence length. |
double | getLogPartialNormalizationConstant(int parameterIndex, int length) This method returns the logarithm of the partial normalization constant for a given parameter index and a sequence length. |
double | getLogPriorTerm() This method computes a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ), where prior is the prior for the parameters of this model. |
double | getLogScoreAndPartialDerivation(Sequence seq, int start, int end, IntList indices, DoubleList dList) |
double | getLogScoreFor(Sequence seq, int start, int end) |
byte | getMaximalMarkovOrder() Returns the maximal used Markov order. |
int | getNumberOfParameters() Returns the number of parameters in this DifferentiableSequenceScore. |
int | getNumberOfRecommendedStarts() This method returns the number of recommended optimization starts. |
int[][] | getSamplingGroups(int parameterOffset) Returns groups of indexes of parameters that shall be drawn together in a sampling procedure. |
int | getSizeOfEventSpaceForRandomVariablesOfParameter(int index) Returns the size of the event space of the random variables that are affected by parameter no. index. |
static double[] | getSumOfHyperParameters(int order, int length, double ess) This method returns an array that can be used in the constructor HomogeneousMMDiffSM(AlphabetContainer, int, double, double[], boolean, boolean, int) containing the sums of the specific hyper-parameters. |
void | initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights) This method creates the underlying structure of the DifferentiableSequenceScore. |
void | initializeFunctionRandomly(boolean freeParams) This method initializes the DifferentiableSequenceScore randomly. |
void | initializeUniformly(boolean freeParams) This method initializes the instance with a uniform distribution. |
boolean | isInitialized() This method can be used to determine whether the instance is initialized. |
boolean | isNormalized() This method indicates whether the implemented score is already normalized to 1 or not. |
void | setParameterOptimization(boolean optimize) This method allows the user to specify whether the parameters should be optimized or not. |
void | setParameters(double[] params, int start) This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1. |
void | setStartParamsToConditionalStationaryDistributions() Sets the start parameters of this homogeneous Markov model to the corresponding stationary distributions of the transition probabilities. |
void | setStatisticForHyperparameters(int[] length, double[] weight) This method sets the hyperparameters for the model parameters by evaluating the given statistic. |
String | toString(NumberFormat nf) This method returns a String representation of the instance. |
StringBuffer | toXML() This method returns an XML representation as StringBuffer of an instance of the implementing class. |
public HomogeneousMMDiffSM(AlphabetContainer alphabets, int order, double classEss, int length)
This is a convenience constructor for creating an instance of a homogeneous Markov model of arbitrary order.
alphabets - the AlphabetContainer
order - the order of the model (has to be non-negative)
classEss - the equivalent sample size (ess) of the class
length - the sequence length (only used for computing the hyper-parameters)
See Also: getSumOfHyperParameters(int, int, double), HomogeneousMMDiffSM(AlphabetContainer, int, double, double[], boolean, boolean, int)
public HomogeneousMMDiffSM(AlphabetContainer alphabets, int order, double classEss, double[] sumOfHyperParams, boolean plugIn, boolean optimize, int starts)
This is the main constructor that creates an instance of a homogeneous Markov model of arbitrary order.
alphabets - the AlphabetContainer
order - the order of the model (has to be non-negative)
classEss - the equivalent sample size (ess) of the class
sumOfHyperParams - the sum of the hyper-parameters for each order (length has to be order, each entry has to be non-negative)
plugIn - a switch that enables the use of the MAP parameters as plug-in parameters
optimize - a switch that determines whether the parameters are optimized or kept fixed
starts - the number of recommended starts
public HomogeneousMMDiffSM(AlphabetContainer alphabets, int order, double classEss, double[][] hyperParams, boolean plugIn, boolean optimize, int starts)
This is the main constructor that creates an instance of a homogeneous Markov model of arbitrary order with given hyper-parameters for the prior.
alphabets - the AlphabetContainer
order - the order of the model (has to be non-negative)
classEss - the equivalent sample size (ess) of the class
hyperParams - the hyper-parameters for each order (length has to be order, each entry has to be non-negative)
plugIn - a switch that enables the use of the MAP parameters as plug-in parameters
optimize - a switch that determines whether the parameters are optimized or kept fixed
starts - the number of recommended starts
public HomogeneousMMDiffSM(StringBuffer xml) throws NonParsableException
This is the constructor for Storable. Creates a new HomogeneousMMDiffSM out of its XML representation as returned by toXML().
xml - the XML representation as StringBuffer
NonParsableException - if the StringBuffer representation could not be parsed
public static double[] getSumOfHyperParameters(int order, int length, double ess)
This method returns an array that can be used in the constructor HomogeneousMMDiffSM(AlphabetContainer, int, double, double[], boolean, boolean, int) containing the sums of the specific hyper-parameters.
order - the order of the model
length - the sequence length
ess - the class ESS
See Also: HomogeneousMMDiffSM(AlphabetContainer, int, double, double[], boolean, boolean, int)
public HomogeneousMMDiffSM clone() throws CloneNotSupportedException
Description copied from interface: DifferentiableSequenceScore
Creates a clone (deep copy) of the current DifferentiableSequenceScore instance.
clone in interface DifferentiableSequenceScore
clone in interface SequenceScore
clone in class AbstractDifferentiableStatisticalModel
CloneNotSupportedException - if something went wrong while cloning the DifferentiableSequenceScore
public String getInstanceName()
Description copied from interface: SequenceScore
Should return a short instance name such as iMM(0), BN(2), ...
public double getLogScoreFor(Sequence seq, int start, int end)
Description copied from interface: SequenceScore
Returns the logarithmic score for the Sequence seq between the positions start and end (inclusive).
getLogScoreFor in interface SequenceScore
getLogScoreFor in interface VariableLengthDiffSM
getLogScoreFor in class AbstractVariableLengthDiffSM
seq - the Sequence
start - the start position in the Sequence
end - the end position (inclusive) in the Sequence
See Also: Sequence
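The following is a minimal, self-contained sketch of how a homogeneous Markov model of order 1 could score a subsequence from start to end (inclusive). It does not use the Jstacs classes: the class name HomogeneousMarkovScoreSketch, the integer encoding of symbols, and the tables initial and transition are assumptions made for illustration only, and the actual parameterization of HomogeneousMMDiffSM may differ.

```java
/** Illustrative only: an order-1 homogeneous Markov model over integer-coded symbols. */
final class HomogeneousMarkovScoreSketch {

    /**
     * Sums log-probabilities over seq[start..end] (end inclusive).
     * initial[a] = P(a) is used for the first scored symbol and
     * transition[a][b] = P(b | a) for every later symbol; because the model is
     * homogeneous, the same tables are reused at every position.
     */
    static double logScore(int[] seq, int start, int end,
                           double[] initial, double[][] transition) {
        if (start < 0 || end >= seq.length || start > end) {
            throw new IllegalArgumentException("invalid range");
        }
        double score = Math.log(initial[seq[start]]);
        for (int pos = start + 1; pos <= end; pos++) {
            score += Math.log(transition[seq[pos - 1]][seq[pos]]);
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical DNA-like alphabet with 4 symbols coded as 0..3.
        double[] initial = {0.25, 0.25, 0.25, 0.25};
        double[][] transition = {
            {0.4, 0.2, 0.2, 0.2},
            {0.2, 0.4, 0.2, 0.2},
            {0.2, 0.2, 0.4, 0.2},
            {0.2, 0.2, 0.2, 0.4}
        };
        int[] seq = {0, 1, 1, 3, 0};
        System.out.println(logScore(seq, 1, 3, initial, transition));
    }
}
```

Scoring positions 1 to 3 uses one initial term and two transition terms; a model of higher order would condition on a longer context in the same way.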
public double getLogScoreAndPartialDerivation(Sequence seq, int start, int end, IntList indices, DoubleList dList)
Description copied from interface: DifferentiableSequenceScore
Returns the logarithmic score for a Sequence beginning at position start in the Sequence and fills lists with the indices and the partial derivations.
getLogScoreAndPartialDerivation in interface DifferentiableSequenceScore
getLogScoreAndPartialDerivation in interface VariableLengthDiffSM
getLogScoreAndPartialDerivation in class AbstractVariableLengthDiffSM
seq - the Sequence
start - the start position in the Sequence
end - the end position (inclusive) in the Sequence
indices - an IntList of indices, after method invocation the list should contain the indices i where
dList - a DoubleList of partial derivations, after method invocation the list should contain the corresponding
See Also: Sequence
public int getNumberOfParameters()
Description copied from interface: DifferentiableSequenceScore
Returns the number of parameters in this DifferentiableSequenceScore. If the number of parameters is not known yet, the method returns DifferentiableSequenceScore.UNKNOWN.
See Also: DifferentiableSequenceScore.UNKNOWN
public void setParameters(double[] params, int start)
Description copied from interface: DifferentiableSequenceScore
This method sets the internal parameters to the values of params between start and start + DifferentiableSequenceScore.getNumberOfParameters() - 1.
params - the new parameters
start - the start index in params
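The index range described for setParameters can be illustrated with a small stand-alone sketch. The class ParameterSliceSketch below is hypothetical and only mimics the documented contract (reading getNumberOfParameters() values from params, beginning at start); it is not the Jstacs implementation.

```java
import java.util.Arrays;

/** Illustrative only: a model that stores a fixed number of parameters. */
final class ParameterSliceSketch {
    private final double[] parameters;

    ParameterSliceSketch(int numberOfParameters) {
        parameters = new double[numberOfParameters];
    }

    int getNumberOfParameters() {
        return parameters.length;
    }

    /** Copies params[start .. start + getNumberOfParameters() - 1] into the model. */
    void setParameters(double[] params, int start) {
        System.arraycopy(params, start, parameters, 0, parameters.length);
    }

    /** Returns a copy, so callers cannot modify the internal state. */
    double[] getCurrentParameterValues() {
        return parameters.clone();
    }

    public static void main(String[] args) {
        // A shared parameter vector in which this model owns indices 2..4.
        double[] global = {9.0, 9.0, 0.1, 0.2, 0.3, 9.0};
        ParameterSliceSketch model = new ParameterSliceSketch(3);
        model.setParameters(global, 2);
        System.out.println(Arrays.toString(model.getCurrentParameterValues())); // prints [0.1, 0.2, 0.3]
    }
}
```

Passing an offset instead of a dedicated array lets several scoring functions read their parameters from one shared vector, which is presumably why the method takes a start index.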
public StringBuffer toXML()
Description copied from interface: Storable
This method returns an XML representation as StringBuffer of an instance of the implementing class.
public double[] getCurrentParameterValues()
Description copied from interface: DifferentiableSequenceScore
Returns a double array of dimension DifferentiableSequenceScore.getNumberOfParameters() containing the current parameter values. To use these parameters as the starting point of an optimization, it is highly recommended to invoke DifferentiableSequenceScore.initializeFunction(int, boolean, DataSet[], double[][]) beforehand. After an optimization, this method can be used to get the current parameter values.
public void initializeFunction(int index, boolean freeParams, DataSet[] data, double[][] weights)
Description copied from interface: DifferentiableSequenceScore
This method creates the underlying structure of the DifferentiableSequenceScore.
index - the index of the class the DifferentiableSequenceScore models
freeParams - indicates whether the (reduced) parameterization is used
data - the data sets
weights - the weights of the sequences in the data sets
public void initializeFunctionRandomly(boolean freeParams)
Description copied from interface: DifferentiableSequenceScore
This method initializes the DifferentiableSequenceScore randomly. It has to create the underlying structure of the DifferentiableSequenceScore.
freeParams - indicates whether the (reduced) parameterization is used
protected void fromXML(StringBuffer xml) throws NonParsableException
Description copied from class: AbstractDifferentiableSequenceScore
This method is called in the constructor for the Storable interface to create a scoring function from a StringBuffer.
fromXML in class AbstractDifferentiableSequenceScore
xml - the XML representation as StringBuffer
NonParsableException - if the StringBuffer could not be parsed
See Also: AbstractDifferentiableSequenceScore.AbstractDifferentiableSequenceScore(StringBuffer)
public int getSizeOfEventSpaceForRandomVariablesOfParameter(int index)
Description copied from interface: DifferentiableStatisticalModel
Returns the size of the event space of the random variables that are affected by parameter no. index, i.e. the product of the sizes of the alphabets at the position of each random variable affected by parameter index. For DNA alphabets this corresponds to 4 for a PWM, 16 for a WAM except position 0, ...
index - the index of the parameter
public double getLogNormalizationConstant(int length)
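Description copied from interface: VariableLengthDiffSM
This method returns the logarithm of the normalization constant for a given sequence length.
length - the sequence length
See Also: DifferentiableStatisticalModel.getLogNormalizationConstant()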
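public double getLogPartialNormalizationConstant(int parameterIndex, int length) throws Exception
Description copied from interface: VariableLengthDiffSM
This method returns the logarithm of the partial normalization constant for a given parameter index and a sequence length.
parameterIndex - the index of the parameter
length - the sequence length
Exception - if something went wrong
See Also: DifferentiableStatisticalModel.getLogPartialNormalizationConstant(int)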
public double getESS()
Description copied from interface: DifferentiableStatisticalModel
Returns the equivalent sample size (ess) of this model.
public String toString(NumberFormat nf)
Description copied from interface: SequenceScore
This method returns a String representation of the instance.
nf - the NumberFormat for the String representation of parameters or probabilities
Returns: a String representation of the instance
public double getLogPriorTerm()
Description copied from interface: DifferentiableStatisticalModel
This method computes a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ), where prior is the prior for the parameters of this model.
Returns: a value that is proportional to DifferentiableStatisticalModel.getESS() * DifferentiableStatisticalModel.getLogNormalizationConstant() + Math.log( prior ).
See Also: DifferentiableStatisticalModel.getESS(), DifferentiableStatisticalModel.getLogNormalizationConstant()
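As a worked illustration of the expression given for getLogPriorTerm, the sketch below evaluates ess * logNormalizationConstant + Math.log(prior) for plain numbers. The class LogPriorTermSketch and its arguments are hypothetical stand-ins for the values a model would supply through getESS() and getLogNormalizationConstant(); how HomogeneousMMDiffSM computes its prior is not shown here.

```java
/** Illustrative only: evaluates the documented log prior term for given numbers. */
final class LogPriorTermSketch {

    /**
     * @param ess the equivalent sample size
     * @param logNormalizationConstant the logarithm of the normalization constant
     * @param prior the (unnormalized) prior density at the current parameters, must be positive
     */
    static double logPriorTerm(double ess, double logNormalizationConstant, double prior) {
        if (prior <= 0.0) {
            throw new IllegalArgumentException("prior must be positive");
        }
        return ess * logNormalizationConstant + Math.log(prior);
    }

    public static void main(String[] args) {
        // With a log normalization constant of 0 the term reduces to log(prior).
        System.out.println(logPriorTerm(4.0, 0.0, 0.5));
    }
}
```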
public void addGradientOfLogPriorTerm(double[] grad, int start)
Description copied from interface: DifferentiableStatisticalModel
This method computes the gradient of DifferentiableStatisticalModel.getLogPriorTerm() for each parameter of this model. The results are added to the array grad beginning at index start.
grad - the array of gradients
start - the start index in the grad array, where the partial derivations for the parameters of this model shall be entered
See Also: DifferentiableStatisticalModel.getLogPriorTerm()
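public boolean isNormalized()
Description copied from interface: DifferentiableStatisticalModel
This method indicates whether the implemented score is already normalized to 1 or not. The standard implementation returns false.
isNormalized in interface DifferentiableStatisticalModel
isNormalized in class AbstractDifferentiableStatisticalModel
Returns: true if the implemented score is already normalized to 1, false otherwise
public boolean isInitialized()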
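Description copied from interface: SequenceScore
This method can be used to determine whether the instance is initialized. If the instance is initialized, it should be possible to invoke SequenceScore.getLogScoreFor(Sequence).
Returns: true if the instance is initialized, false otherwise
public byte getMaximalMarkovOrder()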
Description copied from class: HomogeneousDiffSM
Returns the maximal used Markov order.
getMaximalMarkovOrder in interface StatisticalModel
getMaximalMarkovOrder in class HomogeneousDiffSM
public int getNumberOfRecommendedStarts()
Description copied from interface: DifferentiableSequenceScore
This method returns the number of recommended optimization starts.
getNumberOfRecommendedStarts in interface DifferentiableSequenceScore
getNumberOfRecommendedStarts in class AbstractDifferentiableSequenceScore
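public void setParameterOptimization(boolean optimize)
This method allows the user to specify whether the parameters should be optimized or not.
optimize - indicates if the parameters should be optimized or not
public double[][][] getAllConditionalStationaryDistributions()
This method returns the stationary conditional distributions.
public void setStartParamsToConditionalStationaryDistributions()
Sets the start parameters of this homogeneous Markov model to the corresponding stationary distributions of the transition probabilities.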
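For readers unfamiliar with stationary distributions of transition probabilities, the sketch below approximates the stationary distribution of an order-1 transition matrix by power iteration. It is an illustration under stated assumptions, not the method used by getAllConditionalStationaryDistributions() or setStartParamsToConditionalStationaryDistributions(): the class StationaryDistributionSketch is hypothetical, each row of the matrix is assumed to sum to 1, and the chain is assumed to be irreducible and aperiodic so that the iteration converges.

```java
import java.util.Arrays;

/** Illustrative only: stationary distribution of an order-1 Markov chain. */
final class StationaryDistributionSketch {

    /**
     * Repeatedly applies pi <- pi * P, starting from the uniform distribution.
     * transition[a][b] = P(b | a); every row must sum to 1.
     */
    static double[] stationary(double[][] transition, int iterations) {
        int n = transition.length;
        double[] pi = new double[n];
        Arrays.fill(pi, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int a = 0; a < n; a++) {
                for (int b = 0; b < n; b++) {
                    next[b] += pi[a] * transition[a][b];
                }
            }
            pi = next;
        }
        return pi;
    }

    public static void main(String[] args) {
        double[][] transition = {
            {0.9, 0.1},
            {0.5, 0.5}
        };
        // The exact stationary distribution of this matrix is (5/6, 1/6).
        System.out.println(Arrays.toString(stationary(transition, 200)));
    }
}
```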
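public void setStatisticForHyperparameters(int[] length, double[] weight) throws Exception
Description copied from interface: VariableLengthDiffSM
This method sets the hyperparameters for the model parameters by evaluating the given statistic. The statistic states how long the sequences are (length) and how often (weight) they have been seen.
public DataSet emitDataSet(int numberOfSequences, int... seqLength) throws Exception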
This method returns a DataSet object containing artificial sequence(s).
emitDataSet in interface StatisticalModel
emitDataSet in class AbstractDifferentiableStatisticalModel
numberOfSequences - the number of sequences that should be contained in the returned DataSet
seqLength - the length of the sequences
Returns: a DataSet containing numberOfSequences artificial sequence(s)
Exception - if the emission of the artificial DataSet did not succeed
See Also: DataSet
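To make the idea of emitting artificial sequences concrete, here is a stand-alone sketch that samples integer-coded sequences from an order-1 homogeneous Markov model. It does not call emitDataSet: the class EmitSequencesSketch, the tables initial and transition, the single seqLength argument (the real method accepts int... seqLength), and the seed parameter are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: samples sequences from an order-1 homogeneous Markov model. */
final class EmitSequencesSketch {

    /** Draws an index from the discrete distribution dist (entries sum to 1). */
    static int draw(double[] dist, Random rnd) {
        double u = rnd.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < dist.length - 1; i++) {
            cumulative += dist[i];
            if (u < cumulative) {
                return i;
            }
        }
        return dist.length - 1;
    }

    /** Emits numberOfSequences sequences, each of length seqLength. */
    static int[][] emit(int numberOfSequences, int seqLength,
                        double[] initial, double[][] transition, long seed) {
        Random rnd = new Random(seed);
        int[][] data = new int[numberOfSequences][seqLength];
        for (int[] seq : data) {
            for (int pos = 0; pos < seqLength; pos++) {
                // The first symbol uses the initial distribution, later ones the row of the predecessor.
                double[] dist = (pos == 0) ? initial : transition[seq[pos - 1]];
                seq[pos] = draw(dist, rnd);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        double[] initial = {0.5, 0.5};
        double[][] transition = {{0.9, 0.1}, {0.5, 0.5}};
        for (int[] seq : emit(3, 10, initial, transition, 42L)) {
            System.out.println(Arrays.toString(seq));
        }
    }
}
```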
public void initializeUniformly(boolean freeParams)
Description copied from class: HomogeneousDiffSM
This method initializes the instance with a uniform distribution.
initializeUniformly in class HomogeneousDiffSM
freeParams - a switch whether to take only free parameters or to take all
public int[][] getSamplingGroups(int parameterOffset)
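Description copied from interface: SamplingDifferentiableStatisticalModel
Returns groups of indexes of parameters that shall be drawn together in a sampling procedure.
parameterOffset - a global offset on the parameter indexes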