java.lang.Object
de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.MixtureTrainSM
public class MixtureTrainSM
The class for a mixture model of arbitrary TrainableStatisticalModels.
If you use Gibbs sampling, temporary files will be created in the Java temp
folder. These files are deleted once no reference to the current instance
exists and the garbage collector runs. It is therefore recommended to
call the garbage collector explicitly at the end of any application.
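As a rough illustration of this recommendation, the sketch below runs an application's work and then requests a garbage collection. The class and method names (GcAtExit, runThenRequestGc) are hypothetical and not part of Jstacs. Note that System.gc() is only a request to the JVM, and that the model must no longer be referenced when it is issued.

```java
/** Hypothetical helper, not part of Jstacs. */
final class GcAtExit {

    /** Runs the given work, then asks the JVM to collect garbage. */
    static void runThenRequestGc(Runnable work) {
        try {
            // e.g. build, train, and use a MixtureTrainSM inside this call,
            // so that no reference to it survives once the call returns
            work.run();
        } finally {
            System.gc(); // a request only; the JVM is free to ignore it
        }
    }

    public static void main(String[] args) {
        runThenRequestGc(() -> System.out.println("training finished"));
    }
}
```

Keeping the model local to the work passed in is the important part: an object that is still reachable is not collected, so its temporary files would remain.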
Nested Class Summary |
---|
Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
---|
AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization |
Field Summary |
---|
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
---|
algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
alphabets, length |
Constructor Summary | |
---|---|
| MixtureTrainSM(int length, TrainableStatisticalModel[] models, double[] weights, int starts, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) - Creates an instance using EM and fixed component probabilities. |
| MixtureTrainSM(int length, TrainableStatisticalModel[] models, double[] weights, int starts, int initialIteration, int stationaryIteration, BurnInTest burnInTest) - Creates an instance using Gibbs Sampling and fixed component probabilities. |
protected | MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double[] weights, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) - Creates a new MixtureTrainSM. |
| MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, double[] componentHyperParams, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) - Creates an instance using EM and estimating the component probabilities. |
| MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, double[] componentHyperParams, int initialIteration, int stationaryIteration, BurnInTest burnInTest) - Creates an instance using Gibbs Sampling and sampling the component probabilities. |
| MixtureTrainSM(StringBuffer xml) - The constructor for the interface Storable. |
Method Summary | |
---|---|
double[][] | doFirstIteration(DataSet data, double[] dataWeights, double[][] partitioning) - This method enables you to train a mixture model with a fixed start partitioning. |
protected double[][] | doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) - This method will do the first step in the train algorithm for the current model on the internal sample. |
protected Sequence[] | emitDataSetUsingCurrentParameterSet(int n, int... lengths) - The method returns an array of sequences using the current parameter set. |
protected double | getLogProbUsingCurrentParameterSetFor(int component, Sequence s, int start, int end) - Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
protected double | getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) - Computes sequence weights and returns the score. |
protected void | setTrainData(DataSet data) - This method is invoked by the train method and sets, for a given sample, the sample that should be used for train. |
String | toString() - Should give a simple representation (text) of the model as String. |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, train |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
protected MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, boolean estimateComponentProbs, double[] componentHyperParams, double[] weights, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws IllegalArgumentException, WrongAlphabetException, CloneNotSupportedException
Creates a new MixtureTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.
Parameters:
length - the length used in this model
models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
estimateComponentProbs - the switch for estimating the component probabilities in the algorithm or holding them fixed; if the component parameters are fixed, the values of weights will be used, otherwise the componentHyperParams will be incorporated in the adjustment
componentHyperParams - the hyperparameters for the component assignment prior:
    used if estimateComponentProbs == true;
    has to be null or has to have length models.length;
    if null or an array with all values zero (0), then ML;
    ... parameterization
weights - null or the weights for the components (then weights.length == models.length)
algorithm - either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha - only for AbstractMixtureTrainSM.Algorithm.EM: used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
    ... length;
    dimension < 1;
    weights != null && weights.length != dimension;
    weights != null and there exists an i where weights[i] < 0;
    starts < 1;
    componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
public MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, double[] componentHyperParams, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws IllegalArgumentException, WrongAlphabetException, CloneNotSupportedException
Creates an instance using EM and estimating the component probabilities.
Parameters:
length - the length used in this model
models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior:
    used if estimateComponentProbs == true;
    has to be null or has to have length models.length;
    if null or an array with all values zero (0), then ML;
    ... parameterization
alpha - only for AbstractMixtureTrainSM.Algorithm.EM: used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
Throws:
IllegalArgumentException - if
    ... length;
    dimension < 1;
    weights != null && weights.length != dimension;
    weights != null and there exists an i where weights[i] < 0;
    starts < 1;
    componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
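The remark that alpha = 1 corresponds to a uniform distribution on a simplex can be made concrete with a small stand-alone sketch. It assumes that the gammas of one sequence form a probability vector over the components and are drawn from a symmetric Dirichlet distribution; that reading, and the names UniformSimplexInit and drawGammas, are assumptions for illustration and not taken from the Jstacs sources. The sketch covers only the case alpha = 1, where normalized exponential draws are sufficient.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch, not the Jstacs implementation. */
final class UniformSimplexInit {

    /**
     * Draws one point uniformly from the probability simplex with k entries,
     * i.e. from a Dirichlet distribution whose parameters are all 1.
     */
    static double[] drawGammas(int k, Random rng) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        double[] g = new double[k];
        double sum;
        do {
            sum = 0.0;
            for (int i = 0; i < k; i++) {
                // -log(U) with U in (0,1] is Exp(1)-distributed, i.e. Gamma(1,1)
                g[i] = -Math.log(1.0 - rng.nextDouble());
                sum += g[i];
            }
        } while (sum == 0.0); // practically never repeats; avoids dividing by zero
        for (int i = 0; i < k; i++) {
            // normalizing independent Gamma(1,1) draws yields Dirichlet(1,...,1)
            g[i] /= sum;
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(drawGammas(3, new Random(42))));
    }
}
```

For other values of alpha the exponential draws would have to be replaced by Gamma(alpha, 1) draws, which the Java standard library does not provide directly.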
public MixtureTrainSM(int length, TrainableStatisticalModel[] models, double[] weights, int starts, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws IllegalArgumentException, WrongAlphabetException, CloneNotSupportedException
Creates an instance using EM and fixed component probabilities.
Parameters:
length - the length used in this model
models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
weights - null or the weights for the components (then weights.length == models.length)
alpha - only for AbstractMixtureTrainSM.Algorithm.EM: used in train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc - only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization - only for AbstractMixtureTrainSM.Algorithm.EM: AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
Throws:
IllegalArgumentException - if
    ... length;
    dimension < 1;
    weights != null && weights.length != dimension;
    weights != null and there exists an i where weights[i] < 0;
    starts < 1;
    componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
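The IllegalArgumentException conditions listed above that are stated completely can be written down as plain checks. The following sketch is a stand-alone restatement of those conditions only; the class MixtureArgs and its check method are hypothetical and do not reproduce the Jstacs constructor, which additionally validates the models and the hyperparameters.

```java
/** Hypothetical stand-alone restatement of some documented argument checks. */
final class MixtureArgs {

    /** Throws IllegalArgumentException if one of the documented conditions holds. */
    static void check(int dimension, double[] weights, int starts) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension < 1");
        }
        if (weights != null) {
            if (weights.length != dimension) {
                throw new IllegalArgumentException("weights.length != dimension");
            }
            for (int i = 0; i < weights.length; i++) {
                if (weights[i] < 0) {
                    throw new IllegalArgumentException("weights[" + i + "] < 0");
                }
            }
        }
        if (starts < 1) {
            throw new IllegalArgumentException("starts < 1");
        }
    }

    public static void main(String[] args) {
        check(2, new double[] { 0.3, 0.7 }, 5); // valid arguments pass silently
        try {
            check(2, new double[] { 0.5, -0.5 }, 5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A null weights array is accepted here because the parameter description explicitly allows null.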
public MixtureTrainSM(int length, TrainableStatisticalModel[] models, int starts, double[] componentHyperParams, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws IllegalArgumentException, WrongAlphabetException, CloneNotSupportedException
Creates an instance using Gibbs Sampling and sampling the component probabilities.
Parameters:
length - the length used in this model
models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
componentHyperParams - the hyperparameters for the component assignment prior:
    used if estimateComponentProbs == true;
    has to be null or has to have length models.length;
    if null or an array with all values zero (0), then ML;
    ... parameterization
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
    ... length;
    dimension < 1;
    weights != null && weights.length != dimension;
    weights != null and there exists an i where weights[i] < 0;
    starts < 1;
    componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
public MixtureTrainSM(int length, TrainableStatisticalModel[] models, double[] weights, int starts, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws IllegalArgumentException, WrongAlphabetException, CloneNotSupportedException
Creates an instance using Gibbs Sampling and fixed component probabilities.
Parameters:
length - the length used in this model
models - the single models building the MixtureTrainSM; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the models that will be adjusted have to implement SamplingComponent
starts - the number of times the algorithm will be started in the train method, at least 1
weights - null or the weights for the components (then weights.length == models.length)
initialIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest - only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
Throws:
IllegalArgumentException - if
    ... length;
    dimension < 1;
    weights != null && weights.length != dimension;
    weights != null and there exists an i where weights[i] < 0;
    starts < 1;
    componentHyperParams are not correct
WrongAlphabetException - if not all models work on the same alphabet
CloneNotSupportedException - if the models cannot be cloned
See Also:
MixtureTrainSM(int, de.jstacs.sequenceScores.statisticalModels.trainable.TrainableStatisticalModel[], int, boolean, double[], double[], de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM.Parameterization, int, int, de.jstacs.sampling.BurnInTest), AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
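In general, Gibbs sampling for a mixture model alternates between drawing a component for each sequence and updating the component models. As a generic illustration of the first of these steps (not the Jstacs code), the sketch below draws one component index from unnormalized log scores. The class ComponentDraw, its draw method, and the assumption that the scores are log P(s|c) + log P(c) are all hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of a single component draw; not the Jstacs implementation. */
final class ComponentDraw {

    /**
     * Draws a component index c with probability proportional to exp(logJoint[c]),
     * where logJoint[c] is assumed to hold log P(s|c) + log P(c).
     */
    static int draw(double[] logJoint, Random rng) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logJoint) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            throw new IllegalArgumentException("no component has positive probability");
        }
        // Shift by the maximum before exponentiating to avoid underflow.
        double[] p = new double[logJoint.length];
        double sum = 0.0;
        for (int c = 0; c < p.length; c++) {
            p[c] = Math.exp(logJoint[c] - max);
            sum += p[c];
        }
        // Inverse-CDF sampling on the unnormalized probabilities.
        double u = rng.nextDouble() * sum;
        double acc = 0.0;
        int lastPositive = 0;
        for (int c = 0; c < p.length; c++) {
            if (p[c] > 0.0) {
                lastPositive = c;
            }
            acc += p[c];
            if (u < acc) {
                return c;
            }
        }
        return lastPositive; // reached only through floating-point rounding
    }

    public static void main(String[] args) {
        double[] logJoint = { Math.log(0.01), Math.log(0.04) }; // made-up scores
        System.out.println(draw(logJoint, new Random(7)));
    }
}
```

With fixed component probabilities, as in this constructor, log P(c) would stay constant across iterations and only the component models would change.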
public MixtureTrainSM(StringBuffer xml) throws NonParsableException
The constructor for the interface Storable. Creates a new MixtureTrainSM out of its XML representation.
Parameters:
xml - the XML representation of the model as StringBuffer
Throws:
NonParsableException - if the StringBuffer is not parsable
Method Detail |
---|
protected Sequence[] emitDataSetUsingCurrentParameterSet(int n, int... lengths) throws Exception
Description copied from class: AbstractMixtureTrainSM
The method returns an array of sequences using the current parameter set.
emitDataSetUsingCurrentParameterSet in class AbstractMixtureTrainSM
Parameters:
n - the number of sequences to be sampled
lengths - the corresponding lengths
Throws:
Exception - if it was impossible to sample the sequences
See Also:
StatisticalModel.emitDataSet(int, int...)
protected double[][] doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception
Description copied from class: AbstractMixtureTrainSM
This method will do the first step in the train algorithm for the current model on the internal sample.
doFirstIteration in class AbstractMixtureTrainSM
Parameters:
dataWeights - null or the weights of each element of the sample
m - the multivariate random generator
params - the parameters for the multivariate random generator
Throws:
Exception - if something went wrong
public double[][] doFirstIteration(DataSet data, double[] dataWeights, double[][] partitioning) throws Exception
This method enables you to train a mixture model with a fixed start partitioning.
Parameters:
data - the sample of sequences
dataWeights - null or the weights of each element of the sample
partitioning - a kind of partitioning:
    partitioning.length has to be data.getNumberOfElements();
    partitioning[i].length has to be getNumberOfModels()
Throws:
Exception - if something went wrong or if the number of components is 1
protected double getLogProbUsingCurrentParameterSetFor(int component, Sequence s, int start, int end) throws Exception
Description copied from class: AbstractMixtureTrainSM
Returns the logarithmic probability for the sequence and the given component using the current parameter set.
getLogProbUsingCurrentParameterSetFor in class AbstractMixtureTrainSM
Parameters:
component - the index of the component
s - the sequence
start - the start position in the sequence
end - the end position in the sequence
Returns:
log P(s,component) = log P(s|component) + log P(component)
Throws:
Exception - if not trained yet or something else went wrong
See Also:
AbstractMixtureTrainSM.getNumberOfComponents()
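The relation log P(s,component) = log P(s|component) + log P(component) can be checked with a small numeric example. The sketch below uses made-up likelihoods and component probabilities, and the class LogJointDemo with its helper methods is hypothetical rather than Jstacs code. It also shows how the joint log scores of all components combine into log P(s) with a numerically stable log-sum-exp.

```java
/** Hypothetical numeric illustration, not Jstacs code. */
final class LogJointDemo {

    /** log P(s, c) = log P(s | c) + log P(c). */
    static double logJoint(double logLikelihoodGivenComponent, double componentProb) {
        return logLikelihoodGivenComponent + Math.log(componentProb);
    }

    /** log P(s) = log of the sum over c of exp(log P(s, c)), computed stably. */
    static double logSumExp(double[] logJoint) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logJoint) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY; // every term is zero
        }
        double sum = 0.0;
        for (double v : logJoint) {
            sum += Math.exp(v - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        double[] logLik = { Math.log(0.02), Math.log(0.08) }; // made-up log P(s|c)
        double[] weights = { 0.5, 0.5 };                        // made-up P(c)
        double[] joint = new double[logLik.length];
        for (int c = 0; c < joint.length; c++) {
            joint[c] = logJoint(logLik[c], weights[c]);
        }
        double logP = logSumExp(joint);
        for (int c = 0; c < joint.length; c++) {
            // posterior P(c | s) = exp(log P(s, c) - log P(s))
            System.out.println("P(c=" + c + " | s) = " + Math.exp(joint[c] - logP));
        }
    }
}
```

Working in log space matters because the probability of a long sequence is typically far too small to represent directly as a double.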
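public String toString()
Description copied from interface: TrainableStatisticalModel
Should give a simple representation (text) of the model as String.
toString in interface TrainableStatisticalModel
toString in class Object
Returns:
the representation of the model as String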
protected double getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) throws Exception
Computes sequence weights and returns the score.
getNewWeights in class AbstractMixtureTrainSM
Parameters:
dataWeights - the weights for the internal sample (should not be changed)
w - the array for the statistic of the component parameters (shall be filled)
seqweights - an array containing for each component the weights for each sequence (shall be filled)
Throws:
Exception - if something went wrong
protected void setTrainData(DataSet data)
Description copied from class: AbstractMixtureTrainSM
This method is invoked by the train method and sets, for a given sample, the sample that should be used for train.
setTrainData in class AbstractMixtureTrainSM
Parameters:
data - the given sample of sequences