java.lang.Object
de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture
de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.ZOOPSTrainSM
public class ZOOPSTrainSM
This class enables the user to search for a single motif in a sequence. The
model can be trained under either the "one occurrence per sequence" (OOPS)
assumption or the "zero or one occurrence per sequence" (ZOOPS) assumption.
If EM is used for training, the parameters are trained in a MEME-like manner.
Currently only EM is implemented.
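The difference between OOPS and ZOOPS can be illustrated with a minimal, self-contained sketch of a MEME-like E-step for one sequence. It does not use the Jstacs API: the class name, the position weight matrix `pwm`, the background distribution `bg`, the uniform position prior, and the integer encoding of symbols are all assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of Jstacs.
final class ZoopsPosteriorSketch {

    /** Computes log(sum(exp(v))) in a numerically stable way. */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) max = Math.max(max, x);
        if (max == Double.NEGATIVE_INFINITY) return max;
        double sum = 0;
        for (double x : v) sum += Math.exp(x - max);
        return max + Math.log(sum);
    }

    /**
     * Returns the posterior over hidden states for one sequence:
     * index 0 = "no motif", index i + 1 = "motif starts at position i".
     * Assumptions: symbols are encoded as 0..3, pwm[j][a] is the probability
     * of symbol a at motif position j, bg[a] is the background probability,
     * and start positions are a priori uniform. motifProb = 1 yields OOPS.
     */
    static double[] posterior(int[] seq, double[][] pwm, double[] bg, double motifProb) {
        int w = pwm.length;
        int n = seq.length - w + 1; // number of possible start positions
        if (n < 1) throw new IllegalArgumentException("sequence shorter than motif");

        double logBg = 0; // log-likelihood of the whole sequence under the background
        for (int s : seq) logBg += Math.log(bg[s]);

        double[] logJoint = new double[n + 1];
        logJoint[0] = Math.log(1 - motifProb) + logBg;
        for (int i = 0; i < n; i++) {
            double ll = logBg;
            for (int j = 0; j < w; j++) {
                int a = seq[i + j];
                // replace the background term by the motif term inside the site
                ll += Math.log(pwm[j][a]) - Math.log(bg[a]);
            }
            logJoint[i + 1] = Math.log(motifProb) - Math.log(n) + ll;
        }

        double norm = logSumExp(logJoint);
        double[] post = new double[n + 1];
        for (int k = 0; k <= n; k++) post[k] = Math.exp(logJoint[k] - norm);
        return post;
    }

    public static void main(String[] args) {
        int[] seq = {0, 1, 2, 3, 0, 0};
        double[][] pwm = {{0.01, 0.01, 0.97, 0.01}, {0.01, 0.01, 0.01, 0.97}};
        double[] bg = {0.25, 0.25, 0.25, 0.25};
        System.out.println("ZOOPS: " + Arrays.toString(posterior(seq, pwm, bg, 0.5)));
        System.out.println("OOPS:  " + Arrays.toString(posterior(seq, pwm, bg, 1.0)));
    }
}
```

Under OOPS the "no motif" state receives zero posterior mass, so every sequence is forced to contribute a site; under ZOOPS a sequence without a convincing site can be explained by the background alone.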
Nested Class Summary |
---|
Nested classes/interfaces inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
---|
AbstractMixtureTrainSM.Algorithm, AbstractMixtureTrainSM.Parameterization |
Nested classes/interfaces inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer |
---|
MotifDiscoverer.KindOfProfile |
Field Summary | |
---|---|
protected byte |
bgMaxMarkovOrder
The order of the background model. |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture |
---|
posPrior |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
---|
algorithm, algorithmHasBeenRun, alternativeModel, best, burnInTest, componentHyperParams, compProb, counter, dimension, estimateComponentProbs, file, filereader, filewriter, initialIteration, logWeights, model, optimizeModel, sample, samplingIndex, seqWeights, sostream, starts, stationaryIteration, weights |
Fields inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
alphabets, length |
Constructor Summary | |
---|---|
|
ZOOPSTrainSM(StringBuffer xml)
The standard constructor for the interface Storable. |
protected |
ZOOPSTrainSM(TrainableStatisticalModel motif,
TrainableStatisticalModel bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
double[] weights,
PositionPrior posPrior,
AbstractMixtureTrainSM.Algorithm algorithm,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization,
int initialIteration,
int stationaryIteration,
BurnInTest burnInTest)
Creates a new ZOOPSTrainSM. |
|
ZOOPSTrainSM(TrainableStatisticalModel motif,
TrainableStatisticalModel bg,
boolean trainOnlyMotifModel,
int starts,
double[] componentHyperParams,
PositionPrior posPrior,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
Creates a new ZOOPSTrainSM using EM and estimating
the probability for finding a motif. |
|
ZOOPSTrainSM(TrainableStatisticalModel motif,
TrainableStatisticalModel bg,
boolean trainOnlyMotifModel,
int starts,
double motifProb,
PositionPrior posPrior,
double alpha,
TerminationCondition tc,
AbstractMixtureTrainSM.Parameterization parametrization)
Creates a new ZOOPSTrainSM using EM and fixed
probability for finding a motif. |
Method Summary | |
---|---|
protected double[][] |
createSeqWeightsArray()
Creates an array that can be used for weighting sequences in the algorithm. |
protected double[][] |
doFirstIteration(double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
This method will do the first step in the train algorithm for the current model on the internal sample. |
int |
getGlobalIndexOfMotifInComponent(int component,
int motif)
Returns the global index of the motif used in component. |
protected double |
getLogProbUsingCurrentParameterSetFor(int component,
Sequence seq,
int start,
int end)
Returns the logarithmic probability for the sequence and the given component using the current parameter set. |
int |
getMinimalSequenceLength()
Returns the minimal length a sequence or a sample must have. |
int |
getMotifLength(int motif)
This method returns the length of the motif with index motif. |
protected double |
getNewWeights(double[] dataWeights,
double[] w,
double[][] seqweights)
Computes sequence weights and returns the score. |
int |
getNumberOfMotifs()
Returns the number of motifs for this MotifDiscoverer. |
int |
getNumberOfMotifsInComponent(int component)
Returns the number of motifs that are used in the component component of this MotifDiscoverer. |
double[] |
getProfileOfScoresFor(int component,
int motif,
Sequence sequence,
int startpos,
MotifDiscoverer.KindOfProfile kind)
Returns the profile of the scores for component component
and motif motif at all possible start positions of the motif
in the sequence sequence beginning at startpos. |
double[] |
getStrandProbabilitiesFor(int component,
int motif,
Sequence sequence,
int startpos)
This method returns the probabilities of the strand orientations for a given subsequence if it is considered as site of the motif model in a specific component. |
protected double |
iterate(int start,
double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
This method runs the train algorithm for the current model and the internal data set. |
protected double |
modify(double[] containsMotif,
double[] startpos,
int start,
int end)
This method modifies the computed weights for one sequence and returns the score. |
void |
setShiftCorrection(boolean correct)
Enables or disables the phase shift correction. |
protected void |
setTrainData(DataSet data)
This method is invoked by the train method and sets, for a
given sample, the sample that should be used for training. |
void |
trainBgModel(DataSet data,
double[] weights)
This method trains the background model. |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.motif.HiddenMotifMixture |
---|
checkLength, clone, emitDataSetUsingCurrentParameterSet, extractFurtherInformation, getFurtherInformation, getInstanceName, getNewParameters, toString, train |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.mixture.AbstractMixtureTrainSM |
---|
algorithmHasBeenRun, checkModelsForGibbsSampling, continueIterations, continueIterations, doFirstIteration, doFirstIteration, draw, emitDataSet, extendSampling, finalize, fromXML, getCharacteristics, getIndexOfMaximalComponentFor, getLogPriorTerm, getLogPriorTermForComponentProbs, getLogProbFor, getLogProbFor, getLogScoreFor, getModel, getModels, getMRG, getMRGParams, getNameOfAlgorithm, getNewComponentProbs, getNewParametersForModel, getNumberOfComponents, getNumericalCharacteristics, getScoreForBestRun, getWeights, initModelForSampling, initWithPrior, isInitialized, isInSamplingMode, iterate, max, modifyWeights, parseNextParameterSet, parseParameterSet, samplingStopped, setAlpha, setOutputStream, setWeights, swap, toXML |
Methods inherited from class de.jstacs.sequenceScores.statisticalModels.trainable.AbstractTrainableStatisticalModel |
---|
check, getAlphabetContainer, getLength, getLogProbFor, getLogProbFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getLogScoreFor, getMaximalMarkovOrder, train |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface de.jstacs.motifDiscovery.MotifDiscoverer |
---|
getIndexOfMaximalComponentFor, getNumberOfComponents |
Methods inherited from interface de.jstacs.Storable |
---|
toXML |
Field Detail |
---|
bgMaxMarkovOrder
protected byte bgMaxMarkovOrder
- The order of the background model.
Constructor Detail |
---|
ZOOPSTrainSM
protected ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, double[] weights, PositionPrior posPrior, AbstractMixtureTrainSM.Algorithm algorithm, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization, int initialIteration, int stationaryIteration, BurnInTest burnInTest) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
- Creates a new ZOOPSTrainSM. This constructor can be used for any algorithm since it takes all necessary values as parameters.
- Parameters:
motif
- the motif model; if the model is trained using AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent
bg
- the background model for the flanking sequences and for those sequences that do not contain a binding site; if trainOnlyMotifModel == false and algorithm == AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING, the model has to implement SamplingComponent. The model has to be able to score sequences of arbitrary length.
trainOnlyMotifModel
- a switch whether to train only the motif model
starts
- the number of times the algorithm will be started in the train method, at least 1
componentHyperParams
- the hyperparameters for the component assignment prior: only used if estimateComponentProbs == true; the array has to be null or has to have length dimension; null or an array with all values zero (0) leads to ML estimation; depends on the parameterization
weights
- null or the weights for the components (then weights.length == dimension)
posPrior
- this object determines the positional distribution that shall be used
algorithm
- either AbstractMixtureTrainSM.Algorithm.EM or AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
alpha
- only for AbstractMixtureTrainSM.Algorithm.EM: the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc
- only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization
- only for AbstractMixtureTrainSM.Algorithm.EM: the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
initialIteration
- only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING (... stationaryIteration/starts)
stationaryIteration
- only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
burnInTest
- only for AbstractMixtureTrainSM.Algorithm.GIBBS_SAMPLING
- Throws:
IllegalArgumentException
- if weights != null && weights.length != 2, if weights != null and there exists an i where weights[i] < 0, if starts < 1, or if componentHyperParams are not correct
WrongAlphabetException
- if not all models work on the same simple alphabet
CloneNotSupportedException
- if the models cannot be cloned
ZOOPSTrainSM
public ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double[] componentHyperParams, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
- Creates a new ZOOPSTrainSM using EM and estimating the probability for finding a motif.
- Parameters:
motif
- the motif model
bg
- the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts
- the number of times the algorithm will be started in the train method, at least 1
componentHyperParams
- the hyperparameters for the component assignment prior: only used if estimateComponentProbs == true; the array has to be null or has to have length dimension; null or an array with all values zero (0) leads to ML estimation; depends on the parameterization
posPrior
- this object determines the positional distribution that shall be used
trainOnlyMotifModel
- a switch whether to train only the motif model
alpha
- the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc
- only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization
- the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
- Throws:
IllegalArgumentException
- if starts < 1 or if componentHyperParams are not correct
WrongAlphabetException
- if not all models work on the same simple alphabet
CloneNotSupportedException
- if the models cannot be cloned
- See Also:
ZOOPSTrainSM(TrainableStatisticalModel, TrainableStatisticalModel, boolean, int, double[], double[], PositionPrior, AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, AbstractMixtureTrainSM.Parameterization, int, int, BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
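The recommendation alpha = 1 for initializing the gammas corresponds to drawing from a uniform distribution on the simplex. The following sketch shows one common way to draw such a vector, by normalizing independent exponential variates. It is an illustration only and makes no claim about how Jstacs draws its initial values; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration, not part of Jstacs.
final class DirichletInitSketch {

    /**
     * Draws a vector from a symmetric Dirichlet distribution with alpha = 1,
     * i.e. uniformly from the probability simplex of the given dimension.
     * Uses the fact that normalized i.i.d. Exp(1) variates are Dirichlet(1,...,1).
     */
    static double[] drawUniformOnSimplex(int dimension, Random rng) {
        if (dimension < 1) throw new IllegalArgumentException("dimension must be >= 1");
        double[] g = new double[dimension];
        double sum = 0;
        for (int i = 0; i < dimension; i++) {
            // 1 - nextDouble() lies in (0, 1], so the logarithm is finite
            g[i] = -Math.log(1.0 - rng.nextDouble());
            sum += g[i];
        }
        for (int i = 0; i < dimension; i++) g[i] /= sum;
        return g;
    }

    public static void main(String[] args) {
        // two entries, e.g. "contains motif" and "contains no motif"
        System.out.println(Arrays.toString(drawUniformOnSimplex(2, new Random())));
    }
}
```

Values of alpha other than 1 would require gamma-distributed variates instead of exponential ones, which this sketch deliberately leaves out.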
ZOOPSTrainSM
public ZOOPSTrainSM(TrainableStatisticalModel motif, TrainableStatisticalModel bg, boolean trainOnlyMotifModel, int starts, double motifProb, PositionPrior posPrior, double alpha, TerminationCondition tc, AbstractMixtureTrainSM.Parameterization parametrization) throws CloneNotSupportedException, IllegalArgumentException, WrongAlphabetException
- Creates a new ZOOPSTrainSM using EM and fixed probability for finding a motif.
- Parameters:
motif
- the motif model
bg
- the background model for the flanking sequences and for those sequences that do not contain a binding site. The model has to be able to score sequences of arbitrary length.
starts
- the number of times the algorithm will be started in the train method, at least 1
motifProb
- the probability of finding a motif in a sequence (in [0,1])
posPrior
- this object determines the positional distribution that shall be used
trainOnlyMotifModel
- a switch whether to train only the motif model
alpha
- the positive parameter for the Dirichlet distribution which is used when you invoke train to initialize the gammas. It is recommended to use alpha = 1 (uniform distribution on a simplex).
tc
- only for AbstractMixtureTrainSM.Algorithm.EM: the TerminationCondition for stopping the EM algorithm; tc has to return true from TerminationCondition.isSimple()
parametrization
- the type of the component probability parameterization, either AbstractMixtureTrainSM.Parameterization.THETA or AbstractMixtureTrainSM.Parameterization.LAMBDA
- Throws:
IllegalArgumentException
- if motifProb < 0 or motifProb > 1, or if starts < 1
WrongAlphabetException
- if not all models work on the same simple alphabet
CloneNotSupportedException
- if the models cannot be cloned
- See Also:
ZOOPSTrainSM(TrainableStatisticalModel, TrainableStatisticalModel, boolean, int, double[], double[], PositionPrior, AbstractMixtureTrainSM.Algorithm, double, TerminationCondition, AbstractMixtureTrainSM.Parameterization, int, int, BurnInTest), AbstractMixtureTrainSM.Algorithm.EM
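The two public EM constructors differ in whether the probability of finding a motif is estimated or kept fixed. The sketch below contrasts the two cases under simplifying assumptions: the estimate is a plain weighted maximum-likelihood average of per-sequence posteriors (no prior), and the fixed value is only range-checked. All names are hypothetical and the code does not reflect the internal Jstacs implementation.

```java
// Hypothetical illustration, not part of Jstacs.
final class MotifProbUpdateSketch {

    /**
     * ML estimate of the motif probability from per-sequence posteriors.
     * containsMotif[n] is the posterior probability that sequence n contains
     * a motif; dataWeights may be null, in which case every sequence counts once.
     */
    static double estimateMotifProb(double[] containsMotif, double[] dataWeights) {
        double num = 0, den = 0;
        for (int n = 0; n < containsMotif.length; n++) {
            double w = dataWeights == null ? 1.0 : dataWeights[n];
            num += w * containsMotif[n];
            den += w;
        }
        if (den <= 0) throw new IllegalArgumentException("total weight must be positive");
        return num / den;
    }

    /** Range check for a fixed motif probability, which has to lie in [0,1]. */
    static double checkFixedMotifProb(double motifProb) {
        if (Double.isNaN(motifProb) || motifProb < 0 || motifProb > 1) {
            throw new IllegalArgumentException("motifProb must be in [0,1]");
        }
        return motifProb;
    }

    public static void main(String[] args) {
        double[] post = {1.0, 0.0, 0.5};
        System.out.println(estimateMotifProb(post, null));
        System.out.println(checkFixedMotifProb(0.8));
    }
}
```

With a fixed motif probability of 1 the model behaves like OOPS; estimating the value lets the data decide how many sequences carry a site.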
ZOOPSTrainSM
public ZOOPSTrainSM(StringBuffer xml) throws NonParsableException
- The standard constructor for the interface Storable. Creates a new ZOOPSTrainSM out of its XML representation.
- Parameters:
xml
- the XML representation of the model as a StringBuffer
- Throws:
NonParsableException
- if the StringBuffer cannot be parsed
Method Detail |
---|
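setTrainData
protected void setTrainData(DataSet data) throws Exception
- Description copied from class:
AbstractMixtureTrainSM
- This method is invoked by the train method and sets, for a given sample, the sample that should be used for training.
setTrainData
in class AbstractMixtureTrainSM
- Parameters:
data
- the given sample of sequences
- Throws:
Exception
- if something went wrong
createSeqWeightsArray
protected double[][] createSeqWeightsArray()
- Description copied from class:
AbstractMixtureTrainSM
- Creates an array that can be used for weighting sequences in the algorithm.
createSeqWeightsArray
in class AbstractMixtureTrainSM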
doFirstIteration
protected double[][] doFirstIteration(double[] dataWeights, MultivariateRandomGenerator m, MRGParams[] params) throws Exception
- Description copied from class:
AbstractMixtureTrainSM
- This method will do the first step in the train algorithm for the current model on the internal sample.
doFirstIteration
in class AbstractMixtureTrainSM
- Parameters:
dataWeights
- null or the weights of each element of the sample
m
- the multivariate random generator
params
- the parameters for the multivariate random generator
- Throws:
Exception
- if something went wrong
getNewWeights
protected double getNewWeights(double[] dataWeights, double[] w, double[][] seqweights) throws Exception
- Description copied from class:
AbstractMixtureTrainSM
- Computes sequence weights and returns the score.
getNewWeights
in class AbstractMixtureTrainSM
- Parameters:
dataWeights
- the weights for the internal sample (should not be changed)
w
- the array for the statistic of the component parameters (shall be filled)
seqweights
- an array containing for each component the weights for each sequence (shall be filled)
- Throws:
Exception
- if something went wrong
modify
protected double modify(double[] containsMotif, double[] startpos, int start, int end)
- This method modifies the computed weights for one sequence and returns the score.
- Parameters:
containsMotif
- an array to return the weights for containing a motif (index 0) or containing no motif (index 1)
startpos
- the array containing the scores for each start position (including no motif in the sequence)
start
- the start index
end
- the end index
getLogProbUsingCurrentParameterSetFor
protected double getLogProbUsingCurrentParameterSetFor(int component, Sequence seq, int start, int end) throws Exception
- Description copied from class:
AbstractMixtureTrainSM
- Returns the logarithmic probability for the sequence and the given component using the current parameter set.
getLogProbUsingCurrentParameterSetFor
in class AbstractMixtureTrainSM
- Parameters:
component
- the index of the component
seq
- the sequence
start
- the start position in the sequence
end
- the end position in the sequence
- Returns:
log P(s,component) = log P(s|component) + log P(component)
- Throws:
Exception
- if not trained yet or something else went wrong
- See Also:
AbstractMixtureTrainSM.getNumberOfComponents()
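The identity log P(s,component) = log P(s|component) + log P(component) can be made concrete with a small stand-alone sketch. It also shows how such joint log probabilities could be normalized into component posteriors with a log-sum-exp step; that second part is an addition for illustration and not something this page states about the class. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of Jstacs.
final class JointLogProbSketch {

    /** log P(s, component) = log P(s | component) + log P(component). */
    static double logJoint(double logCondProb, double componentProb) {
        return logCondProb + Math.log(componentProb);
    }

    /** Normalizes the joint log probabilities of all components into posteriors. */
    static double[] componentPosteriors(double[] logCondProbs, double[] componentProbs) {
        int k = logCondProbs.length;
        double[] logJ = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            logJ[c] = logJoint(logCondProbs[c], componentProbs[c]);
            max = Math.max(max, logJ[c]);
        }
        double sum = 0;
        for (int c = 0; c < k; c++) sum += Math.exp(logJ[c] - max); // stable log-sum-exp
        double[] post = new double[k];
        for (int c = 0; c < k; c++) post[c] = Math.exp(logJ[c] - max) / sum;
        return post;
    }

    public static void main(String[] args) {
        double[] logCond = {Math.log(0.2), Math.log(0.6)};
        double[] prior = {0.5, 0.5};
        System.out.println(Arrays.toString(componentPosteriors(logCond, prior)));
    }
}
```

Working in log space avoids underflow for long sequences, where the raw probabilities quickly become too small for a double.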
getProfileOfScoresFor
public double[] getProfileOfScoresFor(int component, int motif, Sequence sequence, int startpos, MotifDiscoverer.KindOfProfile kind) throws Exception
- Description copied from interface:
MotifDiscoverer
- Returns the profile of the scores for component component and motif motif at all possible start positions of the motif in the sequence sequence beginning at startpos. This array should be of length sequence.length() - startpos - motifs[motif].length() + 1.
- Parameters:
component
- the component index
motif
- the index of the motif in the component
sequence
- the given sequence
startpos
- the start position in the sequence
kind
- indicates the kind of profile
- Throws:
Exception
- if the score could not be computed for any reason
getMinimalSequenceLength
public int getMinimalSequenceLength()
- Description copied from class:
HiddenMotifMixture
- Returns the minimal length a sequence or a sample must have.
getMinimalSequenceLength
in class HiddenMotifMixture
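For getProfileOfScoresFor, the expected profile length sequence.length() - startpos - motifs[motif].length() + 1 is simply the number of start positions at which the motif still fits into the sequence. The following sketch computes that count from plain integers. The class name and the decision to throw an exception when the motif does not fit are assumptions for illustration, not documented Jstacs behavior.

```java
// Hypothetical illustration, not part of Jstacs.
final class ProfileLengthSketch {

    /**
     * Number of possible motif start positions in a sequence of the given
     * length when scanning begins at startpos:
     * sequenceLength - startpos - motifLength + 1.
     */
    static int profileLength(int sequenceLength, int startpos, int motifLength) {
        int n = sequenceLength - startpos - motifLength + 1;
        if (startpos < 0 || motifLength < 1 || n < 1) {
            // assumption: a motif that does not fit is treated as an error here
            throw new IllegalArgumentException("motif does not fit into the sequence");
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(profileLength(20, 0, 8)); // 20 - 0 - 8 + 1
        System.out.println(profileLength(20, 5, 8)); // 20 - 5 - 8 + 1
    }
}
```

The same arithmetic explains why a minimal sequence length exists at all: a sequence shorter than the motif offers no valid start position.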
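getMotifLength
public int getMotifLength(int motif)
- Description copied from interface:
MotifDiscoverer
- This method returns the length of the motif with index motif.
- Parameters:
motif
- the index of the motif
- Returns:
- the length of the motif with index motif
getNumberOfMotifs
public int getNumberOfMotifs()
- Description copied from interface:
MotifDiscoverer
- Returns the number of motifs for this MotifDiscoverer.
getNumberOfMotifsInComponent
public int getNumberOfMotifsInComponent(int component)
- Description copied from interface:
MotifDiscoverer
- Returns the number of motifs that are used in the component component of this MotifDiscoverer.
- Parameters:
component
- the component of the MotifDiscoverer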
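getStrandProbabilitiesFor
public double[] getStrandProbabilitiesFor(int component, int motif, Sequence sequence, int startpos) throws Exception
- Description copied from interface:
MotifDiscoverer
- This method returns the probabilities of the strand orientations for a given subsequence if it is considered as site of the motif model in a specific component.
- Parameters:
component
- the component index
motif
- the index of the motif in the component
sequence
- the given sequence
startpos
- the start position in the sequence
- Throws:
Exception
- if the strand could not be computed for any reason
getGlobalIndexOfMotifInComponent
public int getGlobalIndexOfMotifInComponent(int component, int motif)
- Description copied from interface:
MotifDiscoverer
- Returns the global index of the motif used in component. The index returned must be at least 0 and less than MotifDiscoverer.getNumberOfMotifs().
- Parameters:
component
- the component index
motif
- the motif index in the component
- Returns:
- the global index of the motif in component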
trainBgModel
public void trainBgModel(DataSet data,
double[] weights)
throws Exception
- Description copied from class:
HiddenMotifMixture
- This method trains the background model. This can be useful if the
background model is not trained during the EM-algorithm.
- Specified by:
trainBgModel
in class HiddenMotifMixture
- Parameters:
data
- the sample
weights
- the weights
- Throws:
Exception
- if something went wrong
iterate
protected double iterate(int start,
double[] dataWeights,
MultivariateRandomGenerator m,
MRGParams[] params)
throws Exception
- Description copied from class:
AbstractMixtureTrainSM
- This method runs the train algorithm for the current model and the
internal data set.
- Overrides:
iterate
in class AbstractMixtureTrainSM
- Parameters:
start
- the index of the training
dataWeights
- the weights for each sequence or null
m
- the random generator for initiating the algorithm
params
- the parameters for the sequences
- Returns:
- the score
- Throws:
Exception
- if something went wrong
- See Also:
AbstractMixtureTrainSM.doFirstIteration(DataSet, double[], MultivariateRandomGenerator, MRGParams[]),
AbstractMixtureTrainSM.continueIterations(double[], double[][]),
AbstractMixtureTrainSM.continueIterations(double[], double[][], int, int)
setShiftCorrection
public void setShiftCorrection(boolean correct)
- Enables or disables the phase shift correction. By default, shift correction is enabled.
- Parameters:
correct
- switch that determines whether to correct shifts or not