public abstract class SamplingScoreBasedClassifier extends AbstractScoreBasedClassifier

A classifier that samples the parameters of SamplingDifferentiableStatisticalModels by the Metropolis-Hastings algorithm.
The distribution the parameters are sampled from is the distribution represented by the DiffSSBasedOptimizableFunction returned by getFunction(DataSet[], double[][]). As proposal distribution, a Gaussian distribution with given sampling variance is used for each parameter.
Specifically, a new set of parameters [...] DifferentiableStatisticalModel.getSizeOfEventSpaceForRandomVariablesOfParameter(int). Let [...] SamplingDifferentiableStatisticalModel [...] that [...] Random.nextDouble().
Otherwise, the parameters are rejected and [...] SamplingComponent. The contents of these files are stored together with the remaining representation of the SamplingScoreBasedClassifier if AbstractClassifier.toXML() is called and, hence, can be stored to a monolithic file containing all information for, e.g., later classification procedures.
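The sampling procedure described above (Gaussian proposal around the current parameters, acceptance against a uniform random number, rollback on rejection) can be illustrated with a minimal, library-independent sketch. It is not the Jstacs implementation: the class name MetropolisHastingsSketch, the method sample, the use of a plain log-density function, a single shared variance, and the choice to propose all parameters at once are assumptions made for illustration only.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-alone sketch; it does not use or reproduce any Jstacs class. */
final class MetropolisHastingsSketch {

    /**
     * Runs numSteps Metropolis-Hastings steps and returns the last accepted parameters.
     * Assumption: logDensity returns the unnormalized log of the target distribution.
     */
    static double[] sample(ToDoubleFunction<double[]> logDensity, double[] start,
                           double variance, int numSteps, Random rnd) {
        double sd = Math.sqrt(variance);
        double[] current = start.clone();
        double currentScore = logDensity.applyAsDouble(current);
        for (int step = 0; step < numSteps; step++) {
            // Proposal: perturb every parameter with zero-mean Gaussian noise.
            double[] proposal = new double[current.length];
            for (int i = 0; i < current.length; i++) {
                proposal[i] = current[i] + sd * rnd.nextGaussian();
            }
            double proposalScore = logDensity.applyAsDouble(proposal);
            // The proposal is symmetric, so the acceptance probability is
            // min(1, p(proposal) / p(current)), evaluated here in log space.
            if (Math.log(rnd.nextDouble()) < proposalScore - currentScore) {
                current = proposal;
                currentScore = proposalScore;
            }
            // Otherwise the proposal is rejected and the previous parameters are kept.
        }
        return current;
    }

    public static void main(String[] args) {
        // Target: two-dimensional standard normal, log-density up to a constant.
        ToDoubleFunction<double[]> target = p -> -0.5 * (p[0] * p[0] + p[1] * p[1]);
        double[] end = sample(target, new double[] {5.0, -5.0}, 0.5, 10_000, new Random(42));
        System.out.println(end[0] + " " + end[1]);
    }
}
```

Working in log space avoids numerical underflow when the target density is very small, which is common for likelihood-based objective functions.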
For determining the length of the burn-in phase and, as a consequence, the beginning of the stationary phase, a BurnInTest can be provided to the constructor of the classifier.

Modifier and Type | Class and Description |
---|---|
protected class | SamplingScoreBasedClassifier.DiffSMSamplingComponent: The SamplingComponent that handles storing and loading sampled parameter values to and from files. |
static class | SamplingScoreBasedClassifier.SamplingScheme: Sampling scheme for sampling the parameters of the scoring functions. |

AbstractScoreBasedClassifier.DoubleTableResult
Modifier and Type | Field and Description |
---|---|
protected Integer | burnInLength: The length of the burn-in phase as determined by burnInTest |
protected BurnInTest | burnInTest: The BurnInTest, may be null for no test |
protected double[] | currentParameters: The currently accepted parameters |
protected double | currentScore: The score achieved using currentParameters |
protected double[] | initParameters: The initial parameters if set by setInitParameters(double[]), null otherwise |
protected double[][] | lastParameters: The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest |
protected double[] | lastScore: The scores yielded for the parameters in lastParameters |
protected SamplingScoreBasedClassifierParameterSet | params: Parameters |
protected double[] | previousParameters: The previously accepted parameters, backup for rollbacks |
protected SamplingDifferentiableStatisticalModel[] | scoringFunctions |
Modifier | Constructor and Description |
---|---|
protected | SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params, BurnInTest burnInTest, double[] classVariances, SamplingDifferentiableStatisticalModel... scoringFunctions): Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingDifferentiableStatisticalModels for each of the classes. |
public | SamplingScoreBasedClassifier(StringBuffer xml): This is the constructor for Storable. |
Modifier and Type | Method and Description |
---|---|
protected double | doOneSamplingStep(Function function, SamplingScoreBasedClassifier.SamplingScheme scheme, double previousValue): Performs one sampling step, i.e., one sampling of all parameter values. |
void | doSingleSampling(DataSet[] s, double[][] weights, int numSteps, String outfilePrefix): Does a single sampling run for a predefined number of steps. |
protected void | extractFurtherClassifierInfosFromXML(StringBuffer xml): Extracts further information of a classifier from an XML representation. |
double[] | getBestParameters(): Returns the sampled parameter values with the maximum value of the objective function. |
CategoricalResult[] | getClassifierAnnotation(): Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class. |
boolean | getDeleteOnExit(): Returns true if the temporary parameter files shall be deleted on exit of the program. |
protected abstract Function | getFunction(DataSet[] data, double[][] weights): Returns the function that should be sampled from. |
protected StringBuffer | getFurtherClassifierInfos(): This method returns further information of a classifier as a StringBuffer. |
String | getInstanceName(): Returns a short description of the classifier. |
protected double[] | getMeanParameters(boolean testBurnIn, int minBurnInSteps): Returns the mean parameters over all samplings of all stationary phases. |
protected int | getNumberOfParameters(): Returns the number of parameters of all internal SamplingDifferentiableStatisticalModels. |
NumericalResultSet | getNumericalCharacteristics(): Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics(). |
protected SamplingScoreBasedClassifier.DiffSMSamplingComponent | getSamplingComponent(): Returns a sampling component suited for this SamplingScoreBasedClassifier. |
protected double | getScore(Sequence seq, int cls, boolean check): This method returns the score for a given Sequence and a given class. |
double[] | getScores(DataSet s) |
File | getTempDir(): Returns the directory for parameter files set in this SamplingScoreBasedClassifier. |
protected void | init(int starts, boolean adaptVariance, String outfilePrefix): Initializes all internal fields and initializes the scoringFunctions randomly. |
boolean | isInitialized(): This method gives information about the state of the classifier. |
void | joinAndSetParameterFiles(boolean add, File... files): Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier. |
protected double | modifyFunctionValue(double value): Allows for a modification of the value returned by the function obtained by getFunction(DataSet[], double[][]). |
protected void | precomputeBurnInLength(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc): Precomputes the length of the burn-in phase, e.g. |
protected void | sample(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc, Function function): Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params. |
protected double | sampleNSteps(Function function, SamplingScoreBasedClassifier.DiffSMSamplingComponent component, BurnInTest test, int numSteps, SamplingScoreBasedClassifier.SamplingScheme scheme): Samples a predefined number of steps appended to the current sampling. |
void | setDeleteOnExit(boolean deleteOnExit): If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program. |
void | setInitParameters(double[] parameters): Sets the initial parameters of the sampling to parameters. |
protected void | setParameters(double[] currentParameters): Sets the current parameters for the class weights and in all scoring functions. |
void | setTempDir(File tempDir): Sets the directory for parameter files set in this SamplingScoreBasedClassifier. |
String | toString() |
void | train(DataSet[] s, double[][] weights): This method trains a classifier over an array of weighted DataSets. |
Methods inherited from class AbstractScoreBasedClassifier: check, check, classify, classify, clone, createDefaultClassWeights, getClassWeight, getClassWeights, getMultiClassScores, getNumberOfClasses, getPValue, getPValue, getResults, getScore, setClassWeights, setClassWeights, setThresholdClassWeights
Methods inherited from class AbstractClassifier: classify, evaluate, evaluate, getAlphabetContainer, getCharacteristics, getLength, getXMLTag, toXML, train
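protected SamplingScoreBasedClassifierParameterSet params
Parameters

protected SamplingDifferentiableStatisticalModel[] scoringFunctions

protected double[] currentParameters
The currently accepted parameters

protected double[] initParameters
The initial parameters if set by setInitParameters(double[]), null otherwise

protected double currentScore
The score achieved using currentParameters

protected double[] previousParameters
The previously accepted parameters, backup for rollbacks

protected double[][] lastParameters
The last accepted parameters for all samplings, backup for iterative sampling when checking for BurnInTest

protected double[] lastScore
The scores yielded for the parameters in lastParameters

protected BurnInTest burnInTest
The BurnInTest, may be null for no test

protected Integer burnInLength
The length of the burn-in phase as determined by burnInTest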
public SamplingScoreBasedClassifier(StringBuffer xml) throws NonParsableException
This is the constructor for Storable.
Parameters:
xml - the xml representation
Throws:
NonParsableException - if the representation could not be parsed.

protected SamplingScoreBasedClassifier(SamplingScoreBasedClassifierParameterSet params, BurnInTest burnInTest, double[] classVariances, SamplingDifferentiableStatisticalModel... scoringFunctions) throws CloneNotSupportedException
Creates a new SamplingScoreBasedClassifier using the parameters in params, a specified BurnInTest (or null for no burn-in test), a set of sampling variances, which may be different for each of the classes (in analogy to equivalent sample size for the Dirichlet distribution), and a set of SamplingDifferentiableStatisticalModels for each of the classes.
Parameters:
params - the external parameters of this classifier
burnInTest - the burn-in test (or null for no burn-in test)
classVariances - the variances used for sampling for the parameters of each class
scoringFunctions - the scoring functions for each of the classes
Throws:
CloneNotSupportedException - if the scoring functions or the burn-in test could not be cloned
See Also:
VarianceRatioBurnInTest
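protected StringBuffer getFurtherClassifierInfos()
Description copied from class: AbstractClassifier
This method returns further information of a classifier as a StringBuffer. This method is used by the method AbstractClassifier.toXML() and should not be made public.
getFurtherClassifierInfos in class AbstractScoreBasedClassifier
Returns:
further information of a classifier as a StringBuffer
See Also:
AbstractClassifier.toXML()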
protected void extractFurtherClassifierInfosFromXML(StringBuffer xml) throws NonParsableException
Description copied from class: AbstractClassifier
Extracts further information of a classifier from an XML representation. This method is used by the method AbstractClassifier.fromXML(StringBuffer) and should not be made public.
extractFurtherClassifierInfosFromXML in class AbstractScoreBasedClassifier
Parameters:
xml - the XML representation as StringBuffer
Throws:
NonParsableException - if the information could not be parsed out of the XML representation (the StringBuffer could not be parsed)
See Also:
AbstractClassifier.fromXML(StringBuffer)
public CategoricalResult[] getClassifierAnnotation()
Description copied from class: AbstractClassifier
Returns an array of Results of dimension AbstractClassifier.getNumberOfClasses() that contains information about the classifier and for each class.
res[0] = new CategoricalResult( "classifier", "the kind of classifier", getInstanceName() );
res[1] = new CategoricalResult( "class info 0", "some information about the class", "info0" );
res[2] = new CategoricalResult( "class info 1", "some information about the class", "info1" );
...
getClassifierAnnotation in class AbstractClassifier
Returns:
an array of Results that contains information about the classifier

public NumericalResultSet getNumericalCharacteristics() throws Exception
Description copied from class: AbstractClassifier
Returns the subset of numerical values that are also returned by AbstractClassifier.getCharacteristics().
getNumericalCharacteristics in class AbstractClassifier
Throws:
Exception - if some of the characteristics could not be defined

public String getInstanceName()
Description copied from class: AbstractClassifier
Returns a short description of the classifier.
getInstanceName in class AbstractClassifier
protected abstract Function getFunction(DataSet[] data, double[][] weights) throws Exception
Returns the function that should be sampled from.
Parameters:
data - the samples
weights - the weights of the sequences of the samples
Throws:
Exception - if the function could not be created

protected double modifyFunctionValue(double value)
Allows for a modification of the value returned by the function obtained by getFunction(DataSet[], double[][]). This is, for instance, necessary in case of LogGenDisMixFunction to obtain a proper posterior or supervised posterior.
Parameters:
value - the original value

protected SamplingScoreBasedClassifier.DiffSMSamplingComponent getSamplingComponent()
Returns a sampling component suited for this SamplingScoreBasedClassifier
public File getTempDir()
Returns the directory for parameter files set in this SamplingScoreBasedClassifier. If this value is null, the default directory of the executing OS is used for the parameter files.

public void setTempDir(File tempDir)
Sets the directory for parameter files set in this SamplingScoreBasedClassifier. If tempDir is null, the default directory of the executing OS is used for the parameter files. If this value is reset after training, all sampled parameters will be lost. The value set by this method is not stored in the XML-representation.
Parameters:
tempDir - the temp directory

public boolean getDeleteOnExit()
Returns true if the temporary parameter files shall be deleted on exit of the program.

public void setDeleteOnExit(boolean deleteOnExit) throws Exception
If set to true (which is the default), the temporary files for storing sampled parameter values are deleted on exit of the program. Once this value is set to true, it cannot be reset to false after sampling has started, due to the restrictions of File.deleteOnExit(). If you nonetheless want to retain those parameters, you can call AbstractClassifier.toXML() and save the resulting StringBuffer, which also contains the sampled parameter values. The value set by this method is not stored in the XML-representation.
Parameters:
deleteOnExit - if temp files shall be deleted on exit
Throws:
Exception - if set to false after sampling started

protected void init(int starts, boolean adaptVariance, String outfilePrefix) throws Exception
Initializes all internal fields and initializes the scoringFunctions randomly
Parameters:
starts - number of starts
adaptVariance - if true, variance is adapted to size of event space
outfilePrefix - the prefix of the outfiles
Throws:
Exception - if the scoring functions could not be initialized

protected double sampleNSteps(Function function, SamplingScoreBasedClassifier.DiffSMSamplingComponent component, BurnInTest test, int numSteps, SamplingScoreBasedClassifier.SamplingScheme scheme) throws Exception
Samples a predefined number of steps appended to the current sampling
Parameters:
function - the objective function
component - the sampling component with selected sampling
test - the burn-in test
numSteps - the number of steps
scheme - the SamplingScoreBasedClassifier.SamplingScheme
Throws:
Exception - if either the function could not be evaluated on the current parameters or the sampled parameters could not be stored

protected void sample(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc, Function function) throws Exception
Samples as many steps as needed to get into the stationary phase according to burnInTest and then samples the number of stationary steps as set in params.
Parameters:
sfsc - the current sampling component
function - the objective function
Throws:
Exception - if the sampling could not be extended, e.g. due to evaluation errors

protected double doOneSamplingStep(Function function, SamplingScoreBasedClassifier.SamplingScheme scheme, double previousValue) throws Exception
Performs one sampling step, i.e., one sampling of all parameter values.
Parameters:
function - the objective function
scheme - the SamplingScoreBasedClassifier.SamplingScheme
previousValue - the value of the last sampling or minus infinity for the first sampling run
Returns:
Double.NaN if none of the sampled parameters were accepted
Throws:
Exception - if the function could not be evaluated or an unknown SamplingScoreBasedClassifier.SamplingScheme was provided

protected double getScore(Sequence seq, int cls, boolean check) throws IllegalArgumentException, NotTrainedException, Exception
Description copied from class: AbstractScoreBasedClassifier
This method returns the score for a given Sequence and a given class.
getScore in class AbstractScoreBasedClassifier
Parameters:
seq - the Sequence
cls - the index of the class
check - the switch to decide whether to check AlphabetContainer and the length of the Sequence or not
Returns:
the score for a given Sequence and a given class
Throws:
IllegalArgumentException - if something is wrong with the Sequence seq
NotTrainedException - if the classifier is not trained
Exception - if something went wrong

public double[] getScores(DataSet s) throws Exception
Description copied from class: AbstractScoreBasedClassifier
[...] Sequence in the DataSet. The scores are stored in the array according to the index of the Sequence in the DataSet.
getScores in class AbstractScoreBasedClassifier
Parameters:
s - the DataSet
Throws:
Exception - if something went wrong

public void setInitParameters(double[] parameters)
Sets the initial parameters of the sampling to parameters.
Parameters:
parameters - the initial parameters

protected void setParameters(double[] currentParameters)
Sets the current parameters for the class weights and in all scoring functions
Parameters:
currentParameters - the new parameter values

public boolean isInitialized()
Description copied from class: AbstractClassifier
This method gives information about the state of the classifier.
isInitialized in class AbstractClassifier
Returns:
true if the classifier is initialized and therefore able to classify sequences, otherwise false
public void doSingleSampling(DataSet[] s, double[][] weights, int numSteps, String outfilePrefix) throws Exception
Does a single sampling run for a predefined number of steps.
Parameters:
s - the data
weights - the weights for the data
numSteps - the number of sampling steps
outfilePrefix - the prefix of the outfile where the parameter values are stored
Throws:
Exception - if the scoring functions could not be initialized or the sampling could not be extended, e.g. due to evaluation errors

protected int getNumberOfParameters()
Returns the number of parameters of all internal SamplingDifferentiableStatisticalModels.

public void train(DataSet[] s, double[][] weights) throws Exception
Description copied from class: AbstractClassifier
This method trains a classifier over an array of weighted DataSets. That is why the following has to be fulfilled:
s.length == weights.length
weights[i] == null || s[i].getNumberOfElements() == weights[i].length
[...] AbstractClassifier.train(DataSet...). [...] DataSets are defined over the underlying alphabet and length.
train in class AbstractClassifier
Parameters:
s - an array of DataSets
weights - the weights for the DataSets
Throws:
Exception - if the weights are incorrect or the training did not succeed
See Also:
AbstractClassifier.train(DataSet...)
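
The two conditions on s and weights stated for train(DataSet[], double[][]) can be checked before training. The following is a minimal sketch of such a check, not part of the documented API: the class TrainingInputCheck and the method isConsistent are hypothetical, and the array dataSetSizes stands in for the values of s[i].getNumberOfElements() so that the example runs without the library.

```java
/** Hypothetical helper; dataSetSizes[i] stands in for s[i].getNumberOfElements(). */
final class TrainingInputCheck {

    /** Returns true if the weights match the documented conditions for training. */
    static boolean isConsistent(int[] dataSetSizes, double[][] weights) {
        // Condition 1: s.length == weights.length
        if (dataSetSizes.length != weights.length) {
            return false;
        }
        // Condition 2: weights[i] == null || s[i].getNumberOfElements() == weights[i].length
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] != null && weights[i].length != dataSetSizes[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 2};
        // The second class has no explicit weights, which the conditions allow.
        double[][] weights = {{1.0, 0.5, 2.0}, null};
        System.out.println(isConsistent(sizes, weights));
    }
}
```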
protected void precomputeBurnInLength(SamplingScoreBasedClassifier.DiffSMSamplingComponent sfsc) throws Exception
Precomputes the length of the burn-in phase, e.g.
Parameters:
sfsc - the current sampling component
Throws:
Exception - if the parameter values could not be parsed

public double[] getBestParameters() throws Exception
Returns the sampled parameter values with the maximum value of the objective function
Throws:
Exception - if the parameter values could not be parsed

protected double[] getMeanParameters(boolean testBurnIn, int minBurnInSteps) throws Exception
Returns the mean parameters over all samplings of all stationary phases.
Parameters:
testBurnIn - true if the length of the burn-in phase shall be computed
minBurnInSteps - minimum number of steps considered as burn-in
Throws:
Exception - if the parameter values could not be parsed

public void joinAndSetParameterFiles(boolean add, File... files) throws Exception
Combines parameter files such that they are accepted as parameter files of this SamplingScoreBasedClassifier
Parameters:
add - if true, parameter files are appended to the current ones, i.e., the number of samplings is augmented by these files
files - the parameter files
Throws:
Exception - if the parameter files could not be joined
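
The summaries described for getMeanParameters(boolean, int) and getBestParameters() can be pictured with in-memory arrays in place of parameter files. This is only a sketch under stated assumptions: the class SampledParameterSummary, both method names, the array layout chains[sampling][step][parameter], and the use of one common burn-in length for all samplings are hypothetical and do not reflect how the classifier actually reads its files.

```java
import java.util.Arrays;

/** Hypothetical in-memory sketch; the classifier itself reads sampled values from files. */
final class SampledParameterSummary {

    /** Mean of each parameter over all samplings, skipping the first burnIn steps of each. */
    static double[] meanParameters(double[][][] chains, int burnIn) {
        double[] mean = null;
        int count = 0;
        for (double[][] chain : chains) {
            for (int t = burnIn; t < chain.length; t++) {
                if (mean == null) {
                    mean = new double[chain[t].length];
                }
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += chain[t][i];
                }
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no samples left after burn-in");
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= count;
        }
        return mean;
    }

    /** Parameters of the step with the maximum score over all samplings. */
    static double[] bestParameters(double[][][] chains, double[][] scores) {
        double bestScore = Double.NEGATIVE_INFINITY;
        double[] best = null;
        for (int c = 0; c < chains.length; c++) {
            for (int t = 0; t < chains[c].length; t++) {
                if (best == null || scores[c][t] > bestScore) {
                    bestScore = scores[c][t];
                    best = chains[c][t];
                }
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no samples");
        }
        return best.clone();
    }

    public static void main(String[] args) {
        double[][][] chains = {{{0, 0}, {2, 4}, {4, 8}}, {{10, 10}, {6, 0}, {8, 4}}};
        double[][] scores = {{-3, -1, -2}, {-5, -0.5, -4}};
        System.out.println(Arrays.toString(meanParameters(chains, 1)));
        System.out.println(Arrays.toString(bestParameters(chains, scores)));
    }
}
```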