java.lang.Object
  de.jstacs.motifDiscovery.SignificantMotifOccurrencesFinder
public class SignificantMotifOccurrencesFinder
This class enables the user to predict motif occurrences given a specific significance level.
Nested Class Summary | |
---|---|
static interface | SignificantMotifOccurrencesFinder.JoinMethod: Interface for methods that combine several profiles over the same sequence into one common profile. |
static class | SignificantMotifOccurrencesFinder.RandomSeqType |
static class | SignificantMotifOccurrencesFinder.SumOfProbabilities: Joins several profiles containing log-probabilities into one profile containing the logarithm of the sum of the probabilities of the single profiles. |
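
The join described for SumOfProbabilities amounts to a log-sum-exp over the profiles at each position. The following is a minimal, self-contained sketch of that arithmetic and not the Jstacs implementation: the class LogProfileJoinSketch and its methods logSum and join are hypothetical names, and the sketch assumes that all profiles have the same length and contain natural-log probabilities.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Jstacs): joins log-probability profiles. */
final class LogProfileJoinSketch {

    /** Returns log(exp(a) + exp(b)) while staying in log space. */
    static double logSum(double a, double b) {
        // log(0) is represented as negative infinity and acts as the neutral element
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        double min = Math.min(a, b);
        // factoring out the larger term avoids underflow of exp()
        return max + Math.log1p(Math.exp(min - max));
    }

    /** Joins several profiles of equal length position by position. */
    static double[] join(double[][] logProfiles) {
        if (logProfiles.length == 0) {
            throw new IllegalArgumentException("at least one profile is required");
        }
        double[] joined = new double[logProfiles[0].length];
        Arrays.fill(joined, Double.NEGATIVE_INFINITY);
        for (double[] profile : logProfiles) {
            if (profile.length != joined.length) {
                throw new IllegalArgumentException("profiles must have equal length");
            }
            for (int i = 0; i < joined.length; i++) {
                joined[i] = logSum(joined[i], profile[i]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        double[][] profiles = {
            { Math.log(0.1), Math.log(0.2) },
            { Math.log(0.3), Math.log(0.4) }
        };
        // each entry is the log of the summed probabilities at that position
        System.out.println(Arrays.toString(join(profiles)));
    }
}
```

Working in log space throughout keeps very small probabilities from underflowing to zero, which is the usual reason for joining profiles this way rather than exponentiating first.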
Constructor Summary | |
---|---|
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, DataSet bg, double[] weights, double sign): This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level. | |
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.JoinMethod joiner, DataSet bg, double[] weights, double sign): This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level. | |
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, boolean oneHistogram, int numSequences, double sign): This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level. | |
SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, SignificantMotifOccurrencesFinder.JoinMethod joiner, boolean oneHistogram, int numSequences, double sign): This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level. | |
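
The constructors state only that a background DataSet or sampled random sequences are used to determine the significance level; this page does not describe how Jstacs does so internally. One common way to apply a significance level to a score is an empirical p-value over background scores, sketched below under that assumption. The class EmpiricalPValueSketch and the methods pValue and isSignificant are hypothetical and not part of Jstacs.

```java
import java.util.Arrays;

/** Hypothetical sketch (not part of Jstacs): empirical p-values from background scores. */
final class EmpiricalPValueSketch {

    /**
     * Returns the fraction of background scores that are at least as large as score.
     * The background array must be sorted in ascending order.
     */
    static double pValue(double[] sortedBackground, double score) {
        if (sortedBackground.length == 0) {
            throw new IllegalArgumentException("background must not be empty");
        }
        // binary search for the first background score that is >= score
        int lo = 0;
        int hi = sortedBackground.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedBackground[mid] < score) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (sortedBackground.length - lo) / (double) sortedBackground.length;
    }

    /** A score counts as significant if its empirical p-value does not exceed sign. */
    static boolean isSignificant(double[] sortedBackground, double score, double sign) {
        return pValue(sortedBackground, score) <= sign;
    }

    public static void main(String[] args) {
        double[] background = { 7, 1, 9, 3, 5, 10, 2, 8, 4, 6 };
        Arrays.sort(background);
        System.out.println(pValue(background, 9.5));
        System.out.println(isSignificant(background, 11.0, 0.05));
    }
}
```

In this sketch, more background scores give a finer-grained p-value. The numSequences parameter of the sampling constructors may play a comparable role, although the source does not confirm this.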
Method Summary | |
---|---|
DataSet | annotateMotif(DataSet data, int motifIndex): This method annotates a DataSet. |
DataSet | annotateMotif(DataSet data, int motifIndex, int addMax): This method annotates a DataSet. |
DataSet | annotateMotif(int startPos, DataSet data, int motifIndex): This method annotates a DataSet starting in each sequence at startPos. |
DataSet | annotateMotif(int startPos, DataSet data, int motifIndex, int addMax, boolean addAnnotation): This method annotates a DataSet starting in each sequence at startPos. |
MotifAnnotation[] | findSignificantMotifOccurrences(int motif, Sequence seq, int start): This method finds the significant motif occurrences in the sequence. |
MotifAnnotation[] | findSignificantMotifOccurrences(int motif, Sequence seq, int addMax, int start): This method finds the significant motif occurrences in the sequence. |
DataSet | getBindingSites(DataSet data, int motifIndex): This method returns a DataSet containing the predicted binding sites. |
DataSet | getBindingSites(int startPos, DataSet data, int motifIndex, int addMax, int addLeft, int addRight): This method returns a DataSet containing the predicted binding sites. |
double | getFactorForAucPR(): This method returns a factor by which scores must be multiplied for computing PR curves. |
MotifDiscoverer | getMotifDiscoverer(): This method returns a clone of the internally used MotifDiscoverer. |
double | getNumberOfBoundSequences(DataSet data, double[] weights, int motifIndex): Returns the number of sequences in data that are predicted to be bound at least once by motif no. motifIndex. |
double | getOffsetForAucPR(): This method returns an offset that must be added to scores for computing PR curves. |
double[][] | getPWM(int motif, DataSet data, double[] weights, int addLeft, int addRight): Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder. |
Pair<double[][],double[]> | getPWMAndPosDist(int motif, DataSet data, double[] weights, double[] mean, int addLeft, int addRight): Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder together with the standard deviation of binding site positions computed using the provided mean values for each sequence. |
Pair<double[][][],int[][]> | getPWMAndPositions(int motif, DataSet data, double[] weights, int addLeft, int addRight): Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder together with the positions of the binding sites within the sequences of data and the corresponding p-values. |
protected double[][] | getPWMAndPositions(int motif, DataSet data, double[] weights, int addLeft, int addRight, int[][] positions, double[][] pvals, double[] mean, double[] sd): Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder and fills the positions of the binding sites within the sequences of data and the corresponding p-values into the corresponding arrays. |
IntList | getStartPositions(int startPos, DataSet data, int motifIndex, int addMax): This method returns a list of start positions of binding sites. |
double[][] | getValuesForEachNucleotide(DataSet data, int motif, boolean addOnlyBest): This method determines, for each possible starting position in each of the sequences in data, a score for this position being covered by at least one motif occurrence of the motif with index motif. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, boolean oneHistogram, int numSequences, double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.
Parameters:
disc - the MotifDiscoverer for the prediction
type - the type that determines how the significance level is determined
oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms
numSequences - the number of sampled sequence instances used to determine the significance level
sign - the significance level

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, SignificantMotifOccurrencesFinder.JoinMethod joiner, boolean oneHistogram, int numSequences, double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.
Parameters:
disc - the MotifDiscoverer for the prediction
type - the type that determines how the significance level is determined
joiner - the SignificantMotifOccurrencesFinder.JoinMethod that defines how the profiles of the same motif in different components shall be joined
oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms
numSequences - the number of sampled sequence instances used to determine the significance level
sign - the significance level

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc, DataSet bg, double[] weights, double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.
Parameters:
disc - the MotifDiscoverer for the prediction
bg - the background data set
weights - the weights of the background data set, can be null
sign - the significance level

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.JoinMethod joiner, DataSet bg, double[] weights, double sign)
This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.
Parameters:
disc - the MotifDiscoverer for the prediction
joiner - the SignificantMotifOccurrencesFinder.JoinMethod that defines how the profiles of the same motif in different components shall be joined
bg - the background data set
weights - the weights of the background data set, can be null
sign - the significance level

Method Detail |
---|
public MotifAnnotation[] findSignificantMotifOccurrences(int motif, Sequence seq, int start) throws Exception
This method finds the significant motif occurrences in the sequence.
Parameters:
motif - the motif index
seq - the sequence
start - the start position
Returns:
the MotifAnnotation instances for the sequence
Throws:
Exception - if the background sample could not be created, or some of the scores could not be computed

public MotifAnnotation[] findSignificantMotifOccurrences(int motif, Sequence seq, int addMax, int start) throws Exception
This method finds the significant motif occurrences in the sequence.
Parameters:
motif - the motif index
seq - the sequence
addMax - the number of motif occurrences that can at most be annotated
start - the start position
Returns:
the MotifAnnotation instances for the sequence
Throws:
Exception - if the background sample could not be created, or some of the scores could not be computed

public double[][] getPWM(int motif, DataSet data, double[] weights, int addLeft, int addRight) throws Exception
Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder.
Parameters:
motif - the index of the motif
data - the data set
weights - the weights on the individual sequences of the data set
addLeft - the number of positions to add to the left side of motif occurrences
addRight - the number of positions to add to the right side of motif occurrences
Throws:
Exception - if something went wrong

public Pair<double[][][],int[][]> getPWMAndPositions(int motif, DataSet data, double[] weights, int addLeft, int addRight) throws Exception
Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder together with the positions of the binding sites within the sequences of data and the corresponding p-values.
Parameters:
motif - the index of the motif
data - the data set
weights - the weights on the individual sequences of the data set
addLeft - the number of positions to add to the left side of motif occurrences
addRight - the number of positions to add to the right side of motif occurrences
Returns:
a Pair containing the PWM at index 0 of the first element, the p-values at index 1 of the first element, and the positions as second element. P-value and position arrays are indexed by the index of the corresponding sequence in data.
Throws:
Exception - if something went wrong

protected double[][] getPWMAndPositions(int motif, DataSet data, double[] weights, int addLeft, int addRight, int[][] positions, double[][] pvals, double[] mean, double[] sd) throws Exception
Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder and fills the positions of the binding sites within the sequences of data and the corresponding p-values into the corresponding arrays. Additionally, the standard deviation of binding site positions is computed using the provided mean values for each sequence.
Parameters:
motif - the index of the motif
data - the data set
weights - the weights on the individual sequences of the data set
addLeft - the number of positions to add to the left side of motif occurrences
addRight - the number of positions to add to the right side of motif occurrences
positions - the array filled with the positions. The first dimension must contain as many entries as data has sequences. May be null.
pvals - the array filled with the p-values. The first dimension must contain as many entries as data has sequences. May be null.
mean - the means for the individual sequences. May be null.
sd - an array of length 1 that is filled with the determined standard deviation. May be null.
Throws:
Exception - if something went wrong

public Pair<double[][],double[]> getPWMAndPosDist(int motif, DataSet data, double[] weights, double[] mean, int addLeft, int addRight) throws Exception
Returns the position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder together with the standard deviation of binding site positions computed using the provided mean values for each sequence.
Parameters:
motif - the index of the motif
data - the data set
weights - the weights on the individual sequences of the data set
mean - the means for the individual sequences. Must be as long as data has sequences.
addLeft - the number of positions to add to the left side of motif occurrences
addRight - the number of positions to add to the right side of motif occurrences
Returns:
a Pair containing the PWM as first element and the standard deviation in an array of length 1 as second element
Throws:
Exception - if something went wrong

public DataSet annotateMotif(DataSet data, int motifIndex) throws Exception
This method annotates a DataSet.
Parameters:
data - the DataSet
motifIndex - the index of the motif
Returns:
the annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)

public DataSet annotateMotif(int startPos, DataSet data, int motifIndex) throws Exception
This method annotates a DataSet starting in each sequence at startPos.
Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
Returns:
the annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)

public DataSet annotateMotif(DataSet data, int motifIndex, int addMax) throws Exception
This method annotates a DataSet. At most addMax motif occurrences of the motif instance will be annotated.
Parameters:
data - the DataSet
motifIndex - the index of the motif
addMax - the number of motif occurrences that can at most be annotated for each motif instance
Returns:
the annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)

public DataSet annotateMotif(int startPos, DataSet data, int motifIndex, int addMax, boolean addAnnotation) throws Exception
This method annotates a DataSet starting in each sequence at startPos. At most addMax motif occurrences of the motif instance will be annotated.
Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
addMax - the number of motif occurrences that can at most be annotated for each motif instance
addAnnotation - a switch whether to add or replace the current annotation
Returns:
the annotated DataSet
Throws:
Exception - if something went wrong
See Also:
annotateMotif(int, DataSet, int)
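
The effect of addMax can be pictured as a cap on the number of significant occurrences that are kept. The sketch below illustrates one plausible reading and is not taken from the Jstacs sources: it assumes that occurrences are ranked by p-value and that the smallest p-values are kept first. The class AddMaxSketch, its nested Occurrence type and the select method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not part of Jstacs): keep at most addMax significant occurrences. */
final class AddMaxSketch {

    /** A candidate motif occurrence with its start position and p-value. */
    static final class Occurrence {
        final int position;
        final double pValue;

        Occurrence(int position, double pValue) {
            this.position = position;
            this.pValue = pValue;
        }
    }

    /**
     * Keeps the candidates whose p-value does not exceed sign, ordered by
     * ascending p-value (an assumption), and truncates the list to addMax entries.
     */
    static List<Occurrence> select(List<Occurrence> candidates, double sign, int addMax) {
        List<Occurrence> significant = new ArrayList<>();
        for (Occurrence o : candidates) {
            if (o.pValue <= sign) {
                significant.add(o);
            }
        }
        significant.sort(Comparator.comparingDouble((Occurrence o) -> o.pValue));
        int keep = Math.max(0, Math.min(addMax, significant.size()));
        return new ArrayList<>(significant.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Occurrence> candidates = Arrays.asList(
            new Occurrence(3, 0.04),
            new Occurrence(10, 0.001),
            new Occurrence(20, 0.2),
            new Occurrence(30, 0.01));
        for (Occurrence o : select(candidates, 0.05, 2)) {
            System.out.println(o.position + " " + o.pValue);
        }
    }
}
```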

public DataSet getBindingSites(DataSet data, int motifIndex) throws Exception
This method returns a DataSet containing the predicted binding sites.
Parameters:
data - the DataSet
motifIndex - the index of the motif
Returns:
a DataSet containing the predicted binding sites
Throws:
Exception - if something went wrong

public DataSet getBindingSites(int startPos, DataSet data, int motifIndex, int addMax, int addLeft, int addRight) throws Exception
This method returns a DataSet containing the predicted binding sites.
Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
addMax - the number of motif occurrences that can at most be annotated for each motif instance
addLeft - number of positions added to the left of the predicted motif occurrence
addRight - number of positions added to the right of the predicted motif occurrence
Returns:
a DataSet containing the predicted binding sites
Throws:
Exception - if something went wrong

public IntList getStartPositions(int startPos, DataSet data, int motifIndex, int addMax) throws Exception
This method returns a list of start positions of binding sites.
Parameters:
startPos - the start position used for all sequences
data - the DataSet
motifIndex - the index of the motif
addMax - the number of motif occurrences that can at most be annotated for each motif instance
Throws:
Exception - if something went wrong

public double getNumberOfBoundSequences(DataSet data, double[] weights, int motifIndex) throws Exception
Returns the number of sequences in data that are predicted to be bound at least once by motif no. motifIndex.
Parameters:
data - the data
weights - the weights of the data
motifIndex - the index of the motif
Returns:
the number of sequences in data bound by motif motifIndex
Throws:
Exception - if the background sample for the prediction could not be created or some of the scores could not be computed

public double getOffsetForAucPR()
This method returns an offset that must be added to scores for computing PR curves. If the SignificantMotifOccurrencesFinder was instantiated using oneHistogram=true, getValuesForEachNucleotide(DataSet, int, boolean) returns scores and no offset is needed. Otherwise, it returns p-values and, hence, 1-(p-value) must be used for the PR curve and the offset is 1.
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String)

public double getFactorForAucPR()
This method returns a factor by which scores must be multiplied for computing PR curves. If the SignificantMotifOccurrencesFinder was instantiated using oneHistogram=true, getValuesForEachNucleotide(DataSet, int, boolean) returns scores and a factor of 1 is appropriate. Otherwise, it returns p-values and, hence, 1-(p-value) must be used for the PR curve and the factor is -1.
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String)
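
Taken together, the offset and the factor describe a linear transformation, offset + factor * value, that maps p-values to 1-(p-value) and leaves plain scores unchanged. The following self-contained sketch shows that arithmetic only. The class AucPrTransformSketch and its methods are hypothetical stand-ins rather than Jstacs code, and the offset of 0 for oneHistogram=true is an interpretation of "no offset is needed".

```java
/** Hypothetical sketch (not part of Jstacs): applying offset and factor before a PR curve. */
final class AucPrTransformSketch {

    /** Offset as described above: 1 for p-values, assumed 0 for plain scores. */
    static double offset(boolean oneHistogram) {
        return oneHistogram ? 0.0 : 1.0;
    }

    /** Factor as described above: -1 for p-values, 1 for plain scores. */
    static double factor(boolean oneHistogram) {
        return oneHistogram ? 1.0 : -1.0;
    }

    /** Returns a new array holding offset + factor * value for every entry. */
    static double[][] transform(double[][] values, double offset, double factor) {
        double[][] result = new double[values.length][];
        for (int i = 0; i < values.length; i++) {
            result[i] = new double[values[i].length];
            for (int j = 0; j < values[i].length; j++) {
                result[i][j] = offset + factor * values[i][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // p-values for two positions of one sequence (oneHistogram=false)
        double[][] pValues = { { 0.25, 1.0 } };
        double[][] forPr = transform(pValues, offset(false), factor(false));
        System.out.println(forPr[0][0] + " " + forPr[0][1]);
    }
}
```

After the transformation, larger values indicate stronger evidence for a motif occurrence in both modes, which is the orientation a PR curve expects.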

public double[][] getValuesForEachNucleotide(DataSet data, int motif, boolean addOnlyBest) throws Exception
This method determines, for each possible starting position in each of the sequences in data, a score for this position being covered by at least one motif occurrence of the motif with index motif. If the SignificantMotifOccurrencesFinder was constructed using oneHistogram=true, the returned values are arbitrary scores; otherwise they are p-values.
Parameters:
data - the DataSet
motif - the motif index
addOnlyBest - a switch whether to add only the best
Throws:
Exception - if something went wrong during the computation of the scores of the MotifDiscoverer
See Also:
MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String), getOffsetForAucPR(), getFactorForAucPR()

public MotifDiscoverer getMotifDiscoverer() throws CloneNotSupportedException
This method returns a clone of the internally used MotifDiscoverer.
Returns:
a clone of the internally used MotifDiscoverer
Throws:
CloneNotSupportedException - if the MotifDiscoverer cannot be cloned correctly