SignificantMotifOccurrencesFinder

de.jstacs.motifDiscovery
Class SignificantMotifOccurrencesFinder

java.lang.Object
  de.jstacs.motifDiscovery.SignificantMotifOccurrencesFinder

public class SignificantMotifOccurrencesFinder
extends Object

This class enables the user to predict motif occurrences given a specific significance level.

Author:: Jan Grau, Jens Keilwagen

Nested Class Summary
`static interface`	`SignificantMotifOccurrencesFinder.JoinMethod` Interface for methods that combine several profiles over the same sequence into one common profile
`static class`	`SignificantMotifOccurrencesFinder.RandomSeqType`
`static class`	`SignificantMotifOccurrencesFinder.SumOfProbabilities` Joins several profiles containing log-probabilities into one profile containing the logarithm of the sum of the probabilities of the single profiles.
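
The description of `SumOfProbabilities` (the logarithm of the sum of the probabilities of the single profiles) corresponds to a log-sum-exp operation per position. The following is a minimal, library-independent sketch of that operation; the class and method names are hypothetical, and it is not the Jstacs implementation or the `JoinMethod` interface itself.

```java
// Illustrative sketch only; names are hypothetical and not part of Jstacs.
final class LogSumJoinSketch {

    // Joins several log-probability profiles of equal length into one profile
    // holding log(sum_i exp(profile_i[pos])) at each position.
    static double[] joinLogProfiles(double[][] logProfiles) {
        int length = logProfiles[0].length;
        double[] joined = new double[length];
        for (int pos = 0; pos < length; pos++) {
            // Find the maximum first so that the exponentials cannot overflow.
            double max = Double.NEGATIVE_INFINITY;
            for (double[] profile : logProfiles) {
                max = Math.max(max, profile[pos]);
            }
            if (max == Double.NEGATIVE_INFINITY) {
                // All probabilities are zero at this position.
                joined[pos] = Double.NEGATIVE_INFINITY;
                continue;
            }
            double sum = 0.0;
            for (double[] profile : logProfiles) {
                sum += Math.exp(profile[pos] - max);
            }
            joined[pos] = max + Math.log(sum);
        }
        return joined;
    }

    public static void main(String[] args) {
        double[][] profiles = {
            { Math.log(0.1), Math.log(0.5) },
            { Math.log(0.2), Math.log(0.25) }
        };
        double[] joined = joinLogProfiles(profiles);
        // Back-transform to probabilities for display.
        System.out.println(Math.exp(joined[0]) + " " + Math.exp(joined[1]));
    }
}
```

Subtracting the per-position maximum before exponentiating keeps the computation stable when the log-probabilities are strongly negative.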

Constructor Summary
`SignificantMotifOccurrencesFinder(MotifDiscoverer disc, DataSet bg, double[] weights, double sign)` This constructor creates an instance of `SignificantMotifOccurrencesFinder` that uses a `DataSet` to determine the significance level.
`SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.JoinMethod joiner, DataSet bg, double[] weights, double sign)` This constructor creates an instance of `SignificantMotifOccurrencesFinder` that uses a `DataSet` to determine the significance level.
`SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, boolean oneHistogram, int numSequences, double sign)` This constructor creates an instance of `SignificantMotifOccurrencesFinder` that uses the given `SignificantMotifOccurrencesFinder.RandomSeqType` to determine the significance level.
`SignificantMotifOccurrencesFinder(MotifDiscoverer disc, SignificantMotifOccurrencesFinder.RandomSeqType type, SignificantMotifOccurrencesFinder.JoinMethod joiner, boolean oneHistogram, int numSequences, double sign)` This constructor creates an instance of `SignificantMotifOccurrencesFinder` that uses the given `SignificantMotifOccurrencesFinder.RandomSeqType` to determine the significance level.

Method Summary
`DataSet`	`annotateMotif(DataSet data, int motifIndex)` This method annotates a `DataSet`.
`DataSet`	`annotateMotif(DataSet data, int motifIndex, int addMax)` This method annotates a `DataSet`.
`DataSet`	`annotateMotif(int startPos, DataSet data, int motifIndex)` This method annotates a `DataSet` starting in each sequence at `startPos`.
`DataSet`	`annotateMotif(int startPos, DataSet data, int motifIndex, int addMax, boolean addAnnotation)` This method annotates a `DataSet` starting in each sequence at `startPos`.
`MotifAnnotation[]`	`findSignificantMotifOccurrences(int motif, Sequence seq, int start)` This method finds the significant motif occurrences in the sequence.
`MotifAnnotation[]`	`findSignificantMotifOccurrences(int motif, Sequence seq, int addMax, int start)` This method finds the significant motif occurrences in the sequence.
`DataSet`	`getBindingSites(DataSet data, int motifIndex)` This method returns a `DataSet` containing the predicted binding sites.
`DataSet`	`getBindingSites(int startPos, DataSet data, int motifIndex, int addMax, int addLeft, int addRight)` This method returns a `DataSet` containing the predicted binding sites.
`double`	`getFactorForAucPR()` This method returns a factor by which scores must be multiplied for computing PR curves.
`MotifDiscoverer`	`getMotifDiscoverer()` This method returns a clone of the internally used `MotifDiscoverer`.
`double`	`getNumberOfBoundSequences(DataSet data, double[] weights, int motifIndex)` Returns the number of sequences in `data` that are predicted to be bound at least once by the motif with index `motifIndex`.
`double`	`getOffsetForAucPR()` This method returns an offset that must be added to scores for computing PR curves.
`double[][]`	`getPWM(int motif, DataSet data, double[] weights, int addLeft, int addRight)` Returns the Position weight matrix (PWM) of the binding sites of motif `motif` in the data set `data` of the `MotifDiscoverer` of this `SignificantMotifOccurrencesFinder`.
`Pair<double[][],double[]>`	`getPWMAndPosDist(int motif, DataSet data, double[] weights, double[] mean, int addLeft, int addRight)` Returns the Position weight matrix (PWM) of the binding sites of motif `motif` in the data set `data` of the `MotifDiscoverer` of this `SignificantMotifOccurrencesFinder` together with standard deviation of binding site positions computed using the provided `mean` values for each sequence.
`Pair<double[][][],int[][]>`	`getPWMAndPositions(int motif, DataSet data, double[] weights, int addLeft, int addRight)` Returns the Position weight matrix (PWM) of the binding sites of motif `motif` in the data set `data` of the `MotifDiscoverer` of this `SignificantMotifOccurrencesFinder` together with the positions of the binding sites within the sequences of `data` and the corresponding p-values.
`protected double[][]`	`getPWMAndPositions(int motif, DataSet data, double[] weights, int addLeft, int addRight, int[][] positions, double[][] pvals, double[] mean, double[] sd)` Returns the Position weight matrix (PWM) of the binding sites of motif `motif` in the data set `data` of the `MotifDiscoverer` of this `SignificantMotifOccurrencesFinder` and fills the positions of the binding sites within the sequences of `data` and the corresponding p-values into the corresponding arrays.
`IntList`	`getStartPositions(int startPos, DataSet data, int motifIndex, int addMax)` This method returns a list of start positions of binding sites.
`double[][]`	`getValuesForEachNucleotide(DataSet data, int motif, boolean addOnlyBest)` This method determines, for each possible starting position in each of the sequences in `data`, a score that this position is covered by at least one motif occurrence of the motif with index `motif`.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

SignificantMotifOccurrencesFinder

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.RandomSeqType type,
                                         boolean oneHistogram,
                                         int numSequences,
                                         double sign)

This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.

Parameters:: disc - the MotifDiscoverer for the prediction; type - the type that determines how the significance level is determined; oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms; numSequences - the number of sampled sequence instances used to determine the significance level; sign - the significance level

SignificantMotifOccurrencesFinder

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.RandomSeqType type,
                                         SignificantMotifOccurrencesFinder.JoinMethod joiner,
                                         boolean oneHistogram,
                                         int numSequences,
                                         double sign)

This constructor creates an instance of SignificantMotifOccurrencesFinder that uses the given SignificantMotifOccurrencesFinder.RandomSeqType to determine the significance level.

Parameters:: disc - the MotifDiscoverer for the prediction; type - the type that determines how the significance level is determined; joiner - the SignificantMotifOccurrencesFinder.JoinMethod that defines how the profiles of the same motif in different components shall be joined; oneHistogram - a switch to decide whether to use one background distribution histogram for all sequences or sequence-specific background distribution histograms; numSequences - the number of sampled sequence instances used to determine the significance level; sign - the significance level

SignificantMotifOccurrencesFinder

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         DataSet bg,
                                         double[] weights,
                                         double sign)

This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.

Parameters:: disc - the MotifDiscoverer for the prediction; bg - the background data set; weights - the weights of the background data set, can be null; sign - the significance level

SignificantMotifOccurrencesFinder

public SignificantMotifOccurrencesFinder(MotifDiscoverer disc,
                                         SignificantMotifOccurrencesFinder.JoinMethod joiner,
                                         DataSet bg,
                                         double[] weights,
                                         double sign)

This constructor creates an instance of SignificantMotifOccurrencesFinder that uses a DataSet to determine the significance level.

Parameters:: disc - the MotifDiscoverer for the prediction; joiner - the SignificantMotifOccurrencesFinder.JoinMethod that defines how the profiles of the same motif in different components shall be joined; bg - the background data set; weights - the weights of the background data set, can be null; sign - the significance level
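
All four constructors take a significance level `sign` together with a background, given either as a data set or as sampled sequences. This page does not state how the class derives its decision from the background internally. As a rough illustration of the general idea only, the sketch below computes an empirical p-value as the fraction of background scores that are at least as large as an observed score and compares it with the significance level. All names are hypothetical, and the approach is an assumption rather than the documented behaviour of Jstacs.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and not part of Jstacs.
final class EmpiricalPValueSketch {

    // Returns the fraction of background scores that are >= the observed score.
    // The background array must be sorted in ascending order.
    static double pValue(double[] sortedBackground, double score) {
        // Binary search for the first index whose value is >= score.
        int lo = 0;
        int hi = sortedBackground.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedBackground[mid] < score) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (sortedBackground.length - lo) / (double) sortedBackground.length;
    }

    // An occurrence counts as significant if its p-value does not exceed sign.
    static boolean isSignificant(double[] sortedBackground, double score, double sign) {
        return pValue(sortedBackground, score) <= sign;
    }

    public static void main(String[] args) {
        double[] background = { 3.0, 1.0, 2.0, 5.0, 4.0 };
        Arrays.sort(background);
        System.out.println(pValue(background, 4.5));
        System.out.println(isSignificant(background, 4.5, 0.05));
    }
}
```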

Method Detail

findSignificantMotifOccurrences

public MotifAnnotation[] findSignificantMotifOccurrences(int motif,
                                                         Sequence seq,
                                                         int start)
                                                  throws Exception

This method finds the significant motif occurrences in the sequence.

Parameters:: motif - the motif index; seq - the sequence; start - the start position
Returns:: an array of MotifAnnotation for the sequence
Throws:: Exception - if the background sample could not be created, or some of the scores could not be computed

findSignificantMotifOccurrences

public MotifAnnotation[] findSignificantMotifOccurrences(int motif,
                                                         Sequence seq,
                                                         int addMax,
                                                         int start)
                                                  throws Exception

This method finds the significant motif occurrences in the sequence.

Parameters:: motif - the motif index; seq - the sequence; addMax - the number of motif occurrences that can at most be annotated; start - the start position
Returns:: an array of MotifAnnotation for the sequence
Throws:: Exception - if the background sample could not be created, or some of the scores could not be computed

getPWM

public double[][] getPWM(int motif,
                         DataSet data,
                         double[] weights,
                         int addLeft,
                         int addRight)
                  throws Exception

Returns the Position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder.

Parameters:: motif - the index of the motif; data - the data set; weights - the weights on the individual sequences of the data set; addLeft - the number of positions to add to the left side of motif occurrences; addRight - the number of positions to add to the right side of motif occurrences
Returns:: the PWM
Throws:: Exception - if something went wrong
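
To make the role of `weights` concrete, the following library-independent sketch builds a matrix of weighted relative frequencies from already aligned binding sites. It rests on several assumptions that this page does not confirm: that the PWM holds relative frequencies, that it is laid out as `[position][symbol]`, that the alphabet is DNA, and that `null` weights mean a weight of 1 per site. The class and method names are hypothetical.

```java
// Illustrative sketch only; names are hypothetical and not part of Jstacs.
final class WeightedPwmSketch {

    private static final String ALPHABET = "ACGT";

    // Computes weighted relative frequencies per position from aligned sites
    // of equal length. A null weights array is treated as weight 1 per site.
    static double[][] pwm(String[] sites, double[] weights) {
        int length = sites[0].length();
        double[][] pwm = new double[length][ALPHABET.length()];
        double total = 0.0;
        for (int n = 0; n < sites.length; n++) {
            double w = (weights == null) ? 1.0 : weights[n];
            total += w;
            for (int pos = 0; pos < length; pos++) {
                int symbol = ALPHABET.indexOf(sites[n].charAt(pos));
                if (symbol < 0) {
                    throw new IllegalArgumentException("Unknown symbol in site " + n);
                }
                pwm[pos][symbol] += w;
            }
        }
        // Normalize the weighted counts so that each row sums to 1.
        for (double[] row : pwm) {
            for (int s = 0; s < row.length; s++) {
                row[s] /= total;
            }
        }
        return pwm;
    }

    public static void main(String[] args) {
        double[][] m = pwm(new String[] { "AC", "AG", "TC" }, new double[] { 1, 1, 2 });
        System.out.println(java.util.Arrays.deepToString(m));
    }
}
```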

getPWMAndPositions

public Pair<double[][][],int[][]> getPWMAndPositions(int motif,
                                                     DataSet data,
                                                     double[] weights,
                                                     int addLeft,
                                                     int addRight)
                                              throws Exception

Returns the Position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder together with the positions of the binding sites within the sequences of data and the corresponding p-values.

Parameters:: motif - the index of the motif; data - the data set; weights - the weights on the individual sequences of the data set; addLeft - the number of positions to add to the left side of motif occurrences; addRight - the number of positions to add to the right side of motif occurrences
Returns:: a Pair containing the PWM at index 0 of the first element, the p-values at index 1 of the first element, and the positions as second element. P-value and position arrays are indexed by the index of the corresponding sequence in data.
Throws:: Exception - if something went wrong

getPWMAndPositions

protected double[][] getPWMAndPositions(int motif,
                                        DataSet data,
                                        double[] weights,
                                        int addLeft,
                                        int addRight,
                                        int[][] positions,
                                        double[][] pvals,
                                        double[] mean,
                                        double[] sd)
                                 throws Exception

Returns the Position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder and fills the positions of the binding sites within the sequences of data and the corresponding p-values into the corresponding arrays. Additionally, the standard deviation of binding site positions is computed using the provided mean values for each sequence.

Parameters:: motif - the index of the motif; data - the data set; weights - the weights on the individual sequences of the data set; addLeft - the number of positions to add to the left side of motif occurrences; addRight - the number of positions to add to the right side of motif occurrences; positions - the array filled with the positions. First dimension must contain as many entries as data has sequences. May be null.; pvals - the array filled with the p-values. First dimension must contain as many entries as data has sequences. May be null.; mean - the means for the individual sequences. May be null.; sd - array of length 1 filled with the determined standard deviation. May be null.
Returns:: the PWM
Throws:: Exception - if something went wrong

getPWMAndPosDist

public Pair<double[][],double[]> getPWMAndPosDist(int motif,
                                                  DataSet data,
                                                  double[] weights,
                                                  double[] mean,
                                                  int addLeft,
                                                  int addRight)
                                           throws Exception

Returns the Position weight matrix (PWM) of the binding sites of motif motif in the data set data of the MotifDiscoverer of this SignificantMotifOccurrencesFinder together with standard deviation of binding site positions computed using the provided mean values for each sequence.

Parameters:: motif - the index of the motif; data - the data set; weights - the weights on the individual sequences of the data set; mean - the means for the individual sequences. Must be as long as data has sequences.; addLeft - the number of positions to add to the left side of motif occurrences; addRight - the number of positions to add to the right side of motif occurrences
Returns:: a Pair containing the PWM as first element and standard deviation in array of length 1 as second element.
Throws:: Exception - if something went wrong

annotateMotif

public DataSet annotateMotif(DataSet data,
                             int motifIndex)
                      throws Exception

This method annotates a DataSet.

Parameters:: data - the DataSet; motifIndex - the index of the motif
Returns:: an annotated DataSet
Throws:: Exception - if something went wrong
See Also:: annotateMotif(int, DataSet, int)

annotateMotif

public DataSet annotateMotif(int startPos,
                             DataSet data,
                             int motifIndex)
                      throws Exception

This method annotates a DataSet starting in each sequence at startPos.

Parameters:: startPos - the start position used for all sequences; data - the DataSet; motifIndex - the index of the motif
Returns:: an annotated DataSet
Throws:: Exception - if something went wrong
See Also:: annotateMotif(int, DataSet, int)

annotateMotif

public DataSet annotateMotif(DataSet data,
                             int motifIndex,
                             int addMax)
                      throws Exception

This method annotates a DataSet. At most, addMax motif occurrences of the motif instance will be annotated.

Parameters:: data - the DataSet; motifIndex - the index of the motif; addMax - the number of motif occurrences that can at most be annotated for each motif instance
Returns:: an annotated DataSet
Throws:: Exception - if something went wrong
See Also:: annotateMotif(int, DataSet, int)

annotateMotif

public DataSet annotateMotif(int startPos,
                             DataSet data,
                             int motifIndex,
                             int addMax,
                             boolean addAnnotation)
                      throws Exception

This method annotates a DataSet starting in each sequence at startPos. At most, addMax motif occurrences of the motif instance will be annotated.

Parameters:: startPos - the start position used for all sequences; data - the DataSet; motifIndex - the index of the motif; addMax - the number of motif occurrences that can at most be annotated for each motif instance; addAnnotation - a switch whether to add or replace the current annotation
Returns:: an annotated DataSet
Throws:: Exception - if something went wrong
See Also:: annotateMotif(int, DataSet, int)

getBindingSites

public DataSet getBindingSites(DataSet data,
                               int motifIndex)
                        throws Exception

This method returns a DataSet containing the predicted binding sites.

Parameters:: data - the DataSet; motifIndex - the index of the motif
Returns:: a DataSet containing the predicted binding sites
Throws:: Exception - if something went wrong

getBindingSites

public DataSet getBindingSites(int startPos,
                               DataSet data,
                               int motifIndex,
                               int addMax,
                               int addLeft,
                               int addRight)
                        throws Exception

This method returns a DataSet containing the predicted binding sites.

Parameters:: startPos - the start position used for all sequences; data - the DataSet; motifIndex - the index of the motif; addMax - the number of motif occurrences that can at most be annotated for each motif instance; addLeft - number of positions added to the left of the predicted motif occurrence; addRight - number of positions added to the right of the predicted motif occurrence
Returns:: a DataSet containing the predicted binding sites
Throws:: Exception - if something went wrong

getStartPositions

public IntList getStartPositions(int startPos,
                                 DataSet data,
                                 int motifIndex,
                                 int addMax)
                          throws Exception

This method returns a list of start positions of binding sites.

Parameters:: startPos - the start position used for all sequences; data - the DataSet; motifIndex - the index of the motif; addMax - the number of motif occurrences that can at most be annotated for each motif instance
Returns:: a list of start positions
Throws:: Exception - if something went wrong

getNumberOfBoundSequences

public double getNumberOfBoundSequences(DataSet data,
                                        double[] weights,
                                        int motifIndex)
                                 throws Exception

Returns the number of sequences in data that are predicted to be bound at least once by the motif with index motifIndex.

Parameters:: data - the data; weights - the weights of the data; motifIndex - the index of the motif
Returns:: the number of sequences in data bound by the motif with index motifIndex
Throws:: Exception - if the background sample for the prediction could not be created or some of the scores could not be computed

getOffsetForAucPR

public double getOffsetForAucPR()

This method returns an offset that must be added to scores for computing PR curves. If this SignificantMotifOccurrencesFinder was instantiated using oneHistogram=true, getValuesForEachNucleotide(DataSet, int, boolean) returns scores and no offset is needed. Otherwise, it returns p-values; hence, 1-(p-value) must be used for the PR curve and the offset is 1.

Returns:: the offset
See Also:: MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String)

getFactorForAucPR

public double getFactorForAucPR()

This method returns a factor by which scores must be multiplied for computing PR curves. If this SignificantMotifOccurrencesFinder was instantiated using oneHistogram=true, getValuesForEachNucleotide(DataSet, int, boolean) returns scores and a factor of 1 is appropriate. Otherwise, it returns p-values; hence, 1-(p-value) must be used for the PR curve and the factor is -1.

Returns:: the factor
See Also:: MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String)
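
Taken together, the descriptions of getOffsetForAucPR() and getFactorForAucPR() suggest a linear transformation, factor * value + offset, which yields 1-(p-value) for factor -1 and offset 1. The sketch below applies such a transformation to a per-sequence array of values. The helper class is hypothetical, and the assumption that "no offset is needed" means an offset of 0 is an interpretation, not something this page states.

```java
// Illustrative sketch only; names are hypothetical and not part of Jstacs.
final class AucPrTransformSketch {

    // Applies factor * value + offset to every entry and returns a new array.
    static double[][] transform(double[][] values, double factor, double offset) {
        double[][] result = new double[values.length][];
        for (int i = 0; i < values.length; i++) {
            result[i] = new double[values[i].length];
            for (int j = 0; j < values[i].length; j++) {
                result[i][j] = factor * values[i][j] + offset;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // p-values for two sequences of different length.
        double[][] pValues = { { 0.01, 0.5 }, { 1.0 } };
        // Factor -1 and offset 1 turn each p-value p into 1 - p.
        double[][] scores = transform(pValues, -1.0, 1.0);
        System.out.println(java.util.Arrays.deepToString(scores));
    }
}
```

With factor 1 and offset 0 the values pass through unchanged, which matches the case in which scores rather than p-values are returned.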

getValuesForEachNucleotide

public double[][] getValuesForEachNucleotide(DataSet data,
                                             int motif,
                                             boolean addOnlyBest)
                                      throws Exception

This method determines, for each possible starting position in each of the sequences in data, a score that this position is covered by at least one motif occurrence of the motif with index motif. If the SignificantMotifOccurrencesFinder was constructed using oneHistogram=true, the returned values are arbitrary scores; otherwise, they are p-values.

Parameters:: data - the DataSet; motif - the motif index; addOnlyBest - a switch whether to add only the best
Returns:: an array containing for each sequence an array with the scores for each starting position in the sequence
Throws:: Exception - if something went wrong during the computation of the scores of the MotifDiscoverer
See Also:: MotifDiscoveryAssessment.getSortedValuesForMotifAndFlanking(DataSet, double[][], double, double, String), getOffsetForAucPR(), getFactorForAucPR()

getMotifDiscoverer

public MotifDiscoverer getMotifDiscoverer()
                                   throws CloneNotSupportedException

This method returns a clone of the internally used MotifDiscoverer.

Returns:: clone of the internally used MotifDiscoverer
Throws:: CloneNotSupportedException - if the MotifDiscoverer cannot be cloned correctly
