java.lang.Object
de.jstacs.motifDiscovery.KMereStatistic
public final class KMereStatistic
This class enables the user to get some statistics of a DataSet
in an easy way.
Constructor Summary | |
---|---|
KMereStatistic(DataSet data, int k) | This constructor creates an internal statistic counting all k-mers in the data. |
Method Summary | | |
---|---|---|
static DataSet.WeightedDataSetFactory | getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands) | This method enables the user to get a statistic over all k-mers in the data. |
static DataSet.WeightedDataSetFactory | getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands, DataSet.WeightedDataSetFactory.SortOperation sortOp) | This method enables the user to get a statistic over all k-mers in the data. |
static Sequence[] | getCommonString(DataSet data, int motifLength, boolean bothStrands) | This method returns an array of strings of length motifLength such that each string is contained in all sequences of the sample or, if both strands are used, in the sample and the reverse complementary sample. |
static LinkedList<Sequence> | getConservedPatterns(Hashtable<Sequence,BitSet[]> statistic, int dataSetIndex, int threshold) | This method returns a list of Sequences. |
static Pair<Sequence,BitSet[]>[] | getKmereSequenceStatistic(boolean bothStrands, int maxMismatch, HashSet<Sequence> filter, DataSet... data) | This method enables the user to get a statistic for a set of k-mers. |
static Hashtable<Sequence,BitSet[]> | getKmereSequenceStatistic(int k, boolean bothStrands, int addIndex, DataSet... data) | This method enables the user to get a statistic over all k-mers in the sequences. |
double[][] | getSmoothedProfile(int window, Sequence... seq) | This method returns an array of smoothed profiles. |
double[][] | getSmoothedProfile(int window, String... kmere) | This method returns an array of smoothed profiles. |
static Hashtable<Sequence,BitSet[]> | merge(Hashtable<Sequence,BitSet[]> statistic, int maximalMissmatch, boolean bothStrands) | This method allows merging the statistics of k-mers by allowing mismatches. |
static Hashtable<Sequence,BitSet[]> | removeBackground(Hashtable<Sequence,BitSet[]> statistic, int fgIndex, int bgIndex, double fgWeight, double bgWeight) | This method removes those entries from the statistic whose weighted foreground cardinality is lower than the weighted background cardinality. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public KMereStatistic(DataSet data, int k)
This constructor creates an internal statistic counting all k-mers in the data.
Parameters:
data - the data
k - the number of symbols in each counted word
Method Detail |
---|
public double[][] getSmoothedProfile(int window, String... kmere)
This method returns an array of smoothed profiles.
Parameters:
window - the window length; for no smoothing use 1
kmere - the k-mers
See Also:
getSmoothedProfile(int, Sequence...), Sequence.create(AlphabetContainer, String)
public double[][] getSmoothedProfile(int window, Sequence... seq)
This method returns an array of smoothed profiles.
Parameters:
window - the window length; for no smoothing use 1
seq - the Sequence instances containing the k-mers
public static Sequence[] getCommonString(DataSet data, int motifLength, boolean bothStrands) throws Exception
This method returns an array of strings of length motifLength such that each string is contained in all sequences of the sample or, if both strands are used, in the sample and the reverse complementary sample.
Parameters:
data - the sample of sequences
motifLength - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false)
Returns:
an array of strings of length motifLength such that each string is contained in data or, if both strands are used, on one strand of the data
Throws:
Exception - if something went wrong
public static DataSet.WeightedDataSetFactory getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands) throws Exception
This method enables the user to get a statistic over all k-mers in the data. That is, it counts the occurrences of each k-mer in the complete data.
Parameters:
data - the sample of sequences
k - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned DataSet.WeightedDataSetFactory.
Returns:
a DataSet.WeightedDataSetFactory containing all k-mers and their absolute frequencies in data or, if both strands are used, on one strand of the data
Throws:
Exception - if something went wrong
See Also:
getAbsoluteKMereFrequencies(DataSet, int, boolean, DataSet.WeightedDataSetFactory.SortOperation), DataSet.WeightedDataSetFactory.SortOperation.NO_SORT
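The counting described above can be illustrated without the library. The following is a minimal sketch on plain DNA strings, not the Jstacs implementation: the class `KmerCountSketch` and its methods are hypothetical, and choosing the lexicographically smaller of a k-mer and its reverse complement as the shared key is an assumption, since the page does not say which of the two is kept.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it does not use or reproduce Jstacs code.
class KmerCountSketch {

    // Reverse complement of a DNA string over the alphabet {A, C, G, T}.
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default: throw new IllegalArgumentException("not a DNA symbol: " + s.charAt(i));
            }
        }
        return sb.toString();
    }

    // Counts every k-mer occurrence in all sequences. With bothStrands, a k-mer
    // and its reverse complement share one entry; the lexicographically smaller
    // of the two is used as the key (an assumption of this sketch).
    static Map<String, Integer> countKmers(List<String> sequences, int k, boolean bothStrands) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String seq : sequences) {
            for (int i = 0; i + k <= seq.length(); i++) {
                String kmer = seq.substring(i, i + k);
                if (bothStrands) {
                    String rc = reverseComplement(kmer);
                    if (rc.compareTo(kmer) < 0) {
                        kmer = rc;
                    }
                }
                counts.merge(kmer, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countKmers(List.of("ACGT", "AAGT"), 2, true));
    }
}
```

With `bothStrands` set, the result has at most one entry per pair of k-mer and reverse complement, which matches the behaviour the parameter description states for the returned factory.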
public static DataSet.WeightedDataSetFactory getAbsoluteKMereFrequencies(DataSet data, int k, boolean bothStrands, DataSet.WeightedDataSetFactory.SortOperation sortOp) throws Exception
This method enables the user to get a statistic over all k-mers in the data. That is, it counts the occurrences of each k-mer in the complete data.
Parameters:
data - the sample of sequences
k - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned DataSet.WeightedDataSetFactory.
sortOp - the way the result should be sorted
Returns:
a DataSet.WeightedDataSetFactory containing all k-mers and their absolute frequencies in data or, if both strands are used, on one strand of the data
Throws:
Exception - if something went wrong
public static Hashtable<Sequence,BitSet[]> getKmereSequenceStatistic(int k, boolean bothStrands, int addIndex, DataSet... data) throws WrongAlphabetException, OperationNotSupportedException
This method enables the user to get a statistic over all k-mers in the sequences. That is, it creates for each occurring k-mer an array of BitSets indicating for each data set and each sequence whether it contains the k-mer (or its reverse complement) or not.
Parameters:
data - the DataSets of Sequences
k - the motif length
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned statistic.
addIndex - the maximal index for inserting new k-mers
Returns:
a Hashtable mapping Sequences to arrays of BitSets; each entry encodes a k-mer and the occurrence of this k-mer in each data set and sequence; if a k-mer occurs in data set d in sequence n, the n-th bit of the d-th BitSet is true.
Throws:
WrongAlphabetException - if the AlphabetContainers of the DataSets do not match or if they are not simple and discrete
OperationNotSupportedException - if bothStrands==true but the reverse complement could not be computed
See Also:
Hashtable, merge(Hashtable, int, boolean)
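The layout of the returned statistic (one BitSet per data set, one bit per sequence) may be easier to see in a small stand-alone example. The sketch below works on plain strings and only the forward strand; `KmerOccurrenceSketch` is a hypothetical name. Reading `addIndex` as "only data sets with an index up to addIndex may insert new k-mers, later data sets only update existing entries" is an interpretation of the parameter description, not something this page confirms.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it does not use or reproduce Jstacs code.
class KmerOccurrenceSketch {

    // For each k-mer, builds one BitSet per data set; bit n of BitSet d is set
    // if sequence n of data set d contains the k-mer (forward strand only).
    static Map<String, BitSet[]> occurrenceStatistic(int k, int addIndex, List<List<String>> dataSets) {
        Map<String, BitSet[]> stat = new HashMap<>();
        for (int d = 0; d < dataSets.size(); d++) {
            List<String> seqs = dataSets.get(d);
            for (int n = 0; n < seqs.size(); n++) {
                String seq = seqs.get(n);
                for (int i = 0; i + k <= seq.length(); i++) {
                    String kmer = seq.substring(i, i + k);
                    BitSet[] bits = stat.get(kmer);
                    if (bits == null) {
                        if (d > addIndex) {
                            continue; // assumption: later data sets add no new keys
                        }
                        bits = new BitSet[dataSets.size()];
                        for (int j = 0; j < bits.length; j++) {
                            bits[j] = new BitSet();
                        }
                        stat.put(kmer, bits);
                    }
                    bits[d].set(n);
                }
            }
        }
        return stat;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(List.of("ACG", "CGT"), List.of("ACG", "TTT"));
        Map<String, BitSet[]> stat = occurrenceStatistic(2, 0, data);
        System.out.println(stat.keySet());
    }
}
```

Under this reading, passing the index of the foreground data set as `addIndex` would keep the table restricted to k-mers seen in the foreground while still recording where they occur in the background.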
public static Pair<Sequence,BitSet[]>[] getKmereSequenceStatistic(boolean bothStrands, int maxMismatch, HashSet<Sequence> filter, DataSet... data) throws WrongAlphabetException, OperationNotSupportedException
This method enables the user to get a statistic for a set of k-mers. That is, it creates for each k-mer from filter an array of BitSets indicating for each data set and each sequence whether it contains the k-mer (or its reverse complement) or not.
Parameters:
bothStrands - the switch for using both strands (true) or only the forward strand (false). If true, for each k-mer only this k-mer or its reverse complement is contained in the returned statistic.
maxMismatch - the maximal number of mismatches
filter - a filter containing all interesting k-mers
data - the DataSets of Sequences
Returns:
an array of Pairs of Sequences and arrays of BitSets; each entry encodes a k-mer and the occurrence of this k-mer in each data set and sequence; if a k-mer occurs in data set d in sequence n, the n-th bit of the d-th BitSet is true.
Throws:
WrongAlphabetException - if the AlphabetContainers of the DataSets do not match or if they are not simple and discrete
OperationNotSupportedException - if bothStrands==true but the reverse complement could not be computed
See Also:
Hashtable, merge(Hashtable, int, boolean)
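A filtered statistic with mismatches can be sketched as follows. This is an illustration on plain strings, forward strand only, with the hypothetical class `FilteredKmerSketch`. It assumes that a sequence "contains" a k-mer if some window of the sequence differs from it in at most `maxMismatch` positions; the page does not spell out this definition.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it does not use or reproduce Jstacs code.
class FilteredKmerSketch {

    // True if some window of seq differs from kmer in at most maxMismatch positions.
    static boolean containsWithMismatches(String seq, String kmer, int maxMismatch) {
        int k = kmer.length();
        for (int i = 0; i + k <= seq.length(); i++) {
            int mismatches = 0;
            for (int j = 0; j < k && mismatches <= maxMismatch; j++) {
                if (seq.charAt(i + j) != kmer.charAt(j)) {
                    mismatches++;
                }
            }
            if (mismatches <= maxMismatch) {
                return true;
            }
        }
        return false;
    }

    // For each k-mer of the filter, one BitSet per data set; bit n is set if
    // sequence n of that data set contains the k-mer within the mismatch bound.
    static Map<String, BitSet[]> filteredStatistic(int maxMismatch, Set<String> filter,
                                                   List<List<String>> dataSets) {
        Map<String, BitSet[]> stat = new LinkedHashMap<>();
        for (String kmer : filter) {
            BitSet[] bits = new BitSet[dataSets.size()];
            for (int d = 0; d < dataSets.size(); d++) {
                bits[d] = new BitSet();
                List<String> seqs = dataSets.get(d);
                for (int n = 0; n < seqs.size(); n++) {
                    if (containsWithMismatches(seqs.get(n), kmer, maxMismatch)) {
                        bits[d].set(n);
                    }
                }
            }
            stat.put(kmer, bits);
        }
        return stat;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(List.of("ACGT", "TTTT"), List.of("AAGT"));
        BitSet[] bits = filteredStatistic(1, Set.of("ACG"), data).get("ACG");
        System.out.println(bits[0] + " " + bits[1]);
    }
}
```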
public static Hashtable<Sequence,BitSet[]> merge(Hashtable<Sequence,BitSet[]> statistic, int maximalMissmatch, boolean bothStrands) throws OperationNotSupportedException, CloneNotSupportedException, WrongLengthException, WrongAlphabetException
This method allows merging the statistics of k-mers by allowing mismatches.
Parameters:
statistic - a statistic as obtained from getKmereSequenceStatistic(int, boolean, int, DataSet...)
maximalMissmatch - the maximal number of allowed mismatches
bothStrands - the switch for using both strands (true) or only the forward strand (false)
Throws:
OperationNotSupportedException - if bothStrands==true but the reverse complement could not be computed
CloneNotSupportedException - if an array of BitSet cannot be cloned
WrongAlphabetException - see Sequence.getHammingDistance(Sequence)
WrongLengthException - see Sequence.getHammingDistance(Sequence)
See Also:
Sequence.getHammingDistance(Sequence), getKmereSequenceStatistic(int, boolean, int, DataSet...)
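One plausible way to merge a statistic under mismatches is sketched below. It assumes that every k-mer keeps its own entry and that its BitSets become the union of the BitSets of all k-mers within the given Hamming distance; the page does not state how the actual method forms or keys the merged entries. `KmerMergeSketch` is a hypothetical class operating on plain strings, forward strand only.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; it does not use or reproduce Jstacs code.
class KmerMergeSketch {

    // Number of positions at which two equally long strings differ.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("length mismatch");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Returns a new statistic in which each k-mer's BitSets are OR-ed with the
    // BitSets of all k-mers at Hamming distance <= maxMismatch. The input is
    // left unchanged because every BitSet is cloned before it is modified.
    static Map<String, BitSet[]> merge(Map<String, BitSet[]> stat, int maxMismatch) {
        Map<String, BitSet[]> merged = new HashMap<>();
        for (Map.Entry<String, BitSet[]> entry : stat.entrySet()) {
            BitSet[] source = entry.getValue();
            BitSet[] union = new BitSet[source.length];
            for (int d = 0; d < union.length; d++) {
                union[d] = (BitSet) source[d].clone();
            }
            for (Map.Entry<String, BitSet[]> other : stat.entrySet()) {
                if (hammingDistance(entry.getKey(), other.getKey()) <= maxMismatch) {
                    for (int d = 0; d < union.length; d++) {
                        union[d].or(other.getValue()[d]);
                    }
                }
            }
            merged.put(entry.getKey(), union);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, BitSet[]> stat = new HashMap<>();
        BitSet first = new BitSet();
        first.set(0);
        BitSet second = new BitSet();
        second.set(1);
        stat.put("ACG", new BitSet[] { first });
        stat.put("ACT", new BitSet[] { second });
        System.out.println(merge(stat, 1).get("ACG")[0]);
    }
}
```

The pairwise comparison is quadratic in the number of k-mers, which is acceptable for a sketch but may be too slow for large statistics.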
public static LinkedList<Sequence> getConservedPatterns(Hashtable<Sequence,BitSet[]> statistic, int dataSetIndex, int threshold)
This method returns a list of Sequences. Each entry corresponds to a sequence or a set of sequences (depending on the input statistic) that occurs in more than threshold Sequences of the data set.
Parameters:
statistic - a statistic as obtained from getKmereSequenceStatistic(int, boolean, int, DataSet...) or merge(Hashtable, int, boolean)
dataSetIndex - the index of the BitSet to be used
threshold - a threshold that has to be exceeded by BitSet.cardinality() for a pattern to be declared conserved
See Also:
getKmereSequenceStatistic(int, boolean, int, DataSet...), merge(Hashtable, int, boolean)
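The selection rule stated above (cardinality strictly greater than the threshold) can be sketched in a few lines. `ConservedPatternSketch` is a hypothetical class that uses plain strings as keys instead of Sequence objects.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it does not use or reproduce Jstacs code.
class ConservedPatternSketch {

    // Returns every key whose BitSet for the chosen data set has more than
    // `threshold` bits set, i.e. the pattern occurs in more than `threshold`
    // sequences of that data set.
    static List<String> conservedPatterns(Map<String, BitSet[]> stat, int dataSetIndex, int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BitSet[]> entry : stat.entrySet()) {
            if (entry.getValue()[dataSetIndex].cardinality() > threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet frequent = new BitSet();
        frequent.set(0, 3); // sequences 0, 1 and 2
        BitSet rare = new BitSet();
        rare.set(1);
        Map<String, BitSet[]> stat = new HashMap<>();
        stat.put("ACG", new BitSet[] { frequent });
        stat.put("TTT", new BitSet[] { rare });
        System.out.println(conservedPatterns(stat, 0, 2));
    }
}
```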
public static Hashtable<Sequence,BitSet[]> removeBackground(Hashtable<Sequence,BitSet[]> statistic, int fgIndex, int bgIndex, double fgWeight, double bgWeight)
This method removes those entries from the statistic that have a lower weighted foreground cardinality than the weighted background cardinality.
Parameters:
statistic - a statistic as obtained from getKmereSequenceStatistic(int, boolean, int, DataSet...) or merge(Hashtable, int, boolean)
fgIndex - the foreground index of the BitSet to be used
bgIndex - the background index of the BitSet to be used
fgWeight - the weight used to weight the foreground cardinality
bgWeight - the weight used to weight the background cardinality
Returns:
a Hashtable containing only the positive entries
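The weighted comparison can be sketched as follows. `BackgroundFilterSketch` is a hypothetical class on plain string keys. Two points are assumptions of the sketch: entries with equal weighted cardinalities are kept (the description only speaks of "lower"), and the result is a new map rather than a modified input.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; it does not use or reproduce Jstacs code.
class BackgroundFilterSketch {

    // Keeps an entry unless fgWeight * |foreground bits| is lower than
    // bgWeight * |background bits|. Ties are kept (assumption of this sketch).
    static Map<String, BitSet[]> removeBackground(Map<String, BitSet[]> stat, int fgIndex, int bgIndex,
                                                  double fgWeight, double bgWeight) {
        Map<String, BitSet[]> kept = new HashMap<>();
        for (Map.Entry<String, BitSet[]> entry : stat.entrySet()) {
            BitSet[] bits = entry.getValue();
            double foreground = fgWeight * bits[fgIndex].cardinality();
            double background = bgWeight * bits[bgIndex].cardinality();
            if (foreground >= background) {
                kept.put(entry.getKey(), bits);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet fg = new BitSet();
        fg.set(0, 3); // occurs in 3 foreground sequences
        BitSet bg = new BitSet();
        bg.set(0);    // occurs in 1 background sequence
        Map<String, BitSet[]> stat = new HashMap<>();
        stat.put("ACG", new BitSet[] { fg, bg });
        System.out.println(removeBackground(stat, 0, 1, 1.0, 1.0).keySet());
    }
}
```

If the foreground and background data sets differ in size, weights such as the reciprocal of each data set's number of sequences would put both cardinalities on a relative-frequency scale; this is one possible choice, not a recommendation taken from the page.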