java.lang.Object
  de.jstacs.data.Sample
public class Sample
extends Object
implements Iterable<Sequence>
This is the class for any sample of Sequences. All Sequences in a Sample have to have the same AlphabetContainer. The Sequences may have different lengths.
For the internal representation the class Sequence is used, where the external alphabet is converted to integral numerical values. The class Sample knows about this coding via instances of the class AlphabetContainer and, accordingly, Alphabet.
There are different ways to access the elements of a Sample. If one needs random access, there is the method getElementAt(int). For fast sequential access it is recommended to use a Sample.ElementEnumerator.
Sample is immutable.
See Also:
AlphabetContainer, Alphabet, Sequence
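The class comment says that every Sequence stores the external alphabet as integral values and that all Sequences of one Sample share an AlphabetContainer. The following sketch illustrates only that coding idea. `DiscreteAlphabetSketch` is a hypothetical, self-contained class written for this page; it is not the Jstacs `Alphabet` or `AlphabetContainer` API, and it assumes single-character symbols.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a discrete alphabet; NOT the Jstacs Alphabet API. */
final class DiscreteAlphabetSketch {
    private final String[] symbols;
    private final Map<String, Integer> index = new HashMap<>();

    DiscreteAlphabetSketch(String... symbols) {
        this.symbols = symbols.clone();
        for (int i = 0; i < symbols.length; i++) {
            // each symbol's position in the alphabet becomes its integral code
            if (index.put(symbols[i], i) != null) {
                throw new IllegalArgumentException("duplicate symbol: " + symbols[i]);
            }
        }
    }

    /** Converts every single-character symbol of s into its integral code. */
    int[] encode(String s) {
        int[] codes = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            Integer code = index.get(String.valueOf(s.charAt(i)));
            if (code == null) {
                // plays the role a WrongAlphabetException plays in Jstacs (assumption)
                throw new IllegalArgumentException("symbol not in alphabet: " + s.charAt(i));
            }
            codes[i] = code;
        }
        return codes;
    }

    /** Converts integral codes back into the external alphabet. */
    String decode(int[] codes) {
        StringBuilder sb = new StringBuilder();
        for (int c : codes) {
            sb.append(symbols[c]);
        }
        return sb.toString();
    }
}
```

With the symbols A, C, G, T, the string GATTACA is encoded as {2, 0, 3, 3, 0, 1, 0}, and decoding restores the original string.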
Nested Class Summary | | |
---|---|---|
static class | Sample.ElementEnumerator | This class can be used to have fast sequential access to a Sample. |
static class | Sample.PartitionMethod | This enum defines different partition methods for a Sample. |
static class | Sample.WeightedSampleFactory | This class enables you to eliminate Sequences that occur more than once in one or more Samples. |
Constructor Summary | |
---|---|
Sample(AlphabetContainer abc, AbstractStringExtractor se) | Creates a new Sample from a StringExtractor using the given AlphabetContainer. |
Sample(AlphabetContainer abc, AbstractStringExtractor se, int subsequenceLength) | Creates a new Sample from a StringExtractor using the given AlphabetContainer and all overlapping windows of length subsequenceLength. |
Sample(AlphabetContainer abc, AbstractStringExtractor se, String delim) | Creates a new Sample from a StringExtractor using the given AlphabetContainer and a delimiter delim. |
Sample(AlphabetContainer abc, AbstractStringExtractor se, String delim, int subsequenceLength) | Creates a new Sample from a StringExtractor using the given AlphabetContainer, the given delimiter delim and all overlapping windows of length subsequenceLength. |
Sample(Sample s, int subsequenceLength) | Creates a new Sample from a given Sample and a given length subsequenceLength. |
Sample(String annotation, Sequence... seqs) | Creates a new Sample from an array of Sequences and a given annotation. |
Method Summary | | |
---|---|---|
static Sample | diff(Sample data, Sample... samples) | This method computes the difference between the Sample data and the Samples samples. |
Sequence[] | getAllElements() | Returns an array of Sequences containing all elements of this Sample. |
AlphabetContainer | getAlphabetContainer() | Returns the AlphabetContainer of this Sample. |
String | getAnnotation() | Returns some annotation of the Sample. |
static String | getAnnotation(Sample... s) | Returns the annotation for an array of Samples. |
Hashtable<String,HashSet<String>> | getAnnotationTypesAndIdentifier() | This method returns all SequenceAnnotation types and the corresponding identifiers which occur in this Sample. |
Sample | getCompositeSample(int[] starts, int[] lengths) | This method enables you to use only composite Sequences of all elements in the current Sample. |
Sequence | getElementAt(int i) | This method returns the element, i.e. the Sequence, with index i. |
int | getElementLength() | Returns the length of the elements, i.e. the Sequences, in this Sample. |
Sample | getInfixSample(int start, int length) | This method enables you to use only an infix of all elements, i.e. the Sequences, in the current Sample. |
int | getMaximalElementLength() | Returns the maximal length of an element, i.e. a Sequence, in this Sample. |
int | getMinimalElementLength() | Returns the minimal length of an element, i.e. a Sequence, in this Sample. |
int | getNumberOfElements() | Returns the number of elements, i.e. the Sequences, in this Sample. |
int | getNumberOfElementsWithLength(int len) | Returns the number of overlapping elements of length len that can be extracted. |
double | getNumberOfElementsWithLength(int len, double[] weights) | Returns the weighted number of overlapping elements of length len that can be extracted. |
Sample | getReverseComplementarySample() | Returns a Sample that contains the reverse complement of all Sequences in this Sample. |
int[][] | getSequenceAnnotationIndexMatrix(String rowType, Hashtable<String,Integer> rowHash, String columnType, Hashtable<String,Integer> columnHash) | This method creates a matrix which contains the index of the Sequence with a specific SequenceAnnotation combination, or -1 if the Sample does not contain any Sequence with such a combination. |
Sample | getSuffixSample(int start) | This method enables you to use only a suffix of all elements, i.e. the Sequences, in the current Sample. |
static Sample | intersection(Sample... samples) | This method computes the intersection between all elements/Samples of the array, i.e. it returns a Sample containing only Sequences that are contained in all Samples of the array. |
boolean | isDiscreteSample() | This method indicates if all positions use discrete values. |
boolean | isSimpleSample() | This method indicates whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet. |
Iterator<Sequence> | iterator() | |
Pair<Sample[],double[][]> | partition(double[] sequenceWeights, int k, Sample.PartitionMethod method) | This method partitions the elements, i.e. the Sequences, of the Sample and the corresponding weights into k distinct parts. |
Pair<Sample[],double[][]> | partition(double[] sequenceWeights, Sample.PartitionMethod method, double... percentage) | This method partitions the elements, i.e. the Sequences, of the Sample and the corresponding weights into distinct parts where each part holds the corresponding percentage given in the array percentage. |
Sample[] | partition(double p, Sample.PartitionMethod method, int subsequenceLength) | This method partitions the elements, i.e. the Sequences, of the Sample into two distinct parts. |
Sample[] | partition(int k, Sample.PartitionMethod method) | This method partitions the elements, i.e. the Sequences, of the Sample into k distinct parts. |
Sample[] | partition(Sample.PartitionMethod method, double... percentage) | This method partitions the elements, i.e. the Sequences, of the Sample into distinct parts where each part holds the corresponding percentage given in the array percentage. |
void | save(File f) | This method writes the Sample to a file f. |
void | save(OutputStream stream, char commentChar, SequenceAnnotationParser p) | This method writes all Sequences including their SequenceAnnotations into an OutputStream. |
Sample | subSampling(int number) | Randomly samples elements, i.e. Sequences, from the set of all elements, i.e. the Sequences, contained in this Sample. |
String | toString() | |
static Sample | union(Sample... s) | Unites all Samples of the array s. |
static Sample | union(Sample[] s, boolean[] in) | This method unites all Samples of the array s regarding the array in. |
static Sample | union(Sample[] s, boolean[] in, int subsequenceLength) | This method unites all Samples of the array s regarding the array in and sets the element length in the united Sample to subsequenceLength. |
static Sample | union(Sample[] s, int subsequenceLength) | This method unites all Samples of the array s and sets the element length in the united Sample to subsequenceLength. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public Sample(AlphabetContainer abc, AbstractStringExtractor se) throws WrongAlphabetException, EmptySampleException, WrongLengthException
Creates a new Sample from a StringExtractor using the given AlphabetContainer.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - never happens (forwarded from Sample(AlphabetContainer, AbstractStringExtractor, String, int))
See Also:
Sample(AlphabetContainer, AbstractStringExtractor, String, int)
public Sample(AlphabetContainer abc, AbstractStringExtractor se, int subsequenceLength) throws WrongAlphabetException, WrongLengthException, EmptySampleException
Creates a new Sample from a StringExtractor using the given AlphabetContainer and all overlapping windows of length subsequenceLength.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
subsequenceLength - the length of the window sliding on the Strings of se; if subsequenceLength is 0 (zero), the Sequences are used as given by the StringExtractor
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
WrongLengthException - if the subsequence length is not supported
EmptySampleException - if the Sample would be empty
See Also:
Sample(AlphabetContainer, AbstractStringExtractor, String, int)
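The overlapping-window semantics of subsequenceLength can be illustrated on plain Strings. In this sketch (the class `SlidingWindowSketch` is hypothetical, not part of Jstacs), a String of length L yields L - subsequenceLength + 1 windows, and 0 keeps each String as given. How Jstacs handles Strings shorter than the window is not stated on this page; the sketch simply rejects them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of overlapping windows; NOT the Jstacs implementation. */
final class SlidingWindowSketch {
    private SlidingWindowSketch() {}

    /**
     * Returns all overlapping windows of length subsequenceLength of every string,
     * or the strings unchanged if subsequenceLength is 0.
     */
    static List<String> windows(List<String> strings, int subsequenceLength) {
        if (subsequenceLength < 0) {
            throw new IllegalArgumentException("negative subsequenceLength");
        }
        List<String> result = new ArrayList<>();
        for (String s : strings) {
            if (subsequenceLength == 0) {
                result.add(s); // use the sequence as given
                continue;
            }
            if (s.length() < subsequenceLength) {
                // stands in for a WrongLengthException (assumption)
                throw new IllegalArgumentException("sequence shorter than window: " + s);
            }
            // a sequence of length L yields L - subsequenceLength + 1 windows
            for (int start = 0; start + subsequenceLength <= s.length(); start++) {
                result.add(s.substring(start, start + subsequenceLength));
            }
        }
        return result;
    }
}
```

For the Strings ACGTA and GGC with subsequenceLength = 3, the sketch returns ACG, CGT, GTA and GGC.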
public Sample(AlphabetContainer abc, AbstractStringExtractor se, String delim) throws WrongAlphabetException, EmptySampleException, WrongLengthException
Creates a new Sample from a StringExtractor using the given AlphabetContainer and a delimiter delim.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
delim - the delimiter for parsing the Strings
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - never happens (forwarded from Sample(AlphabetContainer, AbstractStringExtractor, String, int))
See Also:
Sample(AlphabetContainer, AbstractStringExtractor, String, int)
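A delimiter is useful when symbols are longer than one character, for example codons. The sketch below (hypothetical class `DelimitedParseSketch`) splits a String at every literal occurrence of delim and maps each token to its index in a given symbol list. Whether Jstacs treats delim literally, trims whitespace or tolerates empty tokens is an assumption not covered by this page.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of delimiter-based symbol parsing; NOT the Jstacs parser. */
final class DelimitedParseSketch {
    private DelimitedParseSketch() {}

    /** Splits line at every literal occurrence of delim and returns the symbol codes. */
    static int[] parse(String line, String delim, List<String> alphabet) {
        // Pattern.quote makes delimiters such as "." or "|" literal; -1 keeps empty tokens
        String[] tokens = line.split(Pattern.quote(delim), -1);
        int[] codes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            int code = alphabet.indexOf(tokens[i]);
            if (code < 0) {
                // stands in for a WrongAlphabetException (assumption)
                throw new IllegalArgumentException("unknown symbol: '" + tokens[i] + "'");
            }
            codes[i] = code;
        }
        return codes;
    }
}
```

With the symbol list AAA, AAC, AAG and delim = ",", the String AAG,AAA,AAC is parsed into the codes {2, 0, 1}.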
public Sample(AlphabetContainer abc, AbstractStringExtractor se, String delim, int subsequenceLength) throws EmptySampleException, WrongAlphabetException, WrongLengthException
Creates a new Sample from a StringExtractor using the given AlphabetContainer, the given delimiter delim and all overlapping windows of length subsequenceLength.
Parameters:
abc - the AlphabetContainer
se - the StringExtractor
delim - the delimiter for parsing the Strings
subsequenceLength - the length of the window sliding on the Strings of se; if subsequenceLength is 0 (zero), the Sequences are used as given by the StringExtractor
Throws:
WrongAlphabetException - if the AlphabetContainer is not suitable
EmptySampleException - if the Sample would be empty
WrongLengthException - if the subsequence length is not supported
public Sample(Sample s, int subsequenceLength) throws WrongLengthException
Creates a new Sample from a given Sample and a given length subsequenceLength.
... Sample.getElementAt(int) are real objects and do not have to be created at the invocation of the method. (The same holds for the Sample.ElementEnumerator. In those cases, both ways to access the Sequences are approximately equally fast.)
Parameters:
s - the given Sample
subsequenceLength - the new element length
Throws:
WrongLengthException - if something is wrong with subsequenceLength
public Sample(String annotation, Sequence... seqs) throws EmptySampleException, WrongAlphabetException
Creates a new Sample from an array of Sequences and a given annotation.
... Model.emitSample(int, int...).
Parameters:
annotation - the annotation of the Sample
seqs - the Sequence(s)
Throws:
EmptySampleException - if the array seqs is null or its length is 0
WrongAlphabetException - if the AlphabetContainers do not match
Method Detail |
---|
public static final String getAnnotation(Sample... s)
Returns the annotation for an array of Samples.
Parameters:
s - an array of Samples
See Also:
getAnnotation()
public static final Sample diff(Sample data, Sample... samples) throws EmptySampleException, WrongAlphabetException
This method computes the difference between the Sample data and the Samples samples.
Parameters:
data - the minuend
samples - the subtrahends
Throws:
WrongAlphabetException - if the AlphabetContainers do not match, i.e., if the Samples are from different domains
EmptySampleException - if the difference is empty
public static final Sample intersection(Sample... samples) throws IllegalArgumentException, EmptySampleException
This method computes the intersection between all elements/Samples of the array, i.e. it returns a Sample containing only Sequences that are contained in all Samples of the array.
Parameters:
samples - the array of Samples
Returns:
the intersection of the Samples in the array
Throws:
IllegalArgumentException - if the elements of the array are from different domains
EmptySampleException - if the intersection is empty
public static final Sample union(Sample[] s, boolean[] in) throws IllegalArgumentException, EmptySampleException
This method unites all Samples of the array s regarding the array in.
Parameters:
s - the array of Samples
in - an array indicating which Sample is used in the union; if in[i]==true, the Sample s[i] is used
Returns:
the united Sample
Throws:
IllegalArgumentException - if s.length != in.length or the Alphabets do not match
EmptySampleException - if the union is empty
See Also:
union(Sample[], boolean[], int)
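The effect of the boolean mask in can be sketched with plain lists standing in for Samples. `MaskedUnionSketch` is hypothetical; it concatenates the selected samples in array order and keeps duplicate entries, which is an assumption (this page documents Sample.WeightedSampleFactory as the tool for eliminating Sequences that occur more than once).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of union(Sample[], boolean[]) on string lists; NOT the Jstacs code. */
final class MaskedUnionSketch {
    private MaskedUnionSketch() {}

    static List<String> union(List<List<String>> samples, boolean[] in) {
        if (samples.size() != in.length) {
            throw new IllegalArgumentException("s.length != in.length");
        }
        List<String> united = new ArrayList<>();
        for (int i = 0; i < in.length; i++) {
            if (in[i]) {
                united.addAll(samples.get(i)); // keeps order and duplicates (assumption)
            }
        }
        if (united.isEmpty()) {
            // stands in for an EmptySampleException
            throw new IllegalStateException("the union is empty");
        }
        return united;
    }
}
```

With the samples {ACGT}, {GGGG, ACGT}, {TTTT} and in = {true, false, true}, the sketch returns {ACGT, TTTT}.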
public static final Sample union(Sample... s) throws IllegalArgumentException
Unites all Samples of the array s.
Parameters:
s - the array of Samples
Returns:
the united Sample
Throws:
IllegalArgumentException - if the Alphabets do not match
See Also:
union(Sample[], boolean[])
public static final Sample union(Sample[] s, boolean[] in, int subsequenceLength) throws IllegalArgumentException, EmptySampleException, WrongLengthException
This method unites all Samples of the array s regarding the array in and sets the element length in the united Sample to subsequenceLength.
Parameters:
s - the array of Samples
in - an array indicating which Sample is used in the union; if in[i]==true, the Sample s[i] is used
subsequenceLength - the length of the elements in the united Sample
Returns:
the united Sample
Throws:
IllegalArgumentException - if s.length != in.length or the Alphabets do not match
EmptySampleException - if the union is empty
WrongLengthException - if the united Sample does not support this subsequenceLength
public static final Sample union(Sample[] s, int subsequenceLength) throws IllegalArgumentException, WrongLengthException
This method unites all Samples of the array s and sets the element length in the united Sample to subsequenceLength.
Parameters:
s - the array of Samples
subsequenceLength - the length of the elements in the united Sample
Returns:
the united Sample
Throws:
IllegalArgumentException - if the Alphabets do not match
WrongLengthException - if the united Sample does not support this subsequenceLength
See Also:
union(Sample[], boolean[], int)
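The set operations diff and intersection can be modelled on lists of Strings. The sketch assumes that two Sequences are the same element when they are equal, that diff removes every element of data that occurs in any subtrahend, and that intersection keeps those elements of the first Sample that are found in all others; `SampleSetOpsSketch` is a hypothetical class, not the Jstacs implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical semantics for diff and intersection on string lists; NOT the Jstacs code. */
final class SampleSetOpsSketch {
    private SampleSetOpsSketch() {}

    /** Keeps every element of data that occurs in none of the subtrahends. */
    @SafeVarargs
    static List<String> diff(List<String> data, List<String>... samples) {
        Set<String> remove = new HashSet<>();
        for (List<String> s : samples) {
            remove.addAll(s);
        }
        List<String> result = new ArrayList<>();
        for (String seq : data) {
            if (!remove.contains(seq)) {
                result.add(seq);
            }
        }
        return requireNonEmpty(result, "difference");
    }

    /** Keeps every element of the first sample that occurs in all other samples. */
    @SafeVarargs
    static List<String> intersection(List<String>... samples) {
        List<String> result = new ArrayList<>();
        for (String seq : samples[0]) {
            boolean inAll = true;
            for (int i = 1; i < samples.length && inAll; i++) {
                inAll = samples[i].contains(seq);
            }
            if (inAll) {
                result.add(seq);
            }
        }
        return requireNonEmpty(result, "intersection");
    }

    private static List<String> requireNonEmpty(List<String> list, String what) {
        if (list.isEmpty()) {
            // stands in for an EmptySampleException
            throw new IllegalStateException("the " + what + " is empty");
        }
        return list;
    }
}
```

With data = {AC, GT, TT} and the subtrahend {GT}, the difference is {AC, TT}; the intersection of {AC, GT, TT} and {GT, TT} is {GT, TT}.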
public Sequence[] getAllElements()
Returns an array of Sequences containing all elements of this Sample.
Returns:
all elements (Sequences) of this Sample
See Also:
Sample.ElementEnumerator
public final AlphabetContainer getAlphabetContainer()
Returns the AlphabetContainer of this Sample.
Returns:
the AlphabetContainer of this Sample
public final String getAnnotation()
Returns some annotation of the Sample.
Returns:
the annotation of the Sample
public final Sample getCompositeSample(int[] starts, int[] lengths) throws IllegalArgumentException
This method enables you to use only composite Sequences of all elements in the current Sample. Each composite Sequence will be built from one corresponding Sequence in this Sample, and all composite Sequences will be returned in a new Sample.
Parameters:
starts - the start positions of the chunks
lengths - the lengths of the chunks
Returns:
the Sample of composite Sequences
Throws:
IllegalArgumentException - if either starts or lengths or both in combination are not suitable
See Also:
Sequence.getCompositeSequence(AlphabetContainer, int[], int[])
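A composite Sequence can be pictured as the concatenation of the chunks selected by starts and lengths. The sketch below assumes the chunks are concatenated in the given order and must lie inside the Sequence; `CompositeSketch` is hypothetical and works on Strings instead of Sequence objects.

```java
/** Hypothetical illustration of composite sequences; NOT the Jstacs implementation. */
final class CompositeSketch {
    private CompositeSketch() {}

    /** Concatenates the chunks seq[starts[k], starts[k] + lengths[k]) in the given order. */
    static String composite(String seq, int[] starts, int[] lengths) {
        if (starts.length != lengths.length) {
            throw new IllegalArgumentException("starts and lengths differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < starts.length; k++) {
            int end = starts[k] + lengths[k];
            // each chunk must be non-empty and lie completely inside the sequence
            if (starts[k] < 0 || lengths[k] <= 0 || end > seq.length()) {
                throw new IllegalArgumentException("chunk " + k + " is not suitable");
            }
            sb.append(seq, starts[k], end);
        }
        return sb.toString();
    }
}
```

For ACGTACGT with starts = {0, 4} and lengths = {2, 3}, the sketch yields ACACG. A single chunk (starts = {2}, lengths = {4}) gives GTAC, which is the kind of infix that getInfixSample(int, int) describes.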
public Sequence getElementAt(int i)
This method returns the element, i.e. the Sequence, with index i. See also this comment.
Parameters:
i - the index of the element, i.e. the Sequence
Returns:
the element, i.e. the Sequence, with index i
public int getElementLength()
Returns the length of the elements, i.e. the Sequences, in this Sample.
Returns:
the length of the elements, i.e. the Sequences, in this Sample
public final Sample getInfixSample(int start, int length) throws IllegalArgumentException
This method enables you to use only an infix of all elements, i.e. the Sequences, in the current Sample. The subsequences will be returned in a new Sample.
It can also be used to obtain a Sample of prefixes if the element length is not zero.
Parameters:
start - the start position of the infix
length - the length of the infix, has to be positive
Returns:
the Sample of the specified infixes
Throws:
IllegalArgumentException - if either start or length or both in combination are not suitable
public Sample getReverseComplementarySample() throws OperationNotSupportedException
Returns a Sample that contains the reverse complement of all Sequences in this Sample.
Throws:
OperationNotSupportedException - if the AlphabetContainer of any of the Sequences in this Sample is not complementable
public int getMinimalElementLength()
Returns the minimal length of an element, i.e. a Sequence, in this Sample.
Returns:
the minimal length of an element, i.e. a Sequence, in this Sample
public int getMaximalElementLength()
Returns the maximal length of an element, i.e. a Sequence, in this Sample.
Returns:
the maximal length of an element, i.e. a Sequence, in this Sample
public int getNumberOfElements()
Returns the number of elements, i.e. the Sequences, in this Sample.
Returns:
the number of elements, i.e. the Sequences, in this Sample
public Iterator<Sequence> iterator()
Specified by:
iterator in interface Iterable<Sequence>
public int getNumberOfElementsWithLength(int len) throws WrongLengthException
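Returns the number of overlapping elements of length len that can be extracted.
Parameters:
len - the length of the elements
Throws:
WrongLengthException - if the given length is bigger than the minimal element length
See Also:
getNumberOfElementsWithLength(int, double[])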
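Counting overlapping elements follows from the window arithmetic: a Sequence of length L contributes L - len + 1 windows of length len. The sketch below assumes that the weighted overload getNumberOfElementsWithLength(int, double[]) scales each Sequence's contribution by its weight and treats null as weight 1; `WindowCountSketch` is hypothetical and receives only the sequence lengths.

```java
/** Hypothetical count of overlapping windows; NOT the Jstacs implementation. */
final class WindowCountSketch {
    private WindowCountSketch() {}

    /** Sum over all sequences of (length - len + 1), each term scaled by its weight. */
    static double countWeighted(int[] lengths, int len, double[] weights) {
        if (weights != null && weights.length != lengths.length) {
            throw new IllegalArgumentException("weights have a wrong dimension");
        }
        double total = 0;
        for (int i = 0; i < lengths.length; i++) {
            if (len > lengths[i]) {
                // stands in for WrongLengthException: len exceeds the minimal element length
                throw new IllegalArgumentException("len > element length " + lengths[i]);
            }
            double w = (weights == null) ? 1.0 : weights[i];
            total += w * (lengths[i] - len + 1);
        }
        return total;
    }

    /** Unweighted variant: every sequence has weight 1. */
    static int count(int[] lengths, int len) {
        return (int) countWeighted(lengths, len, null);
    }
}
```

For lengths 5, 3 and 4 and len = 3, the count is 3 + 1 + 2 = 6; with lengths 5 and 3 and weights 0.5 and 2.0, the weighted count is 0.5 * 3 + 2.0 * 1 = 3.5.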
public double getNumberOfElementsWithLength(int len, double[] weights) throws WrongLengthException, IllegalArgumentException
Returns the weighted number of overlapping elements of length len that can be extracted.
Parameters:
len - the length of the elements
weights - the weights of each element of the sample (see getElementAt(int)), can be null
Throws:
WrongLengthException - if the given length is bigger than the minimal element length
IllegalArgumentException - if the weights have a wrong dimension
public final Sample getSuffixSample(int start) throws IllegalArgumentException
This method enables you to use only a suffix of all elements, i.e. the Sequences, in the current Sample. The subsequences will be returned in a new Sample.
Parameters:
start - the start position of the suffix
Returns:
the Sample of the specified suffixes
Throws:
IllegalArgumentException - if start is not suitable
public final boolean isSimpleSample()
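This method indicates whether all random variables are defined over the same range, i.e. all positions use the same (fixed) alphabet.
Returns:
true if the Sample is simple, false otherwise
See Also:
AlphabetContainer.isSimple()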
public final boolean isDiscreteSample()
This method indicates if all positions use discrete values.
Returns:
true if the Sample is discrete, false otherwise
See Also:
AlphabetContainer.isDiscrete()
public Sample[] partition(double p, Sample.PartitionMethod method, int subsequenceLength) throws WrongLengthException, UnsupportedOperationException, EmptySampleException
This method partitions the elements, i.e. the Sequences, of the Sample into two distinct parts. The second part (test sample) holds the percentage p, the first part (train sample) holds the rest. The first part has the same element length as the current Sample, the second has element length subsequenceLength, which might be necessary for testing.
Parameters:
p - the percentage for the second part; the second part holds at least this percentage of the full Sample
method - the method used to partition the Sample (partitioning criterion)
subsequenceLength - the element length of the second part; if 0 (zero), the sequences are used as given in this Sample
Returns:
an array of partitioned Samples
Throws:
WrongLengthException - if something is wrong with subsequenceLength
UnsupportedOperationException - if the Sample is not simple
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS, partition(PartitionMethod, double...)
public Sample[] partition(Sample.PartitionMethod method, double... percentage) throws IllegalArgumentException, EmptySampleException
This method partitions the elements, i.e. the Sequences, of the Sample into distinct parts where each part holds the corresponding percentage given in the array percentage.
Parameters:
method - the method used to partition the Sample (partitioning criterion)
percentage - the array of percentages for each "subsample"
Returns:
an array of partitioned Samples
Throws:
IllegalArgumentException - if something with the percentages is not correct (sum != 1 or one value is not in [0,1])
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
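The two partitioning criteria differ in what they measure: the number of elements or the number of symbols. The following sketch assigns Sequences greedily, in the given order, to consecutive parts until each part reaches its cumulative share. The class `PercentagePartitionSketch` and its enum are hypothetical; the actual Jstacs algorithm, including whether it shuffles the elements first, is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy partitioning by percentages; NOT the Jstacs algorithm. */
final class PercentagePartitionSketch {
    /** Loosely mirrors the two criteria of Sample.PartitionMethod. */
    enum Criterion { BY_ELEMENTS, BY_SYMBOLS }

    private PercentagePartitionSketch() {}

    static List<List<String>> partition(List<String> seqs, Criterion c, double... percentage) {
        if (percentage.length == 0) {
            throw new IllegalArgumentException("no percentages given");
        }
        double sum = 0;
        for (double p : percentage) {
            if (p < 0 || p > 1) {
                throw new IllegalArgumentException("value not in [0,1]");
            }
            sum += p;
        }
        if (Math.abs(sum - 1) > 1e-9) {
            throw new IllegalArgumentException("sum != 1");
        }
        double total = 0;
        for (String s : seqs) {
            total += measure(s, c);
        }
        List<List<String>> parts = new ArrayList<>();
        for (int k = 0; k < percentage.length; k++) {
            parts.add(new ArrayList<>());
        }
        int part = 0;
        double done = 0;                      // measure assigned so far
        double bound = percentage[0] * total; // cumulative target of the current part
        for (String s : seqs) {
            // advance once the current part has reached its cumulative share
            while (part < percentage.length - 1 && done >= bound) {
                part++;
                bound += percentage[part] * total;
            }
            parts.get(part).add(s);
            done += measure(s, c);
        }
        for (List<String> p : parts) {
            if (p.isEmpty()) {
                // plays the role of an EmptySampleException (assumption)
                throw new IllegalStateException("a partition is empty");
            }
        }
        return parts;
    }

    /** One unit per element, or one unit per symbol. */
    private static double measure(String s, Criterion c) {
        return c == Criterion.BY_ELEMENTS ? 1 : s.length();
    }
}
```

For the Sequences CCCCCC, A, G, T and percentages 0.5 and 0.5, partitioning by elements gives {CCCCCC, A} and {G, T}, whereas partitioning by symbols gives {CCCCCC} and {A, G, T}, because the first Sequence alone already covers more than half of the 9 symbols.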
public Pair<Sample[],double[][]> partition(double[] sequenceWeights, Sample.PartitionMethod method, double... percentage) throws IllegalArgumentException, EmptySampleException
This method partitions the elements, i.e. the Sequences, of the Sample and the corresponding weights into distinct parts where each part holds the corresponding percentage given in the array percentage.
Parameters:
sequenceWeights - the weights for the sequences (might be null)
method - the method used to partition the Sample (partitioning criterion)
percentage - the array of percentages for each "subsample"
Returns:
a Pair containing an array of partitioned Samples and an array of partitioned sequence weights
Throws:
IllegalArgumentException - if something with the percentages is not correct (sum != 1 or one value is not in [0,1])
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
public Sample[] partition(int k, Sample.PartitionMethod method) throws IllegalArgumentException, EmptySampleException
This method partitions the elements, i.e. the Sequences, of the Sample into k distinct parts.
Parameters:
k - the number of distinct parts
method - the method used to partition the Sample (partitioning criterion)
Returns:
an array of partitioned Samples
Throws:
IllegalArgumentException - if k is not correct
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
public Pair<Sample[],double[][]> partition(double[] sequenceWeights, int k, Sample.PartitionMethod method) throws IllegalArgumentException, EmptySampleException
This method partitions the elements, i.e. the Sequences, of the Sample and the corresponding weights into k distinct parts.
Parameters:
sequenceWeights - the weights for the sequences (might be null)
k - the number of distinct parts
method - the method used to partition the Sample (partitioning criterion)
Returns:
a Pair containing an array of partitioned Samples and an array of partitioned sequence weights
Throws:
IllegalArgumentException - if k is not correct
EmptySampleException - if at least one of the created partitions is empty
See Also:
Sample.PartitionMethod, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_ELEMENTS, Sample.PartitionMethod.PARTITION_BY_NUMBER_OF_SYMBOLS
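The weighted overloads return the partitioned weights alongside the partitioned Samples, so each weight must travel with its Sequence. This sketch splits the elements into k contiguous blocks whose sizes differ by at most one and copies the matching weight ranges. `WeightedKPartitionSketch` and its `Parts` holder (standing in for `Pair<Sample[],double[][]>`) are hypothetical, and the deterministic block split is an assumption, not the Jstacs strategy.

```java
import java.util.Arrays;

/** Hypothetical k-part split that keeps weights aligned with their sequences; NOT the Jstacs code. */
final class WeightedKPartitionSketch {
    private WeightedKPartitionSketch() {}

    /** Holder standing in for Pair<Sample[], double[][]>. */
    static final class Parts {
        final String[][] samples;
        final double[][] weights; // entries are null if no weights were given

        Parts(String[][] samples, double[][] weights) {
            this.samples = samples;
            this.weights = weights;
        }
    }

    static Parts partition(String[] seqs, double[] sequenceWeights, int k) {
        if (k < 1 || k > seqs.length) {
            throw new IllegalArgumentException("k is not correct");
        }
        if (sequenceWeights != null && sequenceWeights.length != seqs.length) {
            throw new IllegalArgumentException("one weight per sequence required");
        }
        String[][] samples = new String[k][];
        double[][] weights = new double[k][];
        for (int part = 0; part < k; part++) {
            // contiguous blocks whose sizes differ by at most one element
            int from = (int) ((long) part * seqs.length / k);
            int to = (int) ((long) (part + 1) * seqs.length / k);
            samples[part] = Arrays.copyOfRange(seqs, from, to);
            weights[part] = sequenceWeights == null ? null : Arrays.copyOfRange(sequenceWeights, from, to);
        }
        return new Parts(samples, weights);
    }
}
```

With five Sequences, weights {1, 2, 3, 4, 5} and k = 2, the first part receives the first two Sequences with weights {1, 2}, and the second part the remaining three with weights {3, 4, 5}.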
public Sample subSampling(int number) throws EmptySampleException
Randomly samples elements, i.e. Sequences, from the set of all elements, i.e. the Sequences, contained in this Sample. Depending on whether this Sample is chosen to contain overlapping elements (windows of length subsequenceLength) or not, those elements (overlapping windows or whole sequences) are subsampled.
Parameters:
number - the number of Sequences that should be drawn (with replacement) from the contained set of Sequences
Returns:
a new Sample containing the drawn Sequences
Throws:
EmptySampleException - if number is not positive
public final void save(File f) throws IOException
This method writes the Sample to a file f.
Parameters:
f - the File
Throws:
IOException - if something went wrong with the file
See Also:
save(OutputStream, char, SequenceAnnotationParser)
public final void save(OutputStream stream, char commentChar, SequenceAnnotationParser p) throws IOException
This method writes all Sequences including their SequenceAnnotations into an OutputStream. The SequenceAnnotations are parsed using the SequenceAnnotationParser.
Parameters:
stream - the stream which is used to write the Sample
commentChar - the character that marks comment lines
p - the parser for the SequenceAnnotations of the Sequences
Throws:
IOException - if something went wrong while writing into the stream
See Also:
SequenceAnnotationParser.parseAnnotationToComment(char, SequenceAnnotation...)
public String toString()
Overrides:
toString in class Object
public Hashtable<String,HashSet<String>> getAnnotationTypesAndIdentifier()
This method returns all SequenceAnnotation types and the corresponding identifiers which occur in this Sample.
Returns:
a Hashtable with key = SequenceAnnotation type and value = the set of SequenceAnnotation identifiers for this type
See Also:
SequenceAnnotation
public int[][] getSequenceAnnotationIndexMatrix(String rowType, Hashtable<String,Integer> rowHash, String columnType, Hashtable<String,Integer> columnHash)
This method creates a matrix which contains the index of the Sequence with a specific SequenceAnnotation combination, or -1 if the Sample does not contain any Sequence with such a combination. The rows and columns are indexed according to the Hashtables.
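// s is a Sample; rowHash and columnHash map identifiers to row and column indices
int[][] matrix = s.getSequenceAnnotationIndexMatrix( rowType, rowHash, columnType, columnHash );
for( int i = 0; i < matrix.length; i++ ) {
    for( int j = 0; j < matrix[i].length; j++ ) {
        if( matrix[i][j] < 0 ) {
            System.out.println( "There is no Sequence in the Sample with this SequenceAnnotation combination" );
        } else {
            System.out.println( "This is the Sequence: " + s.getElementAt( matrix[i][j] ) );
        }
    }
}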
Parameters:
rowType - the SequenceAnnotation type for the rows
rowHash - a Hashtable of SequenceAnnotation identifiers and indices for the rows
columnType - the SequenceAnnotation type for the columns
columnHash - a Hashtable of SequenceAnnotation identifiers and indices for the columns
Returns:
a matrix containing the indices of the Sequences with each specific combination of SequenceAnnotations for rowType and columnType, and -1 if this combination does not exist in the Sample
See Also:
getAnnotationTypesAndIdentifier(), ToolBox.parseHashSet2IndexHashtable(HashSet)
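The matrix semantics can be reproduced on plain maps. In this sketch each Sequence is represented only by a map from annotation type to identifier, and `toIndex` builds an identifier-to-index Hashtable of the kind the name ToolBox.parseHashSet2IndexHashtable(HashSet) suggests. Its sorted index order and the choice to keep the first matching Sequence when several share a combination are assumptions; `AnnotationMatrixSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical model of the annotation index matrix; NOT the Jstacs implementation. */
final class AnnotationMatrixSketch {
    private AnnotationMatrixSketch() {}

    /** Assigns the indices 0..n-1 to the identifiers in sorted order (the order is an assumption). */
    static Hashtable<String, Integer> toIndex(HashSet<String> identifiers) {
        Hashtable<String, Integer> hash = new Hashtable<>();
        for (String id : new TreeSet<>(identifiers)) {
            hash.put(id, hash.size());
        }
        return hash;
    }

    /** annotations.get(n) maps each annotation type of the n-th sequence to its identifier. */
    static int[][] indexMatrix(List<Map<String, String>> annotations,
                               String rowType, Hashtable<String, Integer> rowHash,
                               String columnType, Hashtable<String, Integer> columnHash) {
        int[][] matrix = new int[rowHash.size()][columnHash.size()];
        for (int[] row : matrix) {
            Arrays.fill(row, -1); // -1: no sequence with this combination
        }
        for (int n = 0; n < annotations.size(); n++) {
            String rowId = annotations.get(n).get(rowType);
            String columnId = annotations.get(n).get(columnType);
            if (rowId == null || columnId == null) {
                continue; // the sequence lacks one of the two annotation types
            }
            Integer r = rowHash.get(rowId);
            Integer c = columnHash.get(columnId);
            if (r != null && c != null && matrix[r][c] < 0) {
                matrix[r][c] = n; // keep the first matching sequence (assumption)
            }
        }
        return matrix;
    }
}
```

For two Sequences annotated (species = human, tissue = liver) and (species = mouse, tissue = brain), the sketch puts 0 at row human, column liver, 1 at row mouse, column brain, and -1 everywhere else.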