Training a classifier and classifying new sequences
From Jstacs
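//the snippet reads args, so it belongs inside a main method; besides the Jstacs classes
//it needs: import java.io.File; import java.util.Arrays;
//create a DNA alphabet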
AlphabetContainer container = new AlphabetContainer( new DNAAlphabet() );
//the length of our input sequences
int length = 7;
//create a Sample for each class (foreground first, then background) from the input files,
//using the alphabet from above
Sample[] data = new Sample[]{
    new Sample( container, new StringExtractor( new File(args[0]), 100 ) ),
    new Sample( container, new StringExtractor( new File(args[1]), 100 ), length ) };
//sequences that will be classified
Sample toClassify = new Sample(container, new StringExtractor( new File(args[2]), 100 ) );
//create a new PWM
BayesianNetworkModel pwm = new BayesianNetworkModel( new BayesianNetworkModelParameterSet(
    //the alphabet and the length of the model:
    container, length,
    //the equivalent sample size to compute hyper-parameters
    4,
    //some identifier for the model
    "my PWM",
    //we want a PWM, which is an inhomogeneous Markov model (IMM) of order 0
    ModelType.IMM, (byte) 0,
    //we want to estimate the MAP-parameters
    LearningType.ML_OR_MAP ) );
//create a classifier with a PWM in the foreground and a PWM in the background
ModelBasedClassifier classifier = new ModelBasedClassifier( pwm, pwm );
//train the classifier
classifier.train( data );
//use the trained classifier to classify new sequences; the result holds one class index per sequence
byte[] result = classifier.classify( toClassify );
System.out.println( Arrays.toString( result ) );
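
To make the steps above more concrete, the following plain-Java sketch shows what a two-class PWM classifier computes in principle. It does not use Jstacs and is not the Jstacs implementation. The class PwmSketch, its methods, and the toy sequences are hypothetical names and data chosen for illustration. It assumes that an equivalent sample size of 4 amounts to a pseudocount of 1 per nucleotide and position, that class priors are ignored, and that class 0 is the foreground.

```java
import java.util.Arrays;

/** Illustrative two-class PWM classifier; a sketch, not the Jstacs implementation. */
final class PwmSketch {

    // Assumption: input sequences use only these upper-case symbols.
    static final String ALPHABET = "ACGT";

    /**
     * Estimates a PWM (order-0 inhomogeneous model) from equally long sequences.
     * Returns log-probabilities indexed by [position][symbol].
     */
    static double[][] train(String[] seqs, int length, double pseudocount) {
        double[][] pwm = new double[length][ALPHABET.length()];
        // Start every cell with the pseudocount (assumed: ESS 4 / 4 symbols = 1).
        for (double[] row : pwm) {
            Arrays.fill(row, pseudocount);
        }
        // Count each symbol independently at each position.
        for (String s : seqs) {
            for (int i = 0; i < length; i++) {
                pwm[i][ALPHABET.indexOf(s.charAt(i))]++;
            }
        }
        // Normalize each position and switch to log space.
        double total = seqs.length + pseudocount * ALPHABET.length();
        for (double[] row : pwm) {
            for (int a = 0; a < row.length; a++) {
                row[a] = Math.log(row[a] / total);
            }
        }
        return pwm;
    }

    /** Sums the position-wise log-probabilities of the sequence. */
    static double logLikelihood(double[][] logPwm, String seq) {
        double sum = 0;
        for (int i = 0; i < logPwm.length; i++) {
            sum += logPwm[i][ALPHABET.indexOf(seq.charAt(i))];
        }
        return sum;
    }

    /** Returns 0 if the foreground model scores at least as high as the background, else 1. */
    static byte classify(double[][] fg, double[][] bg, String seq) {
        return (byte) (logLikelihood(fg, seq) >= logLikelihood(bg, seq) ? 0 : 1);
    }

    public static void main(String[] args) {
        // Toy data, invented for this sketch.
        String[] fgData = { "ACGTACG", "ACGTACC", "ACGAACG" };
        String[] bgData = { "TTTTTTT", "GGGTTTA", "TTTGGGA" };
        double[][] fg = train(fgData, 7, 1.0);
        double[][] bg = train(bgData, 7, 1.0);
        byte[] result = { classify(fg, bg, "ACGTACG"), classify(fg, bg, "TTTTTTA") };
        System.out.println(Arrays.toString(result)); // prints [0, 1]
    }
}
```

Working in log space avoids numerical underflow when many per-position probabilities are multiplied, and the pseudocount keeps symbols that never occur at a position from getting probability zero.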