PCTLearn

From Jstacs
Revision as of 12:33, 14 December 2017 by Eggeling

by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.

Runnable JAR

PCTLearn requires a plain text file as input, which may contain the basic Latin characters a-z and A-Z (case sensitive) and the Arabic numerals 0-9. The number of distinct characters in the input file determines the alphabet size for PCT optimization. The application has one mandatory argument and several optional ones. A shorter argument list can be provided, in which case all missing arguments assume their default values. Run with

java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth

where the arguments have the following semantics:

name             type     default  comment

inputFile        String   --       The location of a text file containing the input data.
maximalDepth     Integer  2        The maximal depth of the learned PCT.
scoringFunction  String   BIC      The scoring function to use. Permitted values are "BIC" and "AIC".
memoization      Boolean  TRUE     Enables memoization.
pruning          Boolean  TRUE     Enables pruning.
fineBound        Boolean  TRUE     Uses the fine upper bound instead of the coarse one. Ignored if pruning is set to FALSE.
memoLimit        Integer  1        Memoization limit that stops storing subtrees with the given distance from the leaves. Ignored if memoization is set to FALSE.
lookaheadDepth   Integer  1        The lookahead depth to use. Ignored if pruning is set to FALSE.
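
Before running the tool, it can be useful to check that an input file contains only the permitted characters and to see which alphabet size it implies. The following is a minimal sketch and not part of PCTLearn: the class name AlphabetCheck and the method alphabet are hypothetical, and skipping whitespace (for example line breaks between sequences) is an assumption about the input layout that the description above does not confirm.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.TreeSet;

// Hypothetical helper, not part of PCTLearn.
class AlphabetCheck {

    // Collects the distinct permitted symbols (a-z, A-Z, 0-9; case sensitive).
    // Whitespace is skipped (assumption); any other character is rejected.
    static TreeSet<Character> alphabet(String text) {
        TreeSet<Character> symbols = new TreeSet<>();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // assumption: line breaks only separate sequences
            }
            boolean allowed = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!allowed) {
                throw new IllegalArgumentException("Unsupported character: '" + c + "'");
            }
            symbols.add(c);
        }
        return symbols;
    }

    public static void main(String[] args) throws Exception {
        // Read the file given as first argument, or fall back to a small sample.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])))
                : "ACGTTGCA\nGGCATACG\n";
        TreeSet<Character> symbols = alphabet(text);
        System.out.println("Alphabet: " + symbols);
        System.out.println("Alphabet size: " + symbols.size());
    }
}
```

Because the characters are case sensitive, a file that mixes "acgt" and "ACGT" would yield an alphabet of size 8 rather than 4 in this sketch.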

The tool writes some statistics about the optimization, such as the optimal score, the number of visited nodes, and the required running time, to stdout.

In addition, it creates (i) a Graphviz file of the learned PCT structure and (ii) a file with the conditional probability parameters (maximum likelihood estimates, MLE) for each leaf.