PCTLearn
by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.
Runnable JAR
PCTLearn requires a plain text file as input. The file may contain the basic Latin letters a-z and A-Z (case sensitive) and the Arabic numerals 0-9. The number of distinct characters in the input file determines the alphabet size for PCT optimization. The application has one mandatory argument and several optional ones. A shorter list of arguments can be provided, in which case all missing arguments take their default values. Run with
java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth
where the arguments have the following semantics:
name | type | default | comment |
inputFile | String | -- | The location of a text file containing the input data. |
maximalDepth | Integer | 2 | The maximal depth of the learned PCT. |
scoringFunction | String | BIC | The scoring function to use. Permitted values are "BIC" and "AIC". |
memoization | Boolean | TRUE | Enables memoization. |
pruning | Boolean | TRUE | Enables pruning. |
fineBound | Boolean | TRUE | Use the fine upper bound instead of the coarse one. Ignored if pruning is set to FALSE. |
memoLimit | Integer | 1 | Memoization limit that stops storing subtrees with the given distance from the leaves. Ignored if memoization is set to FALSE. |
lookaheadDepth | Integer | 1 | The lookahead depth to use. Ignored if pruning is set to FALSE. |
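The input constraints and the positional argument list can be checked before launching the JAR. The following Python sketch is not part of PCTLearn. The helper names `alphabet_size` and `build_command` and the file name `data.txt` are hypothetical, and treating line breaks as separators rather than symbols is an assumption about the input format.

```python
import string

# Characters the README lists as permitted input symbols.
ALLOWED = set(string.ascii_letters + string.digits)

# Optional arguments in the positional order of the usage line.
OPTIONAL_ARGS = ["maximalDepth", "scoringFunction", "memoization", "pruning",
                 "fineBound", "memoLimit", "lookaheadDepth"]


def alphabet_size(text):
    """Count distinct symbols; raise ValueError on unsupported characters."""
    # Assumption: line breaks only separate sequences and are not symbols.
    symbols = set(text) - {"\n", "\r"}
    unsupported = symbols - ALLOWED
    if unsupported:
        raise ValueError(f"unsupported characters: {sorted(unsupported)}")
    return len(symbols)


def build_command(input_file, *optional, jar="PCTLearn.jar"):
    """Build the argument vector; omitted trailing arguments keep their defaults."""
    if len(optional) > len(OPTIONAL_ARGS):
        raise ValueError("too many optional arguments")

    def fmt(value):
        # The argument table writes Booleans as TRUE / FALSE.
        if isinstance(value, bool):
            return "TRUE" if value else "FALSE"
        return str(value)

    return ["java", "-jar", jar, input_file] + [fmt(v) for v in optional]


# Depth 3 with AIC; the remaining five arguments are left at their defaults.
cmd = build_command("data.txt", 3, "AIC")
```

The resulting list can be passed to `subprocess.run(cmd)`. Because the arguments are positional, changing a later one (for example `memoLimit`) requires spelling out every argument before it.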
The tool writes some statistics about the optimization to stdout, such as the optimal score, the number of visited nodes, and the required running time.
In addition, it creates (i) a Graphviz file of the learned PCT structure and (ii) a file with the conditional probability parameters (MLE) for each leaf.
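For a categorical distribution, the maximum-likelihood estimate is the relative frequency of each symbol among the observations assigned to a leaf. The Python sketch below illustrates only that calculation. It does not reproduce PCTLearn's code or the layout of its parameter file, and the function name `mle_parameters` is hypothetical.

```python
from collections import Counter


def mle_parameters(symbols, alphabet):
    """Relative frequencies of the symbols observed at one leaf."""
    counts = Counter(symbols)
    # Only symbols from the alphabet contribute to the total.
    total = sum(counts[a] for a in alphabet)
    if total == 0:
        raise ValueError("no observations for this leaf")
    return {a: counts[a] / total for a in alphabet}


# Four observations at a leaf over the alphabet {A, C, G, T}.
params = mle_parameters("AACG", "ACGT")
```

Symbols that never occur at a leaf receive probability 0 under the MLE, as `T` does in the example.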