PCTLearn

From Jstacs
by Ralf Eggeling, Ivo Grosse, and Mikko Koivisto.
== Runnable JAR ==
[https://www.cs.helsinki.fi/u/eggeling/PCTLearn/PCTLearn.jar PCTLearn] requires as input a plain text file that may contain only the basic Latin letters a-z and A-Z (case-sensitive) and the Arabic numerals 0-9. The number of distinct characters occurring in the input file determines the alphabet size used for PCT optimization.
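A minimal sketch of this input rule in Java (the language of the JAR) is shown below. The class InputAlphabet is hypothetical and not part of PCTLearn; it rejects every character outside a-z, A-Z and 0-9, including line breaks, because this page does not say how PCTLearn treats whitespace in the input file.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the input rule (not PCTLearn's own reader):
// only a-z, A-Z and 0-9 are accepted, and the distinct characters that
// occur in the data form the alphabet. Matching is case-sensitive.
class InputAlphabet {

    /** True if c is one of the permitted characters a-z, A-Z or 0-9. */
    static boolean isPermitted(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Collects the distinct characters of the data, rejecting anything else. */
    static SortedSet<Character> alphabet(String data) {
        SortedSet<Character> symbols = new TreeSet<>();
        for (char c : data.toCharArray()) {
            if (!isPermitted(c)) {
                throw new IllegalArgumentException("unsupported character: '" + c + "'");
            }
            symbols.add(c); // 'a' and 'A' count as two different symbols
        }
        return symbols;
    }

    public static void main(String[] args) {
        String data = "ACGTacgt0110"; // made-up example data
        SortedSet<Character> sigma = alphabet(data);
        // prints [0, 1, A, C, G, T, a, c, g, t] -> alphabet size 10
        System.out.println(sigma + " -> alphabet size " + sigma.size());
    }
}
```

Because upper- and lower-case letters are distinct symbols, the example string yields ten symbols even though it spells only four letters twice.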
The application takes one mandatory argument and several optional arguments.
A shorter argument list may be provided, in which case all missing arguments take their default values.
Run the tool with
<code>java -jar PCTLearn.jar inputFile maximalDepth scoringFunction memoization pruning fineBound memoLimit lookaheadDepth</code>
where the arguments have the following semantics:
<table border=0 cellpadding=10 align="center">
<tr>
<td>name</td>
<td>type</td>
<td>default</td>
<td>comment</td>
</tr>
<tr><td colspan=4><hr></td></tr>
<tr>
<td><font color="green">inputFile</font></td>
<td>String</td>
<td>--</td>
<td>Path to the plain text file containing the input data.</td>
</tr>
<tr>
<td><font color="green">maximalDepth</font></td>
<td>Integer</td>
<td>2</td>
<td>The maximal depth of the learned PCT.</td>
</tr>
<tr>
<td><font color="green">scoringFunction</font></td>
<td>String</td>
<td>BIC</td>
<td>The scoring function to be optimized. Permitted values are "BIC" (Bayesian information criterion) and "AIC" (Akaike information criterion).</td>
</tr>
<tr>
<td><font color="green">memoization</font></td>
<td>Boolean</td>
<td>TRUE</td>
<td>Enables memoization.</td>
</tr>
<tr>
<td><font color="green">pruning</font></td>
<td>Boolean</td>
<td>TRUE</td>
<td>Enables pruning.</td>
</tr>
<tr>
<td><font color="green">fineBound</font></td>
<td>Boolean</td>
<td>TRUE</td>
<td>Uses the fine upper bound instead of the coarse one. Ignored if pruning is set to FALSE.</td>
</tr>
<tr>
<td><font color="green">memoLimit</font></td>
<td>Integer</td>
<td>1</td>
<td>Memoization limit that stops storing subtrees with the given distance from the leaves. Ignored if memoization is set to FALSE.</td>
</tr>
<tr>
<td><font color="green">lookaheadDepth</font></td>
<td>Integer</td>
<td>1</td>
<td>The lookahead depth used for pruning. Ignored if pruning is set to FALSE.</td>
</tr>
</table>
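The positional defaults in the table can be illustrated with a small Java parser. This is a hypothetical sketch: the class PctArgs and its members are invented for illustration and do not describe PCTLearn's internal code. Accepting TRUE/FALSE case-insensitively and rejecting any scoring function other than BIC and AIC are assumptions of the sketch.

```java
// Hypothetical sketch: positional arguments with the defaults from the table.
// Names are illustrative only, not taken from PCTLearn.
class PctArgs {
    String inputFile;                  // mandatory
    int maximalDepth = 2;
    String scoringFunction = "BIC";
    boolean memoization = true;
    boolean pruning = true;
    boolean fineBound = true;          // only relevant if pruning is enabled
    int memoLimit = 1;                 // only relevant if memoization is enabled
    int lookaheadDepth = 1;            // only relevant if pruning is enabled

    static PctArgs parse(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("inputFile is mandatory");
        }
        PctArgs a = new PctArgs();
        a.inputFile = args[0];
        // Omitted trailing arguments simply keep their default values.
        if (args.length > 1) a.maximalDepth = Integer.parseInt(args[1]);
        if (args.length > 2) a.scoringFunction = args[2];
        if (args.length > 3) a.memoization = parseFlag(args[3]);
        if (args.length > 4) a.pruning = parseFlag(args[4]);
        if (args.length > 5) a.fineBound = parseFlag(args[5]);
        if (args.length > 6) a.memoLimit = Integer.parseInt(args[6]);
        if (args.length > 7) a.lookaheadDepth = Integer.parseInt(args[7]);
        if (!a.scoringFunction.equals("BIC") && !a.scoringFunction.equals("AIC")) {
            throw new IllegalArgumentException("scoringFunction must be BIC or AIC");
        }
        return a;
    }

    // Assumption: TRUE/FALSE are accepted case-insensitively; anything else is an error.
    private static boolean parseFlag(String s) {
        if (s.equalsIgnoreCase("TRUE")) return true;
        if (s.equalsIgnoreCase("FALSE")) return false;
        throw new IllegalArgumentException("expected TRUE or FALSE, got " + s);
    }

    @Override
    public String toString() {
        return String.format("%s depth=%d score=%s memo=%b (limit %d) pruning=%b (fine %b, lookahead %d)",
                inputFile, maximalDepth, scoringFunction, memoization, memoLimit,
                pruning, fineBound, lookaheadDepth);
    }

    public static void main(String[] args) {
        // Mirrors the argument list of: java -jar PCTLearn.jar data.txt 3 AIC
        System.out.println(parse(new String[] {"data.txt", "3", "AIC"}));
    }
}
```

With the argument list data.txt 3 AIC, the remaining options memoization, pruning, fineBound, memoLimit and lookaheadDepth keep their defaults TRUE, TRUE, TRUE, 1 and 1.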
The tool writes statistics about the optimization to stdout, such as the optimal score, the number of visited nodes, and the required running time.
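The reported optimal score is the value of the chosen scoring function for the learned tree. As a rough illustration, the sketch below computes BIC and AIC for a PCT from the symbol counts collected at its leaves, using one common maximization form of these criteria: the maximized log-likelihood minus (k/2) ln N for BIC, or minus k for AIC, where k = (number of leaves) × (alphabet size − 1) is the number of free parameters and N the number of observations. Whether PCTLearn uses exactly this form (log base, sign convention, handling of the first positions that lack a full context) is an assumption, not something stated on this page; the class PctScore is hypothetical.

```java
// Hypothetical sketch of BIC and AIC for a PCT, given per-leaf symbol counts.
// counts[l][a] = how often symbol a follows a context that falls into leaf l.
// Assumed form (maximized): LL - (k/2) ln N for BIC, LL - k for AIC,
// with k = leaves * (alphabetSize - 1) free parameters.
class PctScore {

    /** Maximized log-likelihood (natural log) with MLE parameters at each leaf. */
    static double logLikelihood(long[][] counts) {
        double ll = 0.0;
        for (long[] leaf : counts) {
            long total = 0;
            for (long c : leaf) total += c;
            for (long c : leaf) {
                if (c > 0) ll += c * Math.log((double) c / total); // 0 * ln 0 is taken as 0
            }
        }
        return ll;
    }

    /** Total number of observations N over all leaves. */
    static long sampleSize(long[][] counts) {
        long n = 0;
        for (long[] leaf : counts) for (long c : leaf) n += c;
        return n;
    }

    static long freeParameters(long[][] counts, int alphabetSize) {
        return (long) counts.length * (alphabetSize - 1);
    }

    static double bic(long[][] counts, int alphabetSize) {
        return logLikelihood(counts)
                - 0.5 * freeParameters(counts, alphabetSize) * Math.log(sampleSize(counts));
    }

    static double aic(long[][] counts, int alphabetSize) {
        return logLikelihood(counts) - freeParameters(counts, alphabetSize);
    }

    public static void main(String[] args) {
        // Made-up counts for a binary alphabet: a tree with a single leaf, and one
        // that splits the same 20 observations into two leaves.
        long[][] oneLeaf = {{10, 10}};
        long[][] twoLeaves = {{9, 1}, {1, 9}};
        System.out.printf("one leaf:   BIC %.3f  AIC %.3f%n", bic(oneLeaf, 2), aic(oneLeaf, 2));
        System.out.printf("two leaves: BIC %.3f  AIC %.3f%n", bic(twoLeaves, 2), aic(twoLeaves, 2));
    }
}
```

On these counts the two-leaf tree scores higher under both criteria, because its gain in log-likelihood outweighs the penalty for the extra leaf; BIC penalizes each additional parameter more heavily than AIC once N exceeds e² (about 7.4).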
Besides the statistics on stdout, PCTLearn creates (i) a Graphviz file of the learned PCT structure and (ii) a file with the conditional probability parameters (maximum likelihood estimates, MLE) for each leaf.
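The maximum likelihood estimates at a leaf are relative frequencies: at every position whose preceding symbols fall into the leaf's context, the following symbol is counted, and the counts are normalized. The Java sketch below illustrates this for a single leaf. The representation of a leaf as a list of symbol sets (element j constrains the symbol j + 1 positions before the predicted one), the class name LeafMle, and counting only from position maximalDepth onward are assumptions made for the example; they are not PCTLearn's output format or code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: MLE of the next-symbol distribution for one PCT leaf.
// leafPath.get(j) is the set of symbols allowed j + 1 positions before the
// predicted position.
class LeafMle {

    /** True if the symbols preceding position i fall into the leaf's context. */
    static boolean matches(String data, int i, List<Set<Character>> leafPath) {
        for (int j = 0; j < leafPath.size(); j++) {
            if (!leafPath.get(j).contains(data.charAt(i - j - 1))) {
                return false;
            }
        }
        return true;
    }

    static SortedMap<Character, Double> estimate(String data, List<Set<Character>> leafPath,
                                                 Set<Character> alphabet, int maximalDepth) {
        if (leafPath.size() > maximalDepth) {
            throw new IllegalArgumentException("leaf deeper than maximalDepth");
        }
        SortedMap<Character, Long> counts = new TreeMap<>();
        for (Character a : alphabet) {
            counts.put(a, 0L);
        }
        long total = 0;
        // Assumption: only positions with a full context of maximalDepth symbols are used.
        for (int i = maximalDepth; i < data.length(); i++) {
            if (matches(data, i, leafPath)) {
                counts.merge(data.charAt(i), 1L, Long::sum);
                total++;
            }
        }
        SortedMap<Character, Double> probs = new TreeMap<>();
        for (Map.Entry<Character, Long> e : counts.entrySet()) {
            // Relative frequency; NaN if no position falls into this leaf.
            probs.put(e.getKey(), total == 0 ? Double.NaN : e.getValue().doubleValue() / total);
        }
        return probs;
    }

    public static void main(String[] args) {
        String data = "aabab"; // made-up data over the alphabet {a, b}
        // Leaf of depth 1 whose context is "previous symbol is a".
        List<Set<Character>> leaf = List.of(Set.of('a'));
        // prints {a=0.3333333333333333, b=0.6666666666666666}
        System.out.println(estimate(data, leaf, Set.of('a', 'b'), 1));
    }
}
```

In the data aabab, the three positions preceded by a are followed once by a and twice by b, which gives the estimates 1/3 and 2/3.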

Revision as of 11:33, 14 December 2017
