Dimont: Difference between revisions
Revision as of 11:29, 5 January 2013
by Jan Grau, Stefan Posch, Ivo Grosse, and Jens Keilwagen
ChIPper is a universal tool for de-novo motif discovery. ChIPper has successfully been applied to ChIP-seq, ChIP-exo and protein-binding microarray (PBM) data.
We provide ChIPper as a public web-server, as a web-application that can be installed in a local Galaxy server, and as a command line program.
ChIPper web-server
ChIPper is available as a public web-server at galaxy.informatik.uni-halle.de.
Download
ChIPper is implemented in Java using Jstacs. You can download the command line application as a Jar. In addition, we provide the Jar of the Galaxy web-application for installing it in your local Galaxy server.
ChIPper will be part of the next public release of the Jstacs library.
Running the command line application
For running the command line application, Java v1.6 or later is required.
The arguments of the command line application have the following meaning:
name | comment | type |
home | Home directory (The path to the directory containing the input file. Output files are written to this directory as well; default = ./) | String |
data | Input file (The file name of the file containing the input sequences in annotated FastA format (see below)) | String |
infix | Infix (An infix to be used for all output files (model, sequence logos, predicted binding sites)) | String |
position | Position tag (The tag for the position information in the FastA-annotation of the input file) | String |
value | Value tag (The tag for the value information in the FastA-annotation of the input file) | String |
weightingFactor | Weighting factor (The value for weighting the data; either a value between 0 and 1, or a description relative to the standard deviation (e.g. +4sd), default = 0.2) | Double |
starts | Starts (The number of pre-optimization runs; valid range = [1, 100], default = 20) | Integer |
motifWidth | Initial motif width (The motif width that is used initially, may be adjusted during optimization; valid range = [1, 50], default = 15) | Integer |
motifOrder | Markov order of motif model (The Markov order of the model for the motif; valid range = [0, 3], default = 0) | Integer |
bgOrder | Markov order of background model (The Markov order of the model for the background sequence, -1 defines a uniform distribution; valid range = [-1, 5], default = -1) | Integer |
ess | Equivalent sample size (Reflects the strength of the prior on the model parameters; valid range = [0.0, Infinity], default = 4.0) | Double |
delete | Delete BSs from profile (A switch for deleting binding site positions of discovered motifs from the profile before searching for further motifs; default = true) | Boolean |
threads | Compute threads (The number of threads that are used to evaluate the objective function and its gradient; valid range = [1, 128], OPTIONAL) | Integer |
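The valid ranges listed in the table can be checked before starting a long run. The following Python sketch is only an illustration and is not part of ChIPper: the dictionary `VALID_RANGES` and the function `out_of_range` are hypothetical names, and only the integer parameters with a documented range are covered.

```python
# Documented valid ranges (inclusive) of the integer parameters in the table above.
# VALID_RANGES and out_of_range are hypothetical helper names, not part of ChIPper.
VALID_RANGES = {
    "starts": (1, 100),
    "motifWidth": (1, 50),
    "motifOrder": (0, 3),
    "bgOrder": (-1, 5),
    "threads": (1, 128),
}


def out_of_range(settings):
    """Return the names of all settings that violate their documented range."""
    problems = []
    for name, value in settings.items():
        if name in VALID_RANGES:
            low, high = VALID_RANGES[name]
            if not low <= value <= high:
                problems.append(name)
    return problems


# The documented defaults pass the check; a motif order of 4 does not.
defaults = {"starts": 20, "motifWidth": 15, "motifOrder": 0, "bgOrder": -1}
print(out_of_range(defaults))
print(out_of_range({"motifOrder": 4, "bgOrder": 5}))
```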
Input sequences must be supplied in an annotated FastA format. In the annotation of each sequence, you need to provide a value that reflects the confidence that this sequence is bound by the factor of interest. Such confidences may be peak statistics (e.g., the number of fragments under a peak) for ChIP data or signal intensities for PBM data. In addition, you need to provide an anchor position within the sequence. For ChIP data, this anchor position could, for instance, be the peak summit. An annotated FastA file for ChIP-exo data comprising sequences of length 100 centered around the peak summit could look like:
> peak: 50; signal: 515
ggccatgtgtatttttttaaatttccac...
> peak: 50; signal: 199
GGTCCCCTGGGAGGATGGGGACGTGCTG...
...
where the anchor point is given as 50 for the first two sequences, and the confidence amounts to 515 and 199, respectively. The FastA comment may contain additional annotations of the format key1 : value1; key2: value2;....
Accordingly, you would need to set the parameter "Position tag" to peak and the parameter "Value tag" to signal for the input file.
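To check that an input file carries the expected tags before running ChIPper, the annotation format described above can be parsed with a few lines of Python. This is a minimal sketch, not ChIPper's own parser: the function names are hypothetical, the short sequences in the usage example are made up, and it assumes that every header contains both tags.

```python
def parse_annotation(header):
    """Split a FastA comment such as '> peak: 50; signal: 515' into a dict."""
    text = header.lstrip(">").strip()
    annotation = {}
    for part in text.split(";"):
        if ":" not in part:
            continue  # skip empty or malformed fragments
        key, value = part.split(":", 1)
        annotation[key.strip()] = value.strip()
    return annotation


def _record(header, chunks, position_tag, value_tag):
    """Build one (position, value, sequence) triple from a header and its lines."""
    annotation = parse_annotation(header)
    return (int(annotation[position_tag]),
            float(annotation[value_tag]),
            "".join(chunks))


def read_annotated_fasta(lines, position_tag="peak", value_tag="signal"):
    """Yield (position, value, sequence) triples from annotated FastA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks, position_tag, value_tag)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks, position_tag, value_tag)


# Usage with two made-up records; the tags match "Position tag" = peak
# and "Value tag" = signal from the example above.
example = ["> peak: 50; signal: 515", "ggcc", "atgt",
           "> peak: 50; signal: 199", "GGTC"]
for position, value, sequence in read_annotated_fasta(example):
    print(position, value, sequence)
```

A missing tag raises a `KeyError` in this sketch, which makes mislabeled headers easy to spot.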
For the initial motif length and the number of pre-optimization runs, we provide default values that worked well in our studies on ChIP and PBM data. However, you may want to adjust these parameters to match your prior knowledge.
The parameter "Markov order of the motif model" sets the order of the inhomogeneous Markov model used for modeling the motif. If this parameter is set to 0, you obtain a position weight matrix (PWM) model. If it is set to 1, you obtain a weight array matrix (WAM) model. You can set the order of the motif model to at most 3.
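The choice of the motif order affects how many parameters must be estimated from the data. The Python sketch below counts the free parameters of an inhomogeneous Markov model over the DNA alphabet under the textbook parameterization, in which position l conditions on min(l, order) preceding nucleotides. The function name is hypothetical, and the internal parameterization used by ChIPper may differ.

```python
def free_parameters(width, order, alphabet_size=4):
    """Count free parameters of an inhomogeneous Markov model.

    Assumes the textbook parameterization: the first positions use shorter
    contexts because fewer predecessors are available within the motif.
    """
    total = 0
    for position in range(width):
        context_length = min(position, order)
        contexts = alphabet_size ** context_length
        # Each conditional distribution sums to 1, leaving alphabet_size - 1 free values.
        total += contexts * (alphabet_size - 1)
    return total


# Default motif width 15: PWM (order 0) versus WAM (order 1) versus order 3.
for order in (0, 1, 3):
    print(order, free_parameters(15, order))
```

Under these assumptions, a PWM of width 15 has 45 free parameters, a WAM has 171, and an order-3 model has 2367, which is one reason to raise the order only when enough sequences are available.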
The parameter "Markov order of the background model" sets the order of the homogeneous Markov model used for modeling positions not covered by a motif. If this parameter is set to -1, you obtain a uniform distribution, which worked well for ChIP data. For PBM data, orders of up to 4 resulted in an increased prediction performance in our case studies. The maximum allowed value is 5.
The parameter "Weighting factor" defines the proportion of sequences that you expect to be bound by the targeted factor with high confidence. For ChIP data, the default value of 0.2 typically works well. For PBM data, containing a large number of unspecific probes, this parameter should be set to a lower value, e.g. 0.01.
The "Equivalent sample size" reflects the strength of the influence of the prior on the model parameters, where higher values smooth out the parameters to a greater extent.
The parameter "Delete BSs from profile" defines whether binding sites (BSs) of already discovered motifs should be deleted, i.e., "blanked out", from the sequence before searching for further motifs.
Installing the web-application
The command-line program behind the web-application is a Jar as well, so Java is required on the server running Galaxy.
To install this command line program in Galaxy, copy it to the desired destination in the Galaxy tools directory.
The command line application writes its Galaxy tool definition file itself. If you are in the directory containing the command-line program for Galaxy, you can create the tool definition file by calling
java -jar ChIPperWeb.jar --create ChIPperWeb.xml
Afterwards, this directory contains the tool definition file ChIPperWeb.xml. Now you can register ChIPper in the Galaxy tool_conf.xml file. For details, see the Galaxy tutorial for adding new tools.