Projects: Difference between revisions

From Jstacs
This site contains projects that use Jstacs.
 
= MotifAdjuster =
 
by Jens Keilwagen, Jan Baumbach, Thomas Kohl and Ivo Grosse.
 
== Description ==
Valuable binding site annotation data are stored in databases. However, several types of errors can, and do, occur in the process of manually incorporating annotation data from scientific literature into these databases. Here, we introduce MotifAdjuster, a software tool that helps detect these errors, and we demonstrate its efficacy on public data sets.
 
== Paper ==
The paper '''''MotifAdjuster: A tool for computational reassessment of transcription factor binding site annotations''''' has been submitted to [http://genomebiology.com/software/ Genome Biology].
 
== Download ==
MotifAdjuster can be downloaded [http://www.jstacs.de/downloads/MotifAdjuster.zip here].
 
== Start instructions ==
 
Once you have unzipped the archive, you can start MotifAdjuster by invoking
 
<p><code>java -cp ./:./jstacs-1.2.2.jar:./numericalMethods.jar MotifAdjuster <font color="green">&lt;file&gt; &lt;ignoreChar&gt; &lt;length&gt; &lt;fgOrder&gt; &lt;fgEss&gt; &lt;bothStrands&gt; &lt;output&gt; &lt;sigma&gt; &lt;p(no motif)&gt;</font></code></p>
 
In Windows, you have to use &quot;;&quot; instead of &quot;:&quot; in the class path.
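The class-path separator mentioned above is the value Java reports as <code>java.io.File.pathSeparator</code>: <code>:</code> on Unix-like systems and <code>;</code> on Windows. The following Java sketch uses it to assemble the MotifAdjuster start command. The class name <code>MotifAdjusterCommand</code> and all argument values (for example <code>sites.fa</code> and a length of 16) are hypothetical placeholders, not recommended settings. Only the jar names and the order of the arguments are taken from the usage line.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: assembles the MotifAdjuster start command using a
 * given class-path separator. Jar names are copied from the usage line.
 */
final class MotifAdjusterCommand {

    /** Joins class-path entries with the given separator. */
    static String classPath(String separator, String... entries) {
        return String.join(separator, entries);
    }

    /** Returns the command as a list of tokens, e.g. for java.lang.ProcessBuilder. */
    static List<String> command(String separator, List<String> arguments) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classPath(separator, "./", "./jstacs-1.2.2.jar", "./numericalMethods.jar"));
        cmd.add("MotifAdjuster");
        cmd.addAll(arguments);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder values in the documented order: file, ignoreChar, length,
        // fgOrder, fgEss, bothStrands, output, sigma, p(no motif)
        List<String> arguments = List.of(
                "sites.fa", ">", "16", "0", "4", "true", "false", "1.0", "0.1");
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        // The printed line is NOT shell-quoted (see note below).
        System.out.println(String.join(" ", command(File.pathSeparator, arguments)));
    }
}
```

When typing the command in a shell, quote the comment character (e.g. <code>'&gt;'</code>). Otherwise the shell treats <code>&gt;</code> as an output redirection. Passing the token list to <code>java.lang.ProcessBuilder</code> avoids the shell altogether.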
 
The arguments have the following meanings:
 
<table border=0 cellpadding=10 align="center">
<tr>
<td>name</td>
<td>description</td>
<td>type</td>
</tr>
<tr><td colspan=3><hr></td></tr>
<tr>
<td><font color="green">file</font></td>
<td>the location of the data set</td>
<td>String</td>
</tr>
<tr>
<td><font color="green">ignoreChar</font></td>
<td>the character that marks comment lines (e.g., '&gt;' for a FASTA file)</td>
<td>char</td>
</tr>
<tr>
<td><font color="green">length</font></td>
<td>the motif length</td>
<td>int</td>
</tr>
<tr>
<td><font color="green">fgOrder</font></td>
<td>the order of the inhomogeneous Markov model that is used for the motif; 0 yields a PWM</td>
<td>byte</td>
</tr>
<tr>
<td><font color="green">fgEss</font></td>
<td>the equivalent sample size that is used for the mixture model</td>
<td>double &gt;= 0</td>
</tr>
<tr>
<td><font color="green">bothStrands</font></td>
<td>whether to use both strands</td>
<td>boolean</td>
</tr>
<tr>
<td><font color="green">output</font></td>
<td>whether to display the output of the EM algorithm</td>
<td>boolean</td>
</tr>
<tr>
<td><font color="green">sigma</font></td>
<td>the sigma of the truncated discrete Gaussian distribution</td>
<td>double &gt; 0</td>
</tr>
<tr>
<td><font color="green">p(no motif)</font></td>
<td>the probability of finding no motif</td>
<td>0 &lt;= double &lt; 1</td>
</tr>
</table>
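The types and ranges in the table can be checked before a run is started. The Java sketch below does this for the nine positional arguments. <code>MotifAdjusterArgs</code> is a hypothetical helper written for illustration, not MotifAdjuster's own argument handling. The checks that <code>length</code> is positive and <code>fgOrder</code> is non-negative are assumptions that go beyond what the table states.

```java
/**
 * Hypothetical helper: parses MotifAdjuster's nine positional arguments and
 * checks them against the types and ranges in the argument table.
 * Not MotifAdjuster's own argument handling.
 */
final class MotifAdjusterArgs {
    final String file;
    final char ignoreChar;
    final int length;
    final byte fgOrder;
    final double fgEss;
    final boolean bothStrands;
    final boolean output;
    final double sigma;
    final double pNoMotif;

    private MotifAdjusterArgs(String[] a) {
        if (a.length != 9) {
            throw new IllegalArgumentException("expected 9 arguments, got " + a.length);
        }
        if (a[1].length() != 1) {
            throw new IllegalArgumentException("ignoreChar must be a single character");
        }
        file = a[0];
        ignoreChar = a[1].charAt(0);
        length = Integer.parseInt(a[2]);
        fgOrder = Byte.parseByte(a[3]);            // 0 yields a PWM
        fgEss = Double.parseDouble(a[4]);
        // Boolean.parseBoolean returns false for anything except "true" (ignoring case)
        bothStrands = Boolean.parseBoolean(a[5]);
        output = Boolean.parseBoolean(a[6]);
        sigma = Double.parseDouble(a[7]);
        pNoMotif = Double.parseDouble(a[8]);

        require(length > 0, "length must be positive");          // assumption: table only says int
        require(fgOrder >= 0, "fgOrder must be non-negative");   // assumption: table only says byte
        require(fgEss >= 0, "fgEss must be >= 0");
        require(sigma > 0, "sigma must be > 0");
        require(pNoMotif >= 0 && pNoMotif < 1, "p(no motif) must satisfy 0 <= p < 1");
    }

    static MotifAdjusterArgs parse(String... a) {
        return new MotifAdjusterArgs(a);
    }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        // Placeholder values, not recommended settings.
        MotifAdjusterArgs p = parse("sites.fa", ">", "16", "0", "4", "true", "false", "1.0", "0.1");
        System.out.println("length=" + p.length + ", sigma=" + p.sigma + ", p(no motif)=" + p.pNoMotif);
    }
}
```

Because <code>Boolean.parseBoolean</code> maps every string other than <code>true</code> (ignoring case) to <code>false</code>, this sketch does not reject a misspelled flag.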
 
= DiPoMM =
 
by Jens Keilwagen, Jan Grau, Stefan Posch, Marc Strickert and Ivo Grosse.
 
== Description ==
Transcription factors are one of the main components of gene regulation, as they activate or repress gene expression by binding to their binding sites. The de-novo discovery of transcription factor binding sites in the promoters of target genes is a challenging problem in bioinformatics that has not yet been solved satisfactorily.
We present DiPoMM, a discriminative de-novo motif discovery tool that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process.
 
== Paper ==
The paper '''''DiPoMM: Discriminative de-novo motif discovery utilizing positional preference''''' has been submitted to [http://www.iscb.org/ismbeccb2009/ ISMB 2009].
 
== Download ==
DiPoMM can be downloaded [http://www.jstacs.de/downloads/DiPoMM.zip here].
 
== Start instructions ==
Once you have unzipped the archive, you can start DiPoMM, for example, by invoking
 
<code>java -cp .:jstacs-1.2.2.jar:lib/numericalMethods.jar:lib/bytecode.jar:lib/biojava-live.jar projects.DiPoMM home=path/to/data/directory/ fg=fgfile.txt bg=bgfile.txt init=best-random=100 p-val=1E-4</code>
 
to search for motifs that are over-represented in <code>path/to/data/directory/fgfile.txt</code> but not in <code>path/to/data/directory/bgfile.txt</code>, to initialize DiPoMM with the best of 100 randomly drawn starting values, and to search for motif occurrences with a p-value less than <code>1E-4</code>.
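The value of <code>init</code> itself contains an equals sign (<code>init=best-random=100</code>), so a parser for such <code>key=value</code> arguments needs to split each argument at the first <code>=</code> only. The Java sketch below illustrates this. <code>KeyValueArgs.parseArgs</code> is a hypothetical helper and is not taken from DiPoMM's source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: splits DiPoMM-style "key=value" arguments at the
 * FIRST '=' only, so that a value such as "best-random=100" stays intact.
 */
final class KeyValueArgs {

    static Map<String, String> parseArgs(String... args) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps the argument order
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) { // no '=' at all, or an empty key
                throw new IllegalArgumentException("expected key=value, got: " + arg);
            }
            map.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = parseArgs(
                "home=path/to/data/directory/", "fg=fgfile.txt", "bg=bgfile.txt",
                "init=best-random=100", "p-val=1E-4");
        // init maps to "best-random=100"; Double.parseDouble accepts "1E-4"
        System.out.println(parsed);
        System.out.println(Double.parseDouble(parsed.get("p-val")));
    }
}
```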
 
Under Windows, you must use &quot;;&quot; instead of &quot;:&quot; in the class path.
 
The arguments have the following meanings:
<table border=0 cellpadding=10 align="center">
<tr>
<td>name</td>
<td>description</td>
<td>type</td>
</tr>
<tr><td colspan=3><hr></td></tr>
<tr>
<td><font color="green">home</font></td>
<td>the path to the data directory, default = ./</td>
<td>String</td>
</tr>
<tr>
<td><font color="green">ignore</font></td>
<td>the character that is used to mark comment lines in data files, e.g., '&gt;' in a FASTA file, default = &gt;</td>
<td>Character</td>
</tr>
<tr>
<td><font color="green">fg</font></td>
<td>the file name of the foreground data file (the file containing sequences which are expected to contain binding sites of a common motif)</td>
<td>String</td>
</tr>
<tr>
<td><font color="green">bg</font></td>
<td>the file name of the background data file</td>
<td>String</td>
</tr>
<tr>
<td><font color="green">length</font></td>
<td>the initial motif length, valid range = [1, 50], default = 15</td>
<td>Integer</td>
</tr>
<tr>
<td><font color="green">flankOrder</font></td>
<td>the Markov order of the model for the flanking sequence and the background sequence, valid range = [0, 5], default = 0</td>
<td>Integer</td>
</tr>
<tr>
<td><font color="green">motifOrder</font></td>
<td>the Markov order of the motif model, valid range = [0, 3], default = 0</td>
<td>Integer</td>
</tr>
<tr>
<td><font color="green">bothStrands</font></td>
<td>a switch whether to use both strands or not, default = true</td>
<td>Boolean</td>
</tr>
<tr>
<td><font color="green">init</font></td>
<td>the method that is used for initialization, one of 'best-random=&lt;number&gt;', 'enum=&lt;length&gt;', and 'specific=&lt;sequence or file of sequence&gt;'</td>
<td>String=[Integer | String]</td>
</tr>
<tr>
<td><font color="green">xml</font></td>
<td>the name of the XML file to which the classifier is written, default = ./classifier.xml</td>
<td>String</td>
</tr>
<tr>
<td><font color="green">adjust</font></td>
<td>a switch whether to adjust the motif length, i.e., either to shrink or expand, default = true</td>
<td>Boolean</td>
</tr>
<tr>
<td><font color="green">p-val</font></td>
<td>a p-value for predicting binding sites, valid range = [0.0, 1.0], OPTIONAL</td>
<td>Double</td>
</tr>
</table>
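The defaults and valid ranges in the table can be combined into a small settings check. The Java sketch below is hypothetical: <code>DiPoMMSettings</code> is not part of DiPoMM or Jstacs, and only the default values and ranges come from the table. It assumes that <code>fg</code>, <code>bg</code> and <code>init</code> are required, since the table gives no default for them. It also checks only that <code>init</code> has one of the three documented forms.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: fills in the defaults from DiPoMM's argument table
 * and checks the documented ranges. Not DiPoMM's own code.
 */
final class DiPoMMSettings {

    /** Defaults as listed in the table; p-val is optional and has none. */
    static Map<String, String> defaults() {
        Map<String, String> d = new HashMap<>();
        d.put("home", "./");
        d.put("ignore", ">");
        d.put("length", "15");
        d.put("flankOrder", "0");
        d.put("motifOrder", "0");
        d.put("bothStrands", "true");
        d.put("xml", "./classifier.xml");
        d.put("adjust", "true");
        return d;
    }

    /** Merges user-supplied values over the defaults and validates the result. */
    static Map<String, String> withDefaults(Map<String, String> given) {
        Map<String, String> all = defaults();
        all.putAll(given); // user-supplied values override defaults

        // Assumption: fg, bg and init are required (the table lists no default).
        for (String key : new String[] {"fg", "bg", "init"}) {
            if (!all.containsKey(key)) {
                throw new IllegalArgumentException("missing argument: " + key);
            }
        }
        checkInt(all, "length", 1, 50);
        checkInt(all, "flankOrder", 0, 5);
        checkInt(all, "motifOrder", 0, 3);

        if (all.containsKey("p-val")) {
            double p = Double.parseDouble(all.get("p-val"));
            if (!(p >= 0.0 && p <= 1.0)) { // also rejects NaN
                throw new IllegalArgumentException("p-val must be in [0.0, 1.0]");
            }
        }

        // init: best-random=<number>, enum=<length> or specific=<sequence or file>
        String init = all.get("init");
        int eq = init.indexOf('=');
        String method = eq < 0 ? init : init.substring(0, eq);
        if (method.equals("best-random") || method.equals("enum")) {
            Integer.parseInt(init.substring(eq + 1)); // NumberFormatException if not an integer
        } else if (!method.equals("specific") || eq < 0) {
            throw new IllegalArgumentException("unknown init method: " + init);
        }
        return all;
    }

    private static void checkInt(Map<String, String> m, String key, int lo, int hi) {
        int v = Integer.parseInt(m.get(key));
        if (v < lo || v > hi) {
            throw new IllegalArgumentException(key + " must be in [" + lo + ", " + hi + "]");
        }
    }

    public static void main(String[] args) {
        Map<String, String> given = new HashMap<>();
        given.put("fg", "fgfile.txt");
        given.put("bg", "bgfile.txt");
        given.put("init", "best-random=100");
        given.put("p-val", "1E-4");
        Map<String, String> settings = withDefaults(given);
        System.out.println("length=" + settings.get("length") + ", xml=" + settings.get("xml"));
    }
}
```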

Latest revision as of 11:48, 3 May 2019

This site contains projects that use Jstacs.

  • MotifAdjuster: a tool for computational reassessment of transcription factor binding site annotations
  • Prior: apples and oranges: avoiding different priors in Bayesian DNA sequence analysis
  • GenDisMix: unifying generative and discriminative learning principles
  • Dispom: de-novo discovery of differentially abundant transcription factor binding sites including their positional preference
  • MiMB: probabilistic approaches to transcription factor binding site prediction
  • SHMM: utilizing gene-pair orientations for improved analysis of ChIP-chip promoter array data
  • DSHMM: exploiting prior knowledge and gene distances in the analysis of tumor expression profiles
  • PHHMM: improved analysis of Array-CGH data
  • MeDIP-HMM: HMM-based analysis of DNA methylation profiles
  • ARHMM: integrating local chromosomal dependencies into the analysis of tumor expression profiles
  • FlowCap: molecular classification of acute myeloid leukaemia (AML) using flow cytometry data
  • TALgetter: prediction of TAL effector target sites
  • TALENoffer: genome-wide TALEN off-target prediction
  • Dimont: general approach for discriminative de-novo motif discovery from high-throughput data
  • AUC-PR: area under ROC and PR curves for weighted and unweighted data
  • Slim: Sparse local inhomogeneous mixture (Slim) models and dependency logos
  • PMMdeNovo: de novo motif discovery based on inhomogeneous parsimonious Markov models (PMMs) for exploiting intra-motif dependencies
  • AnnoTALE: identifying and analysing TALEs in Xanthomonas genomes, clustering TALEs, assigning novel TALEs to existing classes, proposing TALE names using a unified nomenclature, and predicting TALE targets
  • GeMoMa: Gene Model Mapper (GeMoMa) is a homology-based gene prediction program that uses the annotation of protein-coding genes in a reference genome to infer annotation of protein-coding genes in a target genome
  • InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites
  • Disentangler: two tools, usable individually or within a joint pipeline, for analyzing complex features in a set of aligned transcription factor binding sites (TFBSs)
  • PCTLearn: efficient learning of parsimonious context trees from sequence data
  • Catchitt: collection of tools for predicting cell type-specific binding regions of transcription factors
  • PrediTALE: prediction of TALE target boxes using a novel model learned from quantitative data, based on the RVD sequence of a TALE