BiGGEsTS - Biclustering Gene Expression Time Series

Joana P. Gonçalves, Sara C. Madeira and Arlindo L. Oliveira

BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data

BMC Research Notes 2009, 2:124 [abstract] [pdf] [html] [bibTeX] (doi:10.1186/1756-0500-2-124)

Overview

BiGGEsTS is a free and open source software tool providing an integrated environment for the biclustering analysis (Madeira and Oliveira, 2004) of time series gene expression data. It offers a complete set of operations for retrieving potentially relevant information from gene expression data, relying either on visualization or on additional techniques for manipulating and processing this particular kind of data. Visualization includes colored matrices, expression evolution charts and pattern charts, as well as dendrograms obtained by applying hierarchical clustering algorithms to the gene expression data, and Gene Ontology graphs highlighting the relevant biological terms annotated to the dataset genes for specific organisms.

BiGGEsTS integrates well-known techniques for preprocessing data: filtering genes, filling missing values, smoothing, normalization and discretization. It makes available to the scientific community state-of-the-art biclustering algorithms (Madeira et al., 2010) (Madeira and Oliveira, 2009) (Zhang et al., 2005) that were specifically developed for time series expression data and are suited to extracting subsets of genes that exhibit coherent expression evolutions in specific subsets of experimental conditions, that is, biclusters. Biclusters can be analyzed using Gene Ontology annotations to determine which of them contain statistically relevant biological information, and can also be filtered or sorted according to several numerical and statistical criteria.
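To make the notion of a bicluster over contiguous time points concrete, the following brute-force Java sketch scans every contiguous range of time points in an already discretized matrix and groups the genes that share exactly the same symbol string over that range. It only illustrates the definition: rows are assumed to be equal-length strings over hypothetical symbols D, N and U, and the names ContiguousBiclusters, Bicluster and find are hypothetical. It is not the CCC-Biclustering algorithm, which is far more efficient and reports only maximal biclusters.

```java
import java.util.*;

public class ContiguousBiclusters {

    /** Genes (row indices) sharing one symbol pattern over time points [first, last]. */
    public record Bicluster(List<Integer> genes, int first, int last, String pattern) {}

    /**
     * Brute-force enumeration: for every contiguous column range, group the rows
     * whose substrings over that range are identical and keep groups with at least
     * minGenes rows. Assumes all rows have the same length. Non-maximal results
     * (sub-ranges of a larger shared pattern) are also reported.
     */
    public static List<Bicluster> find(String[] rows, int minGenes) {
        List<Bicluster> result = new ArrayList<>();
        if (rows.length == 0) return result;
        int cols = rows[0].length();
        for (int first = 0; first < cols; first++) {
            for (int last = first; last < cols; last++) {
                // Map each pattern over [first, last] to the genes exhibiting it.
                Map<String, List<Integer>> groups = new TreeMap<>();
                for (int g = 0; g < rows.length; g++) {
                    String pattern = rows[g].substring(first, last + 1);
                    groups.computeIfAbsent(pattern, k -> new ArrayList<>()).add(g);
                }
                for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
                    if (e.getValue().size() >= minGenes) {
                        result.add(new Bicluster(e.getValue(), first, last, e.getKey()));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical discretized data: one row per gene, one symbol per time point.
        String[] rows = {"UDNU", "UDND", "NUDU", "UDNU"};
        for (Bicluster b : find(rows, 2)) {
            System.out.println("genes " + b.genes() + ", time points " + b.first()
                    + "-" + b.last() + ", pattern " + b.pattern());
        }
    }
}
```

In this example, genes 0, 1 and 3 share the pattern UDN over time points 0-2. Because the sketch examines every one of the roughly m²/2 column ranges for each gene, it quickly becomes impractical on real datasets.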

Features

Below is a list of the main features in BiGGEsTS:

Input of time series gene expression data
Preprocessing:
     - filter genes with missing values
     - fill missing values
     - normalize
     - smooth
     - discretize
Biclustering:
     - CCC-Biclustering: biclusters with exact expression patterns in discrete data
     - e-CCC-Biclustering: biclusters with approximate expression patterns in discrete data
     - CC-TSB-Biclustering: biclusters with approximate expression patterns in real data
Analysis:
     - matrices of values, colors and symbols, and dendrograms
     - charts of expression profiles and patterns
     - annotated GO terms and term-for-term analysis
Post-processing:
     - filter biclusters
     - sort biclusters
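Three of the preprocessing steps listed above can be illustrated with a small Java sketch. It assumes one simple strategy per step: missing values (encoded as NaN) are filled with the gene's mean, each gene profile is z-score normalized, and the normalized profile is discretized by encoding the change between consecutive time points as D (decrease), N (no change) or U (increase) using a threshold t. Gene filtering and smoothing are omitted. These strategies and the names PreprocessSketch, fillWithMean, zScore and discretizeVariations are hypothetical; the options BiGGEsTS actually offers are described in the Quickstart document.

```java
import java.util.Arrays;

public class PreprocessSketch {

    /** Fills NaN entries with the mean of the gene's observed values. */
    public static double[] fillWithMean(double[] profile) {
        double sum = 0;
        int n = 0;
        for (double v : profile) {
            if (!Double.isNaN(v)) { sum += v; n++; }
        }
        double mean = n == 0 ? 0.0 : sum / n;
        double[] out = profile.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) out[i] = mean;
        }
        return out;
    }

    /** Z-score normalization: zero mean and unit (population) standard deviation. */
    public static double[] zScore(double[] profile) {
        double mean = Arrays.stream(profile).average().orElse(0.0);
        double var = Arrays.stream(profile).map(v -> (v - mean) * (v - mean)).average().orElse(0.0);
        double sd = Math.sqrt(var);
        double[] out = new double[profile.length];
        for (int i = 0; i < profile.length; i++) {
            // A constant profile has sd == 0; map it to all zeros.
            out[i] = sd == 0 ? 0.0 : (profile[i] - mean) / sd;
        }
        return out;
    }

    /** Encodes each change between consecutive time points as D, N or U. */
    public static String discretizeVariations(double[] profile, double t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < profile.length; i++) {
            double delta = profile[i] - profile[i - 1];
            sb.append(delta < -t ? 'D' : delta > t ? 'U' : 'N');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] raw = {1.0, Double.NaN, 3.0, 2.0}; // hypothetical gene profile
        double[] filled = fillWithMean(raw);        // NaN replaced by mean(1, 3, 2) = 2.0
        double[] norm = zScore(filled);
        System.out.println(Arrays.toString(norm));
        System.out.println(discretizeVariations(norm, 0.5));
    }
}
```

With the example profile and t = 0.5, the missing value becomes 2.0 and the normalized profile is discretized to UUD. A profile with n time points yields n - 1 symbols, because each symbol describes a transition rather than a single time point.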

Citing BiGGEsTS

Joana P. Gonçalves, Sara C. Madeira and Arlindo L. Oliveira, BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data, BMC Research Notes 2009, 2:124. [abstract] [full text pdf] [full text html] [bibTeX]

Joana P. Gonçalves, Sara C. Madeira and Arlindo L. Oliveira, BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data, INESC-ID Technical Report 23/2009, March 2009. [pdf] [bibTeX]

News and Updates

2011/11/18 - A new version of BiGGEsTS was released (v1.0.5). [change log]

2011/04/05 - Updated v1.0.4 with fixes: corrected the behavior of right-clicking a tree node on Mac OS; changed install.sh to look for Graphviz in its default location and, if that fails, to let the user browse the file system for it.

2011/03/04 - A new version of BiGGEsTS was released (v1.0.4). [change log]

2009/08/27 - A new version of BiGGEsTS was released (v1.0.3). [change log]

2009/07/07 - The article describing the BiGGEsTS software is now available:

                    Joana P. Gonçalves, Sara C. Madeira and Arlindo L. Oliveira.
                    BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data.
                    BMC Research Notes 2009, 2:124 [abstract] [full text pdf] [full text html] [bibTeX]

2009/05/12 - The article entitled "BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data", describing the BiGGEsTS software, was accepted for publication in BMC Research Notes.

2009/05/06 - A new version of BiGGEsTS was released (v1.0.2). [change log]

2009/03/24 - A new version of BiGGEsTS was released (v1.0.1). [change log]

2008/07/10 - BiGGEsTS poster @ FPBC 2008 (First Portuguese Forum in Computational Biology), Oeiras, Portugal, July 10-12, 2008 [abstract] [pdf] [png]

2008/05/25 - BiGGEsTS is finally online (v1.0.0).

GNU GPL v3

BiGGEsTS is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version. BiGGEsTS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A copy of the GNU General Public License is included with BiGGEsTS. For detailed information about the license, see GNU licenses.


Download BiGGEsTS binaries.
(See the installation instructions below and the BiGGEsTS Quickstart (v1.0.5) document for details on installation and usage.)

Windows installer
    BiGGEsTS version 1.0.5 [exe] [change log]
    BiGGEsTS version 1.0.4 [exe] [change log]
    BiGGEsTS version 1.0.3 [exe] [change log]
    BiGGEsTS version 1.0.2 [exe] [change log]
    BiGGEsTS version 1.0.1 [exe] [change log]
    BiGGEsTS version 1.0.0 [exe]

Multi-platform distribution
    BiGGEsTS version 1.0.5 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.4 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.3 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.2 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.1 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.0 [zip] [tar.gz]

Download BiGGEsTS source code.

Source code
    BiGGEsTS version 1.0.5 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.4 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.3 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.2 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.1 [zip] [tar.gz] [change log]
    BiGGEsTS version 1.0.0 [zip] [tar.gz]

Third-party libraries
    Libraries for v1.0.5 [zip] [tar.gz] [change log]
    Libraries for v1.0.4 [zip] [tar.gz] [change log]
    Libraries for v1.0.3 [zip] [tar.gz] [change log]
    Libraries for v1.0.2 [zip] [tar.gz] [change log]
    Libraries for v1.0.1 [zip] [tar.gz] [change log]
    Libraries for v1.0.0 [zip] [tar.gz]

Download BiGGEsTS sample files.
BiGGEsTS sample files [zip] [tar.gz]

Installation instructions (v1.0.4 or higher)

On Windows, using the installer:

Run the installer (double-click) and follow the instructions.
Icons and shortcuts are added to both the Desktop and Start menu. You can use them to run BiGGEsTS.

On Windows, using the multi-platform distribution:

After downloading the zip or tar.gz file from the BiGGEsTS website, decompress it to a suitable location.
Run the installer inside the resulting directory (double-click the install.bat file) and wait for the installation to complete.
You can now run BiGGEsTS at any time by executing the biggests script (double-click the biggests.bat file).

On Linux or Mac OS, using the multi-platform distribution:

Install Graphviz on your system. Source code, RPM packages and documentation are available at http://www.graphviz.org/.
Download the zip or tar.gz file from the BiGGEsTS website and decompress its contents to a suitable location.
Edit the BiGGEsTS installer file (install.sh) and append the path to the Graphviz dot binary (usually /usr/bin/dot), preceded by a space, to its last line (e.g. "java -classpath biggests.jar biggests/utils/BiggestsInstall /usr/bin/dot").
Execute install.sh.
You can now run BiGGEsTS at any time by executing biggests.sh.
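Locating the dot binary and building the last line of install.sh can be sketched in Java as follows. The list of candidate locations is an assumption (common Graphviz install paths, not the list install.sh itself uses), and DotLocator, findDot and installCommand are hypothetical names. The command string follows the example given in the installation step.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class DotLocator {

    // Assumed common Graphviz install locations (hypothetical list, not from install.sh).
    static final List<String> CANDIDATES = List.of(
            "/usr/bin/dot", "/usr/local/bin/dot", "/opt/local/bin/dot", "/opt/homebrew/bin/dot");

    /** Returns the first candidate that exists as a regular, executable file, if any. */
    public static Optional<Path> findDot(List<String> candidates) {
        return candidates.stream()
                .map(s -> Path.of(s))
                .filter(p -> Files.isRegularFile(p) && Files.isExecutable(p))
                .findFirst();
    }

    /** Builds the last line of install.sh with the dot path appended after a space. */
    public static String installCommand(Path dot) {
        return "java -classpath biggests.jar biggests/utils/BiggestsInstall " + dot;
    }

    public static void main(String[] args) {
        Optional<Path> dot = findDot(CANDIDATES);
        if (dot.isPresent()) {
            System.out.println(installCommand(dot.get()));
        } else {
            System.out.println("dot not found: install Graphviz or locate the binary manually.");
        }
    }
}
```

From a shell, `command -v dot` (or `which dot`) is a quicker way to find the same path when dot is on the PATH.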

Important notes:
1. BiGGEsTS attempts to automatically determine the characteristics of your system and parameterize the JVM accordingly. However, since BiGGEsTS runs on a wide range of platforms, it is not always possible to find the best configuration of JVM parameters. The most common problem concerns the Java heap size. If you get a command-line error message similar to the following:

java -Xss5M -Xms1073204k -Xmx1073204k -jar biggests.jar
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.

The most direct solution is to edit the biggests.bat (Windows) or biggests.sh (UNIX) file and replace the Xms and Xmx values with values lower than the automatically determined ones. You should also ensure that these values do not exceed the amount of physical memory available on your system.

2. If you use BiGGEsTS on Windows 7, be aware that most operations involving the manipulation of Gene Ontology files (access to GO annotations, term-for-term analysis, graph of significant terms) will not work correctly if the software is installed under "C:\Program Files". Please install BiGGEsTS in a different location.
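The heap-size advice in note 1 (keep the Xms and Xmx values below the physical memory available) can be expressed as a small Java helper. It assumes you choose a fraction of physical memory yourself (0.5 in the example) and builds a launch line in the same form as the one shown in the error message; HeapSettings, suggestHeapKb and launchCommand are hypothetical names. The main method also prints Runtime.getRuntime().maxMemory(), the maximum amount of memory the running JVM will attempt to use, which is a quick way to check what a given -Xmx actually yields.

```java
public class HeapSettings {

    /**
     * Suggests a heap size in kilobytes as a fraction of physical memory,
     * so the result never exceeds the physical memory given.
     */
    public static long suggestHeapKb(long physicalKb, double fraction) {
        if (fraction <= 0 || fraction > 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return (long) (physicalKb * fraction);
    }

    /** Builds a launch line with the same options as the one in the error message. */
    public static String launchCommand(long heapKb) {
        return "java -Xss5M -Xms" + heapKb + "k -Xmx" + heapKb + "k -jar biggests.jar";
    }

    public static void main(String[] args) {
        // Maximum heap this JVM will attempt to use, in megabytes.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap for this JVM: " + maxHeapMb + " MB");
        // Hypothetical machine with 2 GB (2097152 KB) of physical memory, using half of it.
        System.out.println(launchCommand(suggestHeapKb(2_097_152L, 0.5)));
    }
}
```

For the hypothetical 2 GB machine, this prints "java -Xss5M -Xms1048576k -Xmx1048576k -jar biggests.jar" as its second line. On a 32-bit JVM the usable heap is typically limited to well under 2 GB regardless of physical memory, so a lower value may be needed there.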

Known issues

Please note that the biclustering problem is computationally hard and, thus, the available algorithms may take some time and require a considerable amount of memory, depending on the system and the dimensions of the data. If system resources are insufficient, the computation may fail to complete.

BiGGEsTS currently uses version 1 of Ontologizer, which does not support parsing recent OBO v1.0 files. We are currently working on the integration of Ontologizer v2. [solved in version 1.0.3]

Documentation

Tutorial
    BiGGEsTS Quickstart version 1.0.5 [pdf (view)] [pdf (print)]
    BiGGEsTS Quickstart version 1.0.4 [pdf (view)] [pdf (print)]
    BiGGEsTS Quickstart version 1.0.3 [pdf (view)] [pdf (print)]
    BiGGEsTS Quickstart version 1.0.2 [pdf (view)] [pdf (print)]
    BiGGEsTS Quickstart version 1.0.1 [pdf (view)] [pdf (print)]
    BiGGEsTS Quickstart version 1.0.0 [pdf (view)] [pdf (print)]

Javadoc
    BiGGEsTS Javadoc version 1.0.5 [html] [zip] [tar.gz]
    BiGGEsTS Javadoc version 1.0.4 [html] [zip] [tar.gz]
    BiGGEsTS Javadoc version 1.0.3 [html] [zip] [tar.gz]
    BiGGEsTS Javadoc version 1.0.2 [html] [zip] [tar.gz]
    BiGGEsTS Javadoc version 1.0.1 [html] [zip] [tar.gz]
    BiGGEsTS Javadoc version 1.0.0 [html] [zip] [tar.gz]

Article
Joana P. Gonçalves, Sara C. Madeira and Arlindo L. Oliveira, BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data, BMC Research Notes 2009, 2:124. [abstract] [full text pdf] [full text html] [bibTeX]

Technical Report
Joana P. Gonçalves, Sara C. Madeira and Arlindo L. Oliveira, BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data, INESC-ID Technical Report 23/2009, March 2009. [pdf] [bibTeX]

People

Joana P. Gonçalves received the Diploma in information systems and computer engineering from the University of Beira Interior (UBI) in 2007. She is currently a PhD student at Instituto Superior Técnico (IST), Technical University of Lisbon, and a researcher at the Knowledge Discovery and Bioinformatics (KDBIO) group of INESC-ID. Her research interests include the development of computational methods for identifying regulatory modules in gene regulatory networks using heterogeneous data, computational biology and knowledge discovery.

Sara C. Madeira received the Diploma in computer science from the University of Beira Interior (UBI) in 2000, and the MSc degree in information systems and computer engineering from Instituto Superior Técnico (IST), Technical University of Lisbon, in 2002. In December 2008 she received the PhD degree in information systems and computer engineering, also from IST. Her PhD thesis in the area of computational biology, entitled "Efficient Biclustering Algorithms for Time Series Gene Expression Data Analysis", proposes efficient biclustering algorithms, based on string processing techniques and combinatorial optimization, for the analysis of gene expression time series. She is currently a senior researcher at the Knowledge Discovery and Bioinformatics (KDBIO) group of INESC-ID and an assistant professor in the Department of Computer Science and Engineering of IST. Her research interests include bioinformatics/computational biology, algorithm design, machine learning and data mining.

Arlindo L. Oliveira received the BSc and MSc degrees in electrical and computer engineering from the Technical University of Lisbon (UTL) in 1986 and 1989, respectively, and the PhD degree in electrical engineering and computer science from the University of California, Berkeley (UC Berkeley) in 1994. He is currently a professor at Instituto Superior Técnico (IST), Technical University of Lisbon. He is also a senior researcher at the Knowledge Discovery and Bioinformatics (KDBIO) group of INESC-ID. His research interests include bioinformatics, systems biology, string processing, algorithm design, combinatorial optimization, machine learning, logic synthesis and automata theory.

BiGGEsTS email:

jpg [at] kdbio [dot] inesc-id [dot] pt

(replace [at] with @ and [dot] with . when writing to the address above)