This webpage makes available a prototype implementation of the CCC-Biclustering algorithm coded in Java together with the datasets and examples used in the paper:

Sara C. Madeira, Miguel C. Teixeira, Isabel Sá Correia and Arlindo L. Oliveira, "Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm", IEEE/ACM Transactions on Computational Biology and Bioinformatics (to appear). [DOI Article Link]

## Synthetic

- Randomly generated 1500x50 matrix [.txt]
- Randomly generated 1500x50 matrix with 10 planted CCC-Biclusters [.txt] [.txt]

## Real

## Synthetic

- Randomly generated 1500x50 matrix
  - Sorted by statistical significance p-value [.txt]
- Randomly generated 1500x50 matrix with 10 planted CCC-Biclusters
  - Sorted by statistical significance p-value [.txt]
  - Sorted by statistical significance p-value, filtered to exclude p-values not passing the statistical test at the 1% level (after Bonferroni correction) [.txt]
  - Sorted by statistical significance p-value, filtered to exclude p-values not passing the statistical test at the 1% level (after Bonferroni correction) and similarities above 25% [.txt]

## Real

- Cell Cycle
- Heat Stress
  - Sorted by statistical significance p-value [.txt]
  - Sorted by statistical significance p-value, filtered to exclude p-values not passing the statistical test at the 1% level (after Bonferroni correction) [.txt]
  - Sorted by statistical significance p-value, filtered to exclude p-values not passing the statistical test at the 1% level (after Bonferroni correction) and similarities above 25% [.txt] [details]
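The sorting and Bonferroni filtering named in the result lists above can be illustrated with a minimal Java sketch. It is not the code shipped in the .jar files: the class `BonferroniFilter` and the method `significantSorted` are hypothetical names, and the sketch assumes that the number of tests used in the correction equals the number of reported CCC-Biclusters, which the page does not state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of the distributed .jar files. */
final class BonferroniFilter {

    /**
     * Returns the indices of the p-values that pass the test at level alpha
     * after Bonferroni correction, ordered from most to least significant.
     * Assumption: the number of tests is pValues.length.
     */
    static List<Integer> significantSorted(double[] pValues, double alpha) {
        // Bonferroni: compare each p-value against alpha divided by the number of tests.
        double threshold = alpha / pValues.length;
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < pValues.length; i++) {
            if (pValues[i] < threshold) {
                kept.add(i);
            }
        }
        // Smallest p-value (most significant) first.
        kept.sort(Comparator.comparingDouble(i -> pValues[i]));
        return kept;
    }

    public static void main(String[] args) {
        double[] pValues = {0.5, 1e-6, 0.004, 1e-9};
        // 1% level, as in the filtered result files listed above.
        System.out.println(significantSorted(pValues, 0.01));
    }
}
```

With four tests at the 1% level the corrected threshold is 0.0025, so a p-value of 0.004 would pass an uncorrected test but is removed here.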

The software available here allows the reproduction of the results in the paper and also the execution of the CCC-Biclustering algorithm using a gene expression matrix provided by the user. The gene expression matrix must be a .txt file formatted as in the examples provided below.

The algorithm is coded in **Java**. Before running the examples below please make sure the version of

In order to run the algorithm copy the **.jar** file together with the

If you have any questions please contact Sara C. Madeira.

## Reproduce Results in the Paper

- Synthetic Data [.jar] [matrixNoPlantedBiclusters] [matrixPlantedBiclusters]

java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_CCC_Biclustering_Synthetic.jar

- Cell Cycle Data [.jar] [cell_cycle.txt]

java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_Cell_Cycle.jar

- Heat Stress Data [.jar] [heat_stress.txt]

java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_Heat_Stress.jar

## Run CCC-Biclustering with Other Datasets

[.jar]

java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_CCC_Biclustering.jar yourExpressionMatrix.txt overlapping

yourExpressionMatrix.txt - name of the .txt file containing your expression matrix

overlapping - float in [0,1] giving the maximum fraction of overlap allowed (all CCC-Biclusters overlapping more than this value are filtered out)
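To make the role of the `overlapping` parameter concrete, here is a minimal Java sketch of one way such a filter could work. It is an illustration under stated assumptions, not the implementation inside `Test_TCBB_CCC_Biclustering.jar`: the classes `OverlapFilter` and `Bicluster` are hypothetical, overlap is assumed to be the Jaccard index over shared matrix cells, and the input list is assumed to be already sorted by p-value so that the more significant bicluster of an overlapping pair is the one kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of the distributed .jar files. */
final class OverlapFilter {

    /** Hypothetical bicluster: a set of rows and a contiguous column range [firstCol, lastCol]. */
    static final class Bicluster {
        final Set<Integer> rows;
        final int firstCol;
        final int lastCol;

        Bicluster(Set<Integer> rows, int firstCol, int lastCol) {
            this.rows = rows;
            this.firstCol = firstCol;
            this.lastCol = lastCol;
        }

        /** Number of matrix cells covered by this bicluster. */
        long size() {
            return (long) rows.size() * (lastCol - firstCol + 1);
        }
    }

    /** Assumed overlap measure: Jaccard index over the matrix cells of both biclusters. */
    static double jaccard(Bicluster a, Bicluster b) {
        Set<Integer> sharedRows = new HashSet<>(a.rows);
        sharedRows.retainAll(b.rows);
        int sharedCols = Math.min(a.lastCol, b.lastCol) - Math.max(a.firstCol, b.firstCol) + 1;
        if (sharedRows.isEmpty() || sharedCols <= 0) {
            return 0.0;
        }
        long intersection = (long) sharedRows.size() * sharedCols;
        long union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    /** Keeps a bicluster only if it overlaps every previously kept one by at most maxOverlap. */
    static List<Bicluster> filter(List<Bicluster> sortedByPValue, double maxOverlap) {
        if (maxOverlap < 0.0 || maxOverlap > 1.0) {
            throw new IllegalArgumentException("overlapping must be in [0,1]");
        }
        List<Bicluster> kept = new ArrayList<>();
        for (Bicluster candidate : sortedByPValue) {
            boolean tooSimilar = false;
            for (Bicluster k : kept) {
                if (jaccard(candidate, k) > maxOverlap) {
                    tooSimilar = true;
                    break;
                }
            }
            if (!tooSimilar) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Under these assumptions, passing `0.25` as `overlapping` would correspond to the "similarities above 25%" filter used for the result files listed earlier on this page.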

The CCC-Biclustering algorithm (together with extended versions allowing missing values and the discovery of anticorrelated and scaled expression patterns) is integrated in the software BiGGEsTS (Biclustering Gene Expression Time Series), a free and open source software tool providing an integrated environment for the biclustering analysis of time series gene expression data. This software makes the algorithm easy to use in a graphical environment and also lets the user preprocess the data and postprocess and analyse the results using several criteria.

Last Update: July 2009