This webpage provides a prototype Java implementation of the CCC-Biclustering algorithm, together with the datasets and examples used in the paper:
Sara C. Madeira, Miguel C. Teixeira, Isabel Sá Correia and Arlindo L. Oliveira, "Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm", IEEE/ACM Transactions on Computational Biology and Bioinformatics (to appear). [DOI Article Link]
Datasets
Synthetic
- Randomly generated 1500x50 matrix [.txt]
- Randomly generated 1500x50 matrix with 10 planted CCC-Biclusters [.txt] [.txt]
Real
Results
Synthetic
- Randomly generated 1500x50 matrix
- Sorted by statistical significance p-value [.txt]
- Randomly generated 1500x50 matrix with 10 planted CCC-Biclusters
- Sorted by statistical significance p-value [.txt]
- Sorted by statistical significance p-value, excluding CCC-Biclusters that fail the statistical test at the 1% level (after Bonferroni correction) [.txt]
- Sorted by statistical significance p-value, excluding CCC-Biclusters that fail the statistical test at the 1% level (after Bonferroni correction) and filtering out similarities above 25% [.txt]
Real
- Cell Cycle
- Heat Stress
- Sorted by statistical significance p-value [.txt]
- Sorted by statistical significance p-value, excluding CCC-Biclusters that fail the statistical test at the 1% level (after Bonferroni correction) [.txt]
- Sorted by statistical significance p-value, excluding CCC-Biclusters that fail the statistical test at the 1% level (after Bonferroni correction) and filtering out similarities above 25% [.txt]
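The filtered result files above apply a Bonferroni-corrected significance test at the 1% level. The Java sketch below illustrates that post-processing step under simplifying assumptions. Each CCC-Bicluster is assumed to already carry a p-value (this page does not describe how it is computed), and the correction is assumed to divide the 1% level by m, the number of CCC-Biclusters tested; the paper's choice of m may differ. `BonferroniFilter`, `ScoredBicluster` and `sortAndFilter` are hypothetical names, not part of the distributed software.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class BonferroniFilter {
    // Hypothetical record: a bicluster identifier plus its p-value.
    static class ScoredBicluster {
        final String id;
        final double pValue;

        ScoredBicluster(String id, double pValue) {
            this.id = id;
            this.pValue = pValue;
        }
    }

    // Sorts by increasing p-value and keeps only biclusters with p < alpha / m,
    // where m is ASSUMED to be the number of biclusters tested.
    static List<ScoredBicluster> sortAndFilter(List<ScoredBicluster> all, double alpha) {
        List<ScoredBicluster> sorted = new ArrayList<ScoredBicluster>(all);
        Collections.sort(sorted, new Comparator<ScoredBicluster>() {
            public int compare(ScoredBicluster a, ScoredBicluster b) {
                return Double.compare(a.pValue, b.pValue);
            }
        });
        double threshold = alpha / all.size(); // Bonferroni-corrected level
        List<ScoredBicluster> kept = new ArrayList<ScoredBicluster>();
        for (ScoredBicluster b : sorted) {
            if (b.pValue < threshold) {
                kept.add(b);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredBicluster> all = new ArrayList<ScoredBicluster>();
        all.add(new ScoredBicluster("B1", 0.004));
        all.add(new ScoredBicluster("B2", 1e-6));
        all.add(new ScoredBicluster("B3", 0.2));
        all.add(new ScoredBicluster("B4", 5e-4));
        for (ScoredBicluster b : sortAndFilter(all, 0.01)) {
            System.out.println(b.id + " " + b.pValue);
        }
    }
}
```

With m = 4 in `main`, the corrected threshold is 0.01 / 4 = 0.0025, so B1 (p = 0.004) is discarded even though it would pass an uncorrected 1% test; B2 and B4 are kept, in that order.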
The software available here reproduces the results reported in the paper and can also run the CCC-Biclustering algorithm on a gene expression matrix provided by the user. The gene expression matrix must be a .txt file formatted like the example matrices provided on this page.
The algorithm is coded in Java. Before running the examples below, please make sure that JDK 1.5 or later is installed on your computer. The algorithm should run on any operating system. One gigabyte of memory is recommended if you want to run the algorithm on large gene expression matrices.
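To check the Java version and memory requirements before a long run, you could compile and run a small standalone class such as the one below. It is a hypothetical helper (`JvmCheck` is not part of the distributed .jar files) that prints the running Java version and the maximum heap the JVM will use, and warns if either looks too low.

```java
public class JvmCheck {
    // Minimum feature version required on this page (JDK 1.5).
    static final int MIN_FEATURE_VERSION = 5;
    // Heap recommended for large matrices: about one gigabyte, in MB.
    static final long RECOMMENDED_HEAP_MB = 1024;

    // Extracts the feature version from "java.version": releases before
    // Java 9 look like "1.5.0_22" (feature 5), later ones like "17.0.1" or "9-ea".
    static int featureVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // Maximum heap the JVM will attempt to use (controlled by -Xmx).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("java.version = " + version);
        System.out.println("max heap     = " + maxHeapMb + " MB");
        if (featureVersion(version) < MIN_FEATURE_VERSION) {
            System.out.println("WARNING: JDK 1.5 or later is required.");
        }
        // maxMemory() can report somewhat less than -Xmx with some garbage
        // collectors, so only warn when the heap is well below the recommendation.
        if (maxHeapMb < RECOMMENDED_HEAP_MB * 3 / 4) {
            System.out.println("NOTE: less than about 1 GB of heap; try -Xmx1024M for large matrices.");
        }
    }
}
```

Launching it with the same memory flags used for the .jar files, e.g. `java -Xss50M -Xms1024M -Xmx1024M JvmCheck`, shows whether those settings take effect on your machine.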
To run the algorithm, copy the .jar file and the .txt file containing the expression matrix into the same directory, then type the corresponding command below at the command line.
If you have any questions, please contact Sara C. Madeira.
Reproduce Results in the Paper
- Synthetic Data [.jar] [matrixNoPlantedBiclusters] [matrixPlantedBiclusters]
java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_CCC_Biclustering_Synthetic.jar
- Cell Cycle Data [.jar] [cell_cycle.txt]
java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_Cell_Cycle.jar
- Heat Stress Data [.jar] [heat_stress.txt]
java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_Heat_Stress.jar
Run CCC-Biclustering with Other Datasets [.jar]
java -jar -Xss50M -Xms1024M -Xmx1024M Test_TCBB_CCC_Biclustering.jar yourExpressionMatrix.txt overlapping
yourExpressionMatrix.txt - name of the .txt file containing your expression matrix
overlapping - a float in [0,1] giving the maximum fraction of overlap allowed (e.g. 0.25 for 25%); all CCC-Biclusters overlapping more than this value are filtered out
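The Java sketch below shows one way an overlap filter driven by the `overlapping` argument could work; it is not taken from the .jar. It assumes that CCC-Biclusters are visited in order of increasing p-value (the order used in the result files), that each one is a set of genes over a contiguous range of time points (as "Contiguous Column Coherent" implies), and that overlap is measured as the fraction of a candidate's (gene, time point) cells already covered by a single bicluster kept earlier. `OverlapFilter`, `Bicluster`, `sharedCells` and `filter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OverlapFilter {
    // Hypothetical representation: gene (row) indices plus a contiguous
    // range of time points (columns) [firstCol, lastCol].
    static class Bicluster {
        final Set<Integer> genes;
        final int firstCol;
        final int lastCol;

        Bicluster(Set<Integer> genes, int firstCol, int lastCol) {
            this.genes = genes;
            this.firstCol = firstCol;
            this.lastCol = lastCol;
        }

        // Number of matrix cells (gene, time point pairs) in the bicluster.
        int cells() {
            return genes.size() * (lastCol - firstCol + 1);
        }
    }

    static Set<Integer> genes(Integer... ids) {
        return new HashSet<Integer>(Arrays.asList(ids));
    }

    // Cells shared by a and b: common genes times common contiguous columns.
    static int sharedCells(Bicluster a, Bicluster b) {
        int cols = Math.min(a.lastCol, b.lastCol) - Math.max(a.firstCol, b.firstCol) + 1;
        if (cols <= 0) {
            return 0;
        }
        int rows = 0;
        for (Integer g : a.genes) {
            if (b.genes.contains(g)) {
                rows++;
            }
        }
        return rows * cols;
    }

    // Keeps a candidate only if, for every bicluster kept before it, the
    // shared cells are at most `overlapping` of the candidate's own cells.
    static List<Bicluster> filter(List<Bicluster> ranked, double overlapping) {
        List<Bicluster> kept = new ArrayList<Bicluster>();
        for (Bicluster candidate : ranked) {
            boolean accept = true;
            for (Bicluster k : kept) {
                double fraction = (double) sharedCells(candidate, k) / candidate.cells();
                if (fraction > overlapping) {
                    accept = false;
                    break;
                }
            }
            if (accept) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Bicluster a = new Bicluster(genes(0, 1, 2, 3), 0, 4);
        Bicluster b = new Bicluster(genes(0, 1, 2), 1, 3);
        Bicluster c = new Bicluster(genes(3, 4, 5, 6), 3, 6);
        List<Bicluster> kept = filter(Arrays.asList(a, b, c), 0.25);
        System.out.println("kept " + kept.size() + " of 3 biclusters");
    }
}
```

In `main`, the second bicluster lies entirely inside the first (overlap 1.0) and is discarded with `overlapping` = 0.25, while the third shares only 2 of its 16 cells with the first (overlap 0.125) and is kept. Measuring overlap relative to the candidate is a design choice of this sketch: it removes small biclusters nested inside larger, better-ranked ones.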
Last Update: July 2009