This webpage makes available a prototype implementation of the e-CCC-Biclustering algorithm coded in Java together with the dataset and examples used in the paper:
Sara C. Madeira and Arlindo L. Oliveira, "A polynomial time biclustering algorithm for finding genes with approximate expression patterns in gene expression time series", Algorithms for Molecular Biology 2009, 4:8 (4 June 2009). [DOI Article Link]
Synthetic
Real
- Heat Stress [.txt]
Synthetic
- Illustrative example
- Maximal CCC-Biclusters with at least two rows [.txt]
- Maximal 1-CCC-Biclusters with at least three rows/columns [.txt]
- Maximal 1-CCC-Biclusters with at least three rows/columns and errors restricted to the 1-neighborhood of the alphabet {D,N,U} [.txt]
- Illustrative example with missing values
- Maximal 1-CCC-Biclusters with at least three rows/columns when missing values are considered as valid errors [.txt]
- Maximal 1-CCC-Biclusters with at least three rows and two columns when missing values are "jumped over" [.txt]
- Maximal 1-CCC-Biclusters with sign-changes with at least three rows/columns when missing values are "jumped over" [.txt]
Real
- e-CCC-Biclustering
- Sorted by statistical significance p-value [.txt]
- Sorted by statistical significance p-value, filtered statistical p-values not passing the statistical test at 1% level (after Bonferroni correction) [.txt]
- Sorted by statistical significance p-value, filtered statistical p-values not passing the statistical test at 1% level (after Bonferroni correction) , filtered similarities above 25% [.txt]
- Biological validation details
- CCC-Biclustering
- Sorted by statistical significance p-value [.txt]
- Sorted by statistical significance p-value, filtered statistical p-values not passing the statistical test at 1% level (after Bonferroni correction) [.txt]
- Sorted by statistical significance p-value, filtered statistical p-values not passing the statistical test at 1% level (after Bonferroni correction) , filtered similarities above 25% [.txt]
- Biological validation details
The software available here allows the reproduction of the results in the paper and also the execution of the e-CCC-Biclustering algorithm using a gene expression matrix provided by the user. The gene expression matrix must be a .txt file formatted as in the examples provided below.
The algorithm is coded in Java. Before running the examples below please make sure the version of jdk installed in your computer is at least jdk1.5. The algorithm should run in any operating system. A gigabyte of memory is recommended if you want to run the algorihm in large gene expression matrices.
In order to run the algorithm copy the .jar file together with the .txt file containing the expression matrix to the same directory and type the commands below in the command line.
If you have any questions please contact Sara C. Madeira.
Reproduce Results in the Paper
- Heat Stress Data
- CCC-Biclustering: follow the instructions on the CCC-Biclustering webpage.
java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_Heat_Stress.jar
- e-CCC-Biclustering: [.jar][heat_stress.txt]
Run e-CCC-Biclustering with Other Datasets and Options
- Running standard e-CCC-Biclustering (no extensions)
Copy the following [.jar]file into the directory where you want to run the algorithm together with the .txt file containing your expression matrix. yourExpressionMatrix.txt - name of the .txt file containing your expression matrix
Then type the command below in your command line and replace the 5 parameters with the values of your choice.
java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering.jar yourExpressionMatrix.txt maxErrors rowQuorum columnQuorum overlapping
maxErrors - integer containing the amount of errors allowed, per gene, in the e-CCC-Biclustering algorithm (value of e)
rowQuorum - integer containing the row quorum (minimum number of genes allowed in e-CCC-Biclusters)
columnQuorum - integer containing the column quorum (minimum number of contiguous time points allowed in e-CCC-Biclusters)
overlapping - float in [0,1] containing the maximum percentage of overlapping allowed (all e-CCC-Biclusters overlapping more than this value are filtered)
For example, if you want to use a matrix in the file matrix.txt, use e=1, row quorum = 3, column quorum = 2 and filter e-CCC-Biclusters overllapping more than 25% you should type and execute the following command:
java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering.jar data.txt 1 3 2 0.25
- Running extended e-CCC-Biclustering
Copy the following [.jar]file into the directory where you want to run the algorithm together with the .txt file containing your expression matrix.
Then type the command below in your command line and replace the 8 parameters with the values of your choice.
java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering_Extensions.jar yourExpressionMatrix.txt maxErrors rowQuorum columnQuorum overlapping missings anticorrelation restrictedErrors
yourExpressionMatrix.txt - name of the .txt file containing your expression matrix
maxErrors - integer containing the amount of errors allowed, per gene, in the e-CCC-Biclustering algorithm (value of e)
rowQuorum - integer containing the row quorum (minimum number of genes allowed in e-CCC-Biclusters)
columnQuorum - integer containing the column quorum (minimum number of contiguous time points allowed in e-CCC-Biclusters)
overlapping - float in [0,1] containing the maximum percentage of overlapping allowed (all e-CCC-Biclusters overlapping more than this value are filtered)
anticorrelation - char with two possible values:
N - no anticorrelation allowed
Y - anticorrelation alllowed, the algorithm will look for e-CCC-Biclusters with Sign-Changes
For example, if you want to use a matrix in the file matrix.txt, use e=1, row quorum = 3, column quorum = 2, filter e-CCC-Biclusters overllapping more than 25%, "jump over" missing values, allow anticorrelation and restricted the errors to the 1-neighborhood of the symbols in the alphabet you should type and execute the following command:
java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering.jar data.txt 1 3 2 0.25 J Y Y
Last Update: July 2009