java.lang.Object
  biggests.clustering.AbstractMetricClustering
    biggests.clustering.HierarchicalClustering

public class HierarchicalClustering
extends biggests.clustering.AbstractMetricClustering
Title: Hierarchical Clustering
Description: This class implements hierarchical clustering over
expression data. Most of the methods are based on
the C source code of Cluster 3.0 (version 1.36) by
Michiel de Hoon, while at the Laboratory of DNA
Information Analysis, Human Genome Center, Institute
of Medical Science, University of Tokyo, Japan.
Cluster 3.0 is an enhanced version of Cluster,
which was originally developed by Michael Eisen
while at Stanford University.
Copyright: Copyright (C) 2007 Joana Gonçalves. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License also complies with the terms of the original Cluster/TreeView license.
| Field Summary | |
|---|---|
| static char | AVERAGE_LINKAGE: Pairwise average-linkage (clustering method). |
| static char | CENTROID_LINKAGE: Pairwise centroid-linkage (clustering method). |
| static char | COMPLETE_LINKAGE: Pairwise complete-linkage (clustering method). |
| static char | SINGLE_LINKAGE: Pairwise single-linkage (clustering method). |
| Constructor Summary | |
|---|---|
| HierarchicalClustering(IMatrix expressionMatrix) | Creates a new HierarchicalClustering object to perform hierarchical clustering over the set of gene expression data contained in expressionMatrix. |
| HierarchicalClustering(int numberOfGenes, int numberOfConditions, double[] genesWeights, double[] conditionsWeights, double[] genesOrder, double[] conditionsOrder, int[] genesIndexes, int[] conditionsIndexes, java.lang.String uniqueID, java.lang.String[] genesUniqueIDs, java.lang.String[] genesNames, java.lang.String[] conditionsNames, double[][] expressionData, int[][] missingsMask) | Creates a new HierarchicalClustering object to perform hierarchical clustering over the given set of expression data. |
| Method Summary | |
|---|---|
| static ClusterTreeNode[] | averageLinkageClustering(double[][] distanceMatrix): Performs clustering using pairwise average-linkage on the given distance matrix. |
| static double[] | calculateWeights(double[][] expressionData, int[][] missingsMask, double[] originalWeights, boolean dimension, char distanceMetric, double cutOff, double exponent): Calculates the weights using the weighting scheme proposed by Michael Eisen: w[i] = 1.0 / sum_{j where d[i][j] < cutOff} (1 - d[i][j] / cutOff)^exponent. |
| static ClusterTreeNode[] | centroidLinkageClustering(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, double[][] distanceMatrix, char distanceMetric, boolean clusteringDimension): Performs clustering using pairwise centroid-linkage on a given set of gene expression data (specified by expressionData and missingsMask), using the distance matrix given by distanceMatrix. |
| static ClusterTreeNode[] | completeLinkageClustering(double[][] distanceMatrix): Performs clustering using pairwise complete-linkage on the given distance matrix. |
| static double | computeDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension, char distanceMetric): Calculates the distance between two genes/rows or two conditions/columns. |
| static double[][] | computeDistanceMatrix(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, char distanceMetric, boolean clusteringDimension): Calculates the distance matrix between genes/rows or conditions/columns using their measured gene expression data. |
| static double | findClosestPair(int maxRowLimit, double[][] distanceMatrix, int[] ipointer, int[] jpointer): Searches the distance matrix for the pair of elements with the shortest distance between them. |
| void | hierarchicalClusteringWithFileOutput(char genesMetric, char conditionsMetric, char clusteringMethod, java.lang.String jobName): Performs hierarchical clustering over this set of data and writes the output to the corresponding files (.GTR, .ATR, .CDT). |
| static ClusterTreeNode[] | hierarchicalClusteringWithTreeOutput(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, boolean clusteringDimension, char distanceMetric, char clusteringMethod, double[][] distanceMatrix): Performs hierarchical clustering using pairwise single-, complete- (maximum-), centroid-, or average-linkage, as defined by clusteringMethod, on a given set of gene expression data, using the distance metric given by distanceMetric. |
| static ClusterTreeNode[] | singleLinkageClustering(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, double[][] distanceMatrix, char distanceMetric, boolean clusteringDimension): Performs single-linkage hierarchical clustering, using either the distance matrix directly, if available, or distances calculated from the expression data matrix. |
| void | sortClusteringTree(boolean sortingDimension, double[] clusteringDimensionOrder, double[] nodesOrder, int[] nodesCounts, ClusterTreeNode[] tree): Sorts the nodes of the tree based on the tree's structure. |
| Methods inherited from class biggests.clustering.AbstractMetricClustering |
|---|
cityblockDistance, computeRanks, computeRanks, euclideanDistance, kendallsTauCorrelation, pearsonsAbsoluteCorrelation, pearsonsCorrelation, sortIndexes, spearmansRankCorrelation, uncenteredAbsoluteCorrelation, uncenteredCorrelation |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final char COMPLETE_LINKAGE
public static final char SINGLE_LINKAGE
public static final char CENTROID_LINKAGE
public static final char AVERAGE_LINKAGE
| Constructor Detail |
|---|
public HierarchicalClustering(IMatrix expressionMatrix)
Creates a new HierarchicalClustering object to
perform hierarchical clustering over the set of gene expression
data contained in expressionMatrix.
Parameters:
expressionMatrix - IMatrix the matrix containing the gene expression data
public HierarchicalClustering(int numberOfGenes,
int numberOfConditions,
double[] genesWeights,
double[] conditionsWeights,
double[] genesOrder,
double[] conditionsOrder,
int[] genesIndexes,
int[] conditionsIndexes,
java.lang.String uniqueID,
java.lang.String[] genesUniqueIDs,
java.lang.String[] genesNames,
java.lang.String[] conditionsNames,
double[][] expressionData,
int[][] missingsMask)
throws java.lang.Exception
Creates a new HierarchicalClustering object to
perform hierarchical clustering over the given set of
expression data.
Parameters:
numberOfGenes - int the number of genes in the experiment
numberOfConditions - int the number of conditions in the experiment
genesWeights - double[] the set of genes' weights
conditionsWeights - double[] the set of conditions' weights
genesOrder - double[] the order of the genes
conditionsOrder - double[] the order of the conditions
genesIndexes - int[] the set of genes' indexes
conditionsIndexes - int[] the set of conditions' indexes
uniqueID - String the unique ID type name
genesUniqueIDs - String[] the set of genes' unique IDs
genesNames - String[] the set of genes' names
conditionsNames - String[] the set of conditions' names
expressionData - double[][] the set of expression
values measured for the genes in the experimental conditions
missingsMask - int[][] the mask which states whether each
element contained in expressionData is valid (1) or
missing (0)
Throws:
java.lang.Exception - if array and/or matrix dimensions do not match
numberOfGenes and/or numberOfConditions
| Method Detail |
|---|
public void hierarchicalClusteringWithFileOutput(char genesMetric,
char conditionsMetric,
char clusteringMethod,
java.lang.String jobName)
throws java.io.IOException
Performs hierarchical clustering over this set of data and writes
the output to the corresponding files (.GTR, .ATR, .CDT).
Parameters:
genesMetric - char the distance metric to use in
genes' clustering; available distance metrics are:
NO_CLUSTERING - with this option no clustering will be performed
on the genes' dimension
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
conditionsMetric - char the distance metric to use
in conditions' clustering; available distance metrics are:
NO_CLUSTERING - with this option no clustering will be performed
on the conditions' dimension
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
clusteringMethod - char the clustering method to apply;
available clustering methods are:
COMPLETE_LINKAGE
SINGLE_LINKAGE
AVERAGE_LINKAGE
CENTROID_LINKAGE
jobName - String the name to appear in all
output filenames, before the file extension
Throws:
java.io.IOException - if the files could not be opened for writing or
a problem occurred while writing to them
public void sortClusteringTree(boolean sortingDimension,
double[] clusteringDimensionOrder,
double[] nodesOrder,
int[] nodesCounts,
ClusterTreeNode[] tree)
throws java.lang.Exception
Sorts the nodes of the tree based on the
tree's structure.
Parameters:
sortingDimension - boolean states the dimension in which
clustering was performed, which is also the dimension
in which the sorting is performed; the
available dimensions are:
GENES_DIMENSION
CONDITIONS_DIMENSION
clusteringDimensionOrder - double[] the order
of the genes or conditions, according to the dimension in
which the tree will be sorted
nodesOrder - double[] the order of the nodes
nodesCounts - int[] the number of elements in each node
tree - ClusterTreeNode[] the set of nodes that make up the tree
Throws:
java.lang.Exception
public static ClusterTreeNode[] hierarchicalClusteringWithTreeOutput(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
boolean clusteringDimension,
char distanceMetric,
char clusteringMethod,
double[][] distanceMatrix)
throws java.lang.ArrayIndexOutOfBoundsException,
java.lang.Exception
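Performs hierarchical clustering using pairwise single-, complete- (maximum-),
centroid-, or average-linkage, as defined by
clusteringMethod, on a given set of gene expression
data, using the distance metric given by distanceMetric.
If successful, the method returns a newly allocated
array of ClusterTreeNode objects containing the hierarchical clustering
solution.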
Parameters:
expressionData - double[][] the data that contains
the single dimension expression arrays (rows/genes or
columns/conditions) to be clustered; the first dimension contains
genes/rows, the second dimension contains conditions/columns
missingsMask - int[][] shows which expression
values are missing; if missingsMask[i][j] == 0,
then expressionData[i][j] is missing; otherwise,
if missingsMask[i][j] == 1, then
expressionData[i][j] is valid
elementsWeights - double[] the weights used to calculate
the distances between the elements to be clustered (genes/rows or
conditions/columns); the length of this array is equal to
the number of conditions/columns if the distances between
genes/rows are calculated, or the number of genes/rows if the
distances between conditions/columns are calculated
clusteringDimension - boolean the clustering
dimension; available dimensions:
GENES_DIMENSION - the genes/rows of
expressionData are clustered
CONDITIONS_DIMENSION - the conditions/columns
of expressionData are clustered
distanceMetric - char the distance measure to use;
valid distance metrics are:
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE is used
clusteringMethod - char the clustering
algorithm/method to use:
COMPLETE_LINKAGE
SINGLE_LINKAGE
AVERAGE_LINKAGE
CENTROID_LINKAGE
distanceMatrix - double[][] the distance matrix;
if the distance matrix is initially null, it is
calculated from the data; if the given distance matrix is
not null, its contents are modified as part of the
clustering algorithm
Returns:
an array of ClusterTreeNode objects describing the
hierarchical clustering solution, which consists of
(clusteredElements - 1) nodes; clusteredElements is equal to
the number of genes/rows or the number of conditions/columns,
depending on whether genes (rows) or conditions (columns) were clustered
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
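A common next step after obtaining the (clusteredElements - 1) nodes is to cut the tree into k flat clusters. The sketch below is a standalone illustration, not part of this class. It uses a hypothetical Node type in place of ClusterTreeNode and assumes the Cluster 3.0 convention for child ids (an id >= 0 is a leaf index; an id of -(m + 1) refers to the node created at step m). It also assumes nodes are stored in merge order; neither assumption is confirmed by this documentation. Applying the first n - k merges with a union-find structure leaves exactly k groups.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: cuts a merge tree of n leaves into k flat clusters.
 * Assumed child-id convention (modelled on Cluster 3.0, not confirmed for ClusterTreeNode):
 * id >= 0 is a leaf index; id == -(m + 1) refers to tree[m].
 */
class CutTreeSketch {

    static final class Node {
        final int left;
        final int right;
        final double distance;

        Node(int left, int right, double distance) {
            this.left = left;
            this.right = right;
            this.distance = distance;
        }
    }

    /** Applies the first n - k merges and labels the resulting groups 0..k-1. */
    static int[] cutTree(Node[] tree, int k) {
        int n = tree.length + 1;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and " + n);
        }
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        int[] someLeaf = new int[tree.length]; // one leaf contained in each internal node
        for (int m = 0; m < n - k; m++) {
            int a = find(parent, leafOf(tree[m].left, someLeaf));
            int b = find(parent, leafOf(tree[m].right, someLeaf));
            parent[a] = b;
            someLeaf[m] = b;
        }
        int[] label = new int[n];
        Arrays.fill(label, -1);
        int[] clusters = new int[n];
        int next = 0;
        for (int i = 0; i < n; i++) {
            int root = find(parent, i);
            if (label[root] == -1) {
                label[root] = next++;
            }
            clusters[i] = label[root];
        }
        return clusters;
    }

    private static int leafOf(int id, int[] someLeaf) {
        return id >= 0 ? id : someLeaf[-id - 1];
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    public static void main(String[] args) {
        Node[] tree = {
            new Node(0, 1, 1.0),   // merge leaves 0 and 1
            new Node(2, 3, 2.0),   // merge leaves 2 and 3
            new Node(-1, -2, 5.0)  // merge the two nodes above
        };
        System.out.println(Arrays.toString(cutTree(tree, 2))); // [0, 0, 1, 1]
    }
}
```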
public static double[][] computeDistanceMatrix(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
char distanceMetric,
boolean clusteringDimension)
throws java.lang.Exception
Calculates the distance matrix between genes/rows or conditions/columns
using their measured gene expression data.
Parameters:
expressionData - double[][] the data that contains
the single dimension expression arrays (rows/genes or
columns/conditions) to be clustered; the first dimension contains
genes/rows, the second dimension contains conditions/columns
missingsMask - int[][] shows which expression
values are missing; if missingsMask[i][j] == 0,
then expressionData[i][j] is missing; otherwise,
if missingsMask[i][j] == 1, then
expressionData[i][j] is valid
elementsWeights - double[] the weights used to calculate
the distances between the elements to be clustered (genes/rows or
conditions/columns); the length of this array is equal to
the number of conditions/columns if the distances between
genes/rows are calculated, or the number of genes/rows if the
distances between conditions/columns are calculated
distanceMetric - char the distance measure to use;
valid distance metrics are:
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE is used
clusteringDimension - boolean the clustering
dimension; available dimensions:
GENES_DIMENSION - the genes/rows of
expressionData are clustered
CONDITIONS_DIMENSION - the conditions/columns
of expressionData are clustered
Throws:
java.lang.Exception
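Other methods in this class describe the distance matrix as a ragged array that stores only the lower triangle (row i holds the distances to rows 0..i-1). The standalone sketch below builds such a matrix between genes/rows, skipping entries that are missing in either row. The class and method names are hypothetical. The distance used, a weighted mean of squared differences over the shared valid entries without a square root, is an assumption modelled on Cluster 3.0's Euclidean metric; this class's EUCLIDEAN_DISTANCE may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch: ragged lower-triangular distance matrix between the rows of expressionData. */
class RaggedDistanceMatrixSketch {

    /** Weighted mean squared difference between rows i and j, over entries valid in both rows (assumed metric). */
    static double euclidean(double[][] data, int[][] mask, double[] weights, int i, int j) {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < data[i].length; k++) {
            if (mask[i][k] == 1 && mask[j][k] == 1) {
                double diff = data[i][k] - data[j][k];
                sum += weights[k] * diff * diff;
                totalWeight += weights[k];
            }
        }
        return totalWeight > 0 ? sum / totalWeight : 0.0; // no shared valid entries: distance 0
    }

    /** Row i holds the distances to rows 0..i-1; the zero diagonal is not stored. */
    static double[][] distanceMatrix(double[][] data, int[][] mask, double[] weights) {
        int n = data.length;
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) {
            d[i] = new double[i];
            for (int j = 0; j < i; j++) {
                d[i][j] = euclidean(data, mask, weights, i, j);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {3, 4}, {1, 1} };
        int[][] mask = { {1, 1}, {1, 1}, {1, 0} }; // the second value of row 2 is missing
        double[] weights = {1, 1};
        System.out.println(Arrays.deepToString(distanceMatrix(data, mask, weights))); // [[], [12.5], [1.0, 4.0]]
    }
}
```

Storing only the lower triangle halves the memory compared with a full n x n array; a reader of d(a, b) must therefore always index as d[max(a, b)][min(a, b)].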
public static ClusterTreeNode[] singleLinkageClustering(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
double[][] distanceMatrix,
char distanceMetric,
boolean clusteringDimension)
throws java.lang.Exception
Performs single-linkage hierarchical clustering, using either the distance
matrix directly, if available, or distances calculated from the expression data matrix.
Parameters:
expressionData - double[][] the data that contains
the single dimension expression arrays (rows/genes or
columns/conditions) to be clustered; the first dimension contains
genes/rows, the second dimension contains conditions/columns
missingsMask - int[][] shows which expression
values are missing; if missingsMask[i][j] == 0,
then expressionData[i][j] is missing; otherwise,
if missingsMask[i][j] == 1, then
expressionData[i][j] is valid
elementsWeights - double[] the weights used to calculate
the distances between the elements to be clustered (genes/rows or
conditions/columns); the length of this array is equal to
the number of conditions/columns if the distances between
genes/rows are calculated, or the number of genes/rows if the
distances between conditions/columns are calculated
distanceMatrix - double[][] the distance matrix; if the
distance matrix is not null, it is used to speed up the
clustering calculation, and the original expression data
(specified by expressionData and
missingsMask) are not needed and are therefore
ignored; the contents of the distance matrix
are not modified; if distanceMatrix is null,
the pairwise distances are calculated from the gene expression
data (the expressionData and
missingsMask two-dimensional arrays) and stored in
temporary arrays
distanceMetric - char the distance measure to use;
valid distance metrics are:
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE is used
clusteringDimension - boolean the clustering
dimension; available dimensions:
GENES_DIMENSION - the genes/rows of
expressionData are clustered
CONDITIONS_DIMENSION - the conditions/columns
of expressionData are clustered
Returns:
an array of ClusterTreeNode objects describing the
hierarchical clustering solution, which consists of
(clusteredElements - 1) nodes; clusteredElements is equal to
the number of genes/rows or the number of conditions/columns,
depending on whether genes (rows) or conditions (columns) were clustered
Throws:
java.lang.Exception
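Single linkage has a well-known equivalence: its merge heights are exactly the edge weights of a minimum spanning tree of the distance graph. The sketch below uses that equivalence (Prim's algorithm, O(n^2)) to compute the n - 1 merge heights from a ragged lower-triangular matrix without modifying it, matching the "not modified" behaviour described above. This is a swapped-in technique for illustration; the documentation does not say which single-linkage algorithm this class uses, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: single-linkage merge heights via Prim's minimum spanning tree. */
class SingleLinkageMstSketch {

    /** Reads d(a, b) from a ragged lower-triangular matrix (a != b) without modifying it. */
    static double dist(double[][] d, int a, int b) {
        return a > b ? d[a][b] : d[b][a];
    }

    /** Returns the n - 1 single-linkage merge heights in ascending order. */
    static double[] mergeHeights(double[][] d) {
        int n = d.length;
        if (n < 2) {
            return new double[0];
        }
        boolean[] inTree = new boolean[n];
        double[] best = new double[n]; // shortest link from each outside element to the tree so far
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        double[] heights = new double[n - 1];
        int current = 0;
        inTree[current] = true;
        for (int step = 0; step < n - 1; step++) {
            int next = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v]) {
                    best[v] = Math.min(best[v], dist(d, current, v));
                    if (next == -1 || best[v] < best[next]) {
                        next = v;
                    }
                }
            }
            heights[step] = best[next]; // each MST edge is one single-linkage merge
            inTree[next] = true;
            current = next;
        }
        Arrays.sort(heights);
        return heights;
    }

    public static void main(String[] args) {
        double[][] d = { {}, {1.0}, {4.0, 3.0} };
        System.out.println(Arrays.toString(mergeHeights(d))); // [1.0, 3.0]
    }
}
```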
public static ClusterTreeNode[] completeLinkageClustering(double[][] distanceMatrix)
throws java.lang.ArrayIndexOutOfBoundsException,
java.lang.Exception
Performs clustering using pairwise complete-linkage on the given distance matrix.
Parameters:
distanceMatrix - double[][] the distance matrix; the number of
rows (first dimension of this two-dimensional array) is equal
to the number of elements to be clustered; this is a ragged
array containing the distances between the elements to be
clustered (genes/rows or conditions/columns); as the distance
matrix is symmetric, with zeros on the diagonal, only the
lower triangular half of the distance matrix is stored and
used; the distance matrix is modified by this method
Returns:
an array of ClusterTreeNode objects describing the
hierarchical clustering solution, which consists of
(clusteredElements - 1) nodes, where clusteredElements is the
number of elements being clustered (genes/rows or
conditions/columns), that is, the number of rows of distanceMatrix
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
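The in-place update described above (the ragged matrix "is modified by this method") can be sketched as follows. At each step the closest pair (is > js) is merged into slot js, the distance from the merged cluster to every other cluster becomes the maximum of the two old distances, and the last active row is moved into slot is so the active matrix shrinks by one. This is a standalone sketch modelled on Cluster 3.0's approach, not this class's code; the Node type and the child-id convention (leaves 0..n-1, the node from step m is -(m + 1)) are assumptions.

```java
/** Hypothetical sketch: pairwise complete (maximum) linkage on a ragged lower-triangular matrix. */
class CompleteLinkageSketch {

    static final class Node {
        final int left;
        final int right;
        final double distance;

        Node(int left, int right, double distance) {
            this.left = left;
            this.right = right;
            this.distance = distance;
        }
    }

    /** Clusters the elements of d (modified in place) and returns n - 1 nodes in merge order. */
    static Node[] cluster(double[][] d) {
        int total = d.length;
        int[] clusterId = new int[total];
        for (int i = 0; i < total; i++) {
            clusterId[i] = i;
        }
        Node[] nodes = new Node[Math.max(total - 1, 0)];
        for (int n = total; n > 1; n--) {
            // Closest pair among the n active rows, with is > js
            int is = 1, js = 0;
            for (int i = 1; i < n; i++) {
                for (int j = 0; j < i; j++) {
                    if (d[i][j] < d[is][js]) {
                        is = i;
                        js = j;
                    }
                }
            }
            double distance = d[is][js];
            // Merged cluster lives in slot js; its distance to k is max(d(is, k), d(js, k))
            for (int j = 0; j < js; j++) d[js][j] = Math.max(d[is][j], d[js][j]);
            for (int j = js + 1; j < is; j++) d[j][js] = Math.max(d[is][j], d[j][js]);
            for (int j = is + 1; j < n; j++) d[j][js] = Math.max(d[j][is], d[j][js]);
            // Move the last active row into slot is
            for (int j = 0; j < is; j++) d[is][j] = d[n - 1][j];
            for (int j = is + 1; j < n - 1; j++) d[j][is] = d[n - 1][j];
            int step = total - n;
            nodes[step] = new Node(clusterId[is], clusterId[js], distance);
            clusterId[js] = -(step + 1);
            clusterId[is] = clusterId[n - 1];
        }
        return nodes;
    }

    public static void main(String[] args) {
        double[][] d = { {}, {1.0}, {4.0, 3.0} };
        for (Node node : cluster(d)) {
            System.out.println(node.left + " " + node.right + " " + node.distance);
        }
    }
}
```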
public static ClusterTreeNode[] averageLinkageClustering(double[][] distanceMatrix)
throws java.lang.ArrayIndexOutOfBoundsException,
java.lang.Exception
Performs clustering using pairwise average-linkage on the given distance matrix.
Parameters:
distanceMatrix - double[][] the distance matrix; the number of
rows (first dimension of this two-dimensional array) is equal
to the number of elements to be clustered; this is a ragged
array containing the distances between the elements to be
clustered (genes/rows or conditions/columns); as the distance
matrix is symmetric, with zeros on the diagonal, only the
lower triangular half of the distance matrix is stored and
used; the distance matrix is modified by this method
Returns:
an array of ClusterTreeNode objects describing the
hierarchical clustering solution, which consists of
(clusteredElements - 1) nodes, where clusteredElements is the
number of elements being clustered (genes/rows or
conditions/columns), that is, the number of rows of distanceMatrix
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
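Average linkage follows the same in-place scheme as complete linkage; only the distance update changes. When clusters of sizes s_is and s_js merge, the new distance to any other cluster k is the size-weighted mean (d(is, k) * s_is + d(js, k) * s_js) / (s_is + s_js), which equals the average over all cross-cluster element pairs. The sketch is standalone and hypothetical (Node type, child-id convention of leaves 0..n-1 and -(m + 1) for the node from step m), modelled on Cluster 3.0 rather than taken from this class.

```java
/** Hypothetical sketch: pairwise average linkage on a ragged lower-triangular matrix (modified in place). */
class AverageLinkageSketch {

    static final class Node {
        final int left;
        final int right;
        final double distance;

        Node(int left, int right, double distance) {
            this.left = left;
            this.right = right;
            this.distance = distance;
        }
    }

    static Node[] cluster(double[][] d) {
        int total = d.length;
        int[] clusterId = new int[total];
        int[] size = new int[total]; // number of leaves in each active cluster
        for (int i = 0; i < total; i++) {
            clusterId[i] = i;
            size[i] = 1;
        }
        Node[] nodes = new Node[Math.max(total - 1, 0)];
        for (int n = total; n > 1; n--) {
            int is = 1, js = 0;
            for (int i = 1; i < n; i++) {
                for (int j = 0; j < i; j++) {
                    if (d[i][j] < d[is][js]) {
                        is = i;
                        js = j;
                    }
                }
            }
            double distance = d[is][js];
            int merged = size[is] + size[js];
            // Size-weighted average of the two old distances, stored in slot js
            for (int j = 0; j < js; j++) d[js][j] = (d[is][j] * size[is] + d[js][j] * size[js]) / merged;
            for (int j = js + 1; j < is; j++) d[j][js] = (d[is][j] * size[is] + d[j][js] * size[js]) / merged;
            for (int j = is + 1; j < n; j++) d[j][js] = (d[j][is] * size[is] + d[j][js] * size[js]) / merged;
            // Move the last active row into slot is
            for (int j = 0; j < is; j++) d[is][j] = d[n - 1][j];
            for (int j = is + 1; j < n - 1; j++) d[j][is] = d[n - 1][j];
            int step = total - n;
            nodes[step] = new Node(clusterId[is], clusterId[js], distance);
            size[js] = merged;
            size[is] = size[n - 1];
            clusterId[js] = -(step + 1);
            clusterId[is] = clusterId[n - 1];
        }
        return nodes;
    }

    public static void main(String[] args) {
        double[][] d = { {}, {1.0}, {4.0, 3.0} };
        Node[] nodes = cluster(d);
        System.out.println(nodes[nodes.length - 1].distance); // 3.5
    }
}
```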
public static ClusterTreeNode[] centroidLinkageClustering(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
double[][] distanceMatrix,
char distanceMetric,
boolean clusteringDimension)
throws java.lang.ArrayIndexOutOfBoundsException,
java.lang.Exception
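Performs clustering using pairwise centroid-linkage on a given set of gene
expression data (specified by expressionData
and missingsMask), using the distance matrix given by
distanceMatrix.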
Parameters:
expressionData - double[][] the data that contains
the single dimension expression arrays (rows/genes or
columns/conditions) to be clustered; the first dimension contains
genes/rows, the second dimension contains conditions/columns
missingsMask - int[][] shows which expression
values are missing; if missingsMask[i][j] == 0,
then expressionData[i][j] is missing; otherwise,
if missingsMask[i][j] == 1, then
expressionData[i][j] is valid
elementsWeights - double[] the weights used to calculate
the distances between the elements to be clustered (genes/rows or
conditions/columns); the length of this array is equal to
the number of conditions/columns if the distances between
genes/rows are calculated, or the number of genes/rows if the
distances between conditions/columns are calculated
distanceMatrix - double[][] the distance matrix; if the
distance matrix is not null, it is used to speed up the
clustering calculation, and the original expression data
(specified by expressionData and
missingsMask) are not needed and are therefore
ignored; the contents of the distance matrix
are not modified; if distanceMatrix is null,
the pairwise distances are calculated from the gene expression
data (the expressionData and
missingsMask two-dimensional arrays) and stored in
temporary arrays
distanceMetric - char the distance measure to use;
valid distance metrics are:
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE is used
clusteringDimension - boolean the clustering
dimension; available dimensions:
GENES_DIMENSION - the genes/rows of
expressionData are clustered
CONDITIONS_DIMENSION - the conditions/columns
of expressionData are clustered
Returns:
an array of ClusterTreeNode objects describing the
hierarchical clustering solution, which consists of
(clusteredElements - 1) nodes; clusteredElements is equal to
the number of genes/rows or the number of conditions/columns,
depending on whether genes (rows) or conditions (columns) were clustered
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
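Centroid linkage needs a profile for each merged cluster so that its distance to other clusters can be recomputed from expression data. The documentation does not say how this class forms that profile; the standalone sketch below assumes a size-weighted mean per condition that ignores missing values, marking a position as missing only when neither cluster has a valid value there. The class, method and parameter names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: merging two cluster centroids while respecting missing values. */
class CentroidMergeSketch {

    /**
     * Returns the size-weighted mean of centroids a and b; mergedMask receives 1 where
     * at least one side is valid and 0 where both are missing (assumed behaviour).
     */
    static double[] mergeCentroids(double[] a, int[] maskA, int sizeA,
                                   double[] b, int[] maskB, int sizeB, int[] mergedMask) {
        double[] merged = new double[a.length];
        for (int k = 0; k < a.length; k++) {
            double weightA = maskA[k] == 1 ? sizeA : 0; // missing values contribute no weight
            double weightB = maskB[k] == 1 ? sizeB : 0;
            double total = weightA + weightB;
            if (total > 0) {
                merged[k] = (a[k] * weightA + b[k] * weightB) / total;
                mergedMask[k] = 1;
            } else {
                merged[k] = 0.0;
                mergedMask[k] = 0;
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] mask = new int[3];
        double[] merged = mergeCentroids(new double[] {1, 2, 0}, new int[] {1, 1, 0}, 1,
                                         new double[] {3, 4, 5}, new int[] {1, 0, 0}, 3, mask);
        System.out.println(Arrays.toString(merged) + " " + Arrays.toString(mask));
    }
}
```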
public static double computeDistance(double[][] expressionData1,
double[][] expressionData2,
int[][] missingsMask1,
int[][] missingsMask2,
double[] elementsWeights,
int index1,
int index2,
boolean dimension,
char distanceMetric)
throws java.lang.Exception
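Calculates the distance between two genes/rows or two conditions/columns,
given by index1 and index2 in
expressionData1 and expressionData2, respectively,
with the corresponding masks in missingsMask1
and missingsMask2. expressionData1, expressionData2,
missingsMask1 and missingsMask2 must
have the same dimensions.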
Parameters:
expressionData1 - double[][] the expression
data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression
data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates
which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates
which expression values are missing in expressionData2
elementsWeights - double[] the weights used to calculate
the distances between the elements to be clustered (genes/rows or
conditions/columns); the length of this array is equal to
the number of conditions/columns if the distances between
genes/rows are calculated, or the number of genes/rows if the
distances between conditions/columns are calculated
index1 - int the index of the first gene/row or
condition/column in expressionData1 and
missingsMask1
index2 - int the index of the second gene/row or
condition/column in expressionData2 and
missingsMask2
dimension - boolean the dimension in which
distance is being measured; available dimensions:
GENES_DIMENSION - the distance is measured
between two genes/rows
CONDITIONS_DIMENSION - the distance is
measured between two conditions/columns
distanceMetric - char the distance measure to use;
valid distance metrics are:
UNCENTERED_CORRELATION
PEARSON_CORRELATION
UNCENTERED_ABS_CORRELATION
PEARSON_ABS_CORRELATION
SPEARMAN_CORRELATION
KENDALL_TAU_CORRELATION
EUCLIDEAN_DISTANCE
CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE is used
Throws:
java.lang.Exception
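As an illustration of one metric, the standalone sketch below computes a correlation distance, 1 - r, between row index1 of data1 and row index2 of data2 (the GENES_DIMENSION case), using only positions valid in both masks and weighting each position by weights[k]. Treating PEARSON_CORRELATION as 1 - r, returning 0 when no positions are shared, and returning 1 for a constant profile are assumptions modelled on Cluster 3.0, not confirmed for this class; the names are hypothetical.

```java
/** Hypothetical sketch: weighted Pearson correlation distance (1 - r) with missing values. */
class PearsonDistanceSketch {

    static double distance(double[][] data1, double[][] data2, int[][] mask1, int[][] mask2,
                           double[] weights, int index1, int index2) {
        double sum1 = 0, sum2 = 0, sq1 = 0, sq2 = 0, cross = 0, totalWeight = 0;
        for (int k = 0; k < weights.length; k++) {
            if (mask1[index1][k] == 1 && mask2[index2][k] == 1) { // skip positions missing on either side
                double x = data1[index1][k];
                double y = data2[index2][k];
                double w = weights[k];
                sum1 += w * x;
                sum2 += w * y;
                sq1 += w * x * x;
                sq2 += w * y * y;
                cross += w * x * y;
                totalWeight += w;
            }
        }
        if (totalWeight == 0) {
            return 0.0; // no shared valid values (assumed convention)
        }
        // Weighted covariance and variances (unnormalized; the common factor cancels in r)
        cross -= sum1 * sum2 / totalWeight;
        sq1 -= sum1 * sum1 / totalWeight;
        sq2 -= sum2 * sum2 / totalWeight;
        if (sq1 <= 0 || sq2 <= 0) {
            return 1.0; // a constant profile has no defined correlation (assumed convention)
        }
        return 1.0 - cross / Math.sqrt(sq1 * sq2);
    }

    public static void main(String[] args) {
        double[][] data = { {1, 2, 3}, {2, 4, 6}, {3, 2, 1} };
        int[][] mask = { {1, 1, 1}, {1, 1, 1}, {1, 1, 1} };
        double[] w = {1, 1, 1};
        System.out.println(distance(data, data, mask, mask, w, 0, 1)); // perfectly correlated rows
        System.out.println(distance(data, data, mask, mask, w, 0, 2)); // perfectly anti-correlated rows
    }
}
```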
public static double findClosestPair(int maxRowLimit,
double[][] distanceMatrix,
int[] ipointer,
int[] jpointer)
throws java.lang.Exception
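Searches the distance matrix to find the pair with the shortest distance
between them. The indices of the pair are returned in
ipointer and jpointer (at ipointer[0] and jpointer[0], to be precise),
respectively, and the distance itself is the return value.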
Parameters:
maxRowLimit - int the maximum number of elements (rows)
of the distance matrix among which this method should look for the
pair with the shortest distance; this value should not exceed
the number of rows in distanceMatrix
distanceMatrix - double[][] the distance matrix; this is a ragged
array containing the distances between elements (genes/rows or
conditions/columns); as the distance matrix is symmetric,
with zeros on the diagonal, only the lower triangular half
of the distance matrix is stored and used
ipointer - int[] an array with at least one
element, to receive the first index of the pair with the
shortest distance
jpointer - int[] an array with at least one element,
to receive the second index of the pair with the shortest distance
Throws:
java.lang.Exception
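The one-element arrays ipointer and jpointer act as output parameters, a Java stand-in for the C pointers of the original Cluster 3.0 code. The standalone sketch below shows the search over the first maxRowLimit rows of a ragged lower-triangular matrix; because only entries with j < i are stored, the returned first index is always larger than the second. The argument check and the class name are assumptions for illustration, not this class's documented behaviour.

```java
/** Hypothetical sketch of a findClosestPair-style search with one-element output arrays. */
class ClosestPairSketch {

    static double findClosestPair(int maxRowLimit, double[][] distanceMatrix, int[] ipointer, int[] jpointer) {
        if (maxRowLimit < 2 || maxRowLimit > distanceMatrix.length) {
            throw new IllegalArgumentException("maxRowLimit must be between 2 and the number of rows");
        }
        double best = distanceMatrix[1][0];
        ipointer[0] = 1;
        jpointer[0] = 0;
        for (int i = 1; i < maxRowLimit; i++) {
            for (int j = 0; j < i; j++) { // only the stored lower triangle is scanned
                if (distanceMatrix[i][j] < best) {
                    best = distanceMatrix[i][j];
                    ipointer[0] = i;
                    jpointer[0] = j;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] d = { {}, {5.0}, {2.0, 7.0}, {9.0, 1.0, 3.0} };
        int[] ip = new int[1];
        int[] jp = new int[1];
        double best = findClosestPair(d.length, d, ip, jp);
        System.out.println(best + " at (" + ip[0] + ", " + jp[0] + ")"); // 1.0 at (3, 1)
    }
}
```

Passing a smaller maxRowLimit restricts the search to the active rows, which is how agglomerative algorithms shrink the matrix after each merge without reallocating it.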
public static double[] calculateWeights(double[][] expressionData,
int[][] missingsMask,
double[] originalWeights,
boolean dimension,
char distanceMetric,
double cutOff,
double exponent)
throws java.lang.Exception
Calculates the weights using the weighting scheme proposed by Michael Eisen:
w[i] = 1.0 / sum_{j where d[i][j] < cutOff} (1 - d[i][j] / cutOff)^exponent
where d[i][j] is the distance between elements i and j, calculated with distanceMetric.
Parameters:
expressionData - double[][] the two-dimensional
array that contains the expression data
missingsMask - int[][] the mask that indicates
which expression values are missing in expressionData
originalWeights - double[] the original weights
used to calculate the distances
dimension - boolean the dimension of the computation
(either GENES_DIMENSION or CONDITIONS_DIMENSION)
distanceMetric - char the distance metric used
to calculate distances
cutOff - double the cutoff to use in the weight calculation
exponent - double the exponent to use in the weight calculation
Throws:
java.lang.Exception
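The weighting formula above can be sketched directly from a precomputed distance matrix. Each element always contributes (1 - 0 / cutOff)^exponent = 1 for itself, and every pair closer than cutOff adds its contribution to both elements, so densely sampled regions receive smaller weights. Working from a ragged lower-triangular matrix, rather than from expressionData and missingsMask as calculateWeights does, is a simplification for this standalone sketch; the class name is hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of Eisen's weighting scheme:
 * w[i] = 1 / sum over j with d(i, j) < cutOff of (1 - d(i, j) / cutOff)^exponent.
 * Only entries distances[i][j] with j < i are read, so a ragged lower-triangular matrix works.
 */
class EisenWeightsSketch {

    static double[] weights(double[][] distances, double cutOff, double exponent) {
        int n = distances.length;
        double[] sums = new double[n];
        for (int i = 0; i < n; i++) {
            sums[i] += 1.0; // j == i: distance 0 contributes (1 - 0)^exponent = 1
            for (int j = 0; j < i; j++) {
                double d = distances[i][j];
                if (d < cutOff) {
                    double contribution = Math.pow(1.0 - d / cutOff, exponent);
                    sums[i] += contribution; // a close pair counts for both of its elements
                    sums[j] += contribution;
                }
            }
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 1.0 / sums[i];
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] d = { {}, {0.5}, {2.0, 2.0} }; // elements 0 and 1 are close; element 2 is isolated
        System.out.println(Arrays.toString(weights(d, 1.0, 1.0)));
    }
}
```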