java.lang.Object
TimeConsumingTasks
biggests.clustering.AbstractMetricClustering
biggests.clustering.HierarchicalClustering
public class HierarchicalClustering
Title: Hierarchical Clustering
Description: This class implements Hierarchical Clustering over
expression data. Most of the methods are based on
the C source code of Cluster 3.0 (version 1.36) by
Michiel de Hoon, while at the Laboratory of DNA
Information Analysis, Human Genome Center, Institute
of Medical Science, University of Tokyo, Japan.
Cluster 3.0 is an enhanced version of Cluster,
which was originally developed by Michael Eisen
while at Stanford University.
Copyright: Copyright (C) 2007 Joana Gonçalves
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
The GNU General Public License also complies with the terms of the original Cluster/TreeView license.
Field Summary | |
---|---|
static char |
AVERAGE_LINKAGE
Pairwise average-linkage (clustering method). |
static char |
CENTROID_LINKAGE
Pairwise centroid-linkage (clustering method). |
static char |
COMPLETE_LINKAGE
Pairwise complete-linkage (clustering method). |
static char |
SINGLE_LINKAGE
Pairwise single-linkage (clustering method). |
Constructor Summary | |
---|---|
HierarchicalClustering(IMatrix expressionMatrix)
Creates a new HierarchicalClustering object to
perform hierarchical clustering over the set of gene expression
data contained in expressionMatrix. |
|
HierarchicalClustering(int numberOfGenes,
int numberOfConditions,
double[] genesWeights,
double[] conditionsWeights,
double[] genesOrder,
double[] conditionsOrder,
int[] genesIndexes,
int[] conditionsIndexes,
java.lang.String uniqueID,
java.lang.String[] genesUniqueIDs,
java.lang.String[] genesNames,
java.lang.String[] conditionsNames,
double[][] expressionData,
int[][] missingsMask)
Creates a new HierarchicalClustering object to
perform hierarchical clustering over the given set of
expression data. |
Method Summary | |
---|---|
ClusterTreeNode[] |
averageLinkageClustering(double[][] distanceMatrix)
Performs clustering using pairwise average-linkage on the given distance matrix. |
static double[] |
calculateWeights(double[][] expressionData,
int[][] missingsMask,
double[] originalWeights,
boolean dimension,
char distanceMetric,
double cutOff,
double exponent)
This function calculates the weights using the weighting scheme proposed by Michael Eisen: w[i] = 1.0 / sum_{j where d[i][j] < cutOff} (1 - d[i][j]/cutOff)^exponent |
ClusterTreeNode[] |
centroidLinkageClustering(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
double[][] distanceMatrix,
char distanceMetric,
boolean clusteringDimension)
Performs clustering using pairwise centroid-linkage on a given set of gene expression data (specified by expressionData
and missingsMask), using the distance matrix given by
distanceMatrix. |
ClusterTreeNode[] |
completeLinkageClustering(double[][] distanceMatrix)
Performs clustering using pairwise complete-linkage on the given distance matrix. |
static double |
computeDistance(double[][] expressionData1,
double[][] expressionData2,
int[][] missingsMask1,
int[][] missingsMask2,
double[] elementsWeights,
int index1,
int index2,
boolean dimension,
char distanceMetric)
Calculates the distance between two genes/rows or two conditions/columns. |
static double[][] |
computeDistanceMatrix(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
char distanceMetric,
boolean clusteringDimension)
Calculates the distance matrix between genes/rows or conditions/columns using their measured gene expression data. |
static double |
findClosestPair(int maxRowLimit,
double[][] distanceMatrix,
int[] ipointer,
int[] jpointer)
Searches the distance matrix to find the pair with the shortest distance between them. |
void |
hierarchicalClusteringWithFileOutput(char genesMetric,
char conditionsMetric,
char clusteringMethod,
java.lang.String jobName)
Performs hierarchical clustering over this set of data and writes output information in corresponding files (.GTR, .ATR, .CDT). |
ClusterTreeNode[] |
hierarchicalClusteringWithTreeOutput(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
boolean clusteringDimension,
char distanceMetric,
char clusteringMethod,
double[][] distanceMatrix)
Performs hierarchical clustering using pairwise single-, complete- (maximum-), centroid-, or average-linkage, as defined by clusteringMethod, on a given set of gene expression
data, using the distance metric given by distanceMetric. |
ClusterTreeNode[] |
singleLinkageClustering(double[][] expressionData,
int[][] missingsMask,
double[] elementsWeights,
double[][] distanceMatrix,
char distanceMetric,
boolean clusteringDimension)
Performs single-linkage hierarchical clustering, using either the distance matrix directly, if available, or by calculating the distances from the expression data matrix. |
void |
sortClusteringTree(boolean sortingDimension,
double[] clusteringDimensionOrder,
double[] nodesOrder,
int[] nodesCounts,
ClusterTreeNode[] tree)
Sorts the nodes of the tree based on the
tree's structure. |
Methods inherited from class biggests.clustering.AbstractMetricClustering |
---|
cityblockDistance, computeRanks, computeRanks, euclideanDistance, kendallsTauCorrelation, pearsonsAbsoluteCorrelation, pearsonsCorrelation, sortIndexes, spearmansRankCorrelation, uncenteredAbsoluteCorrelation, uncenteredCorrelation |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final char COMPLETE_LINKAGE
public static final char SINGLE_LINKAGE
public static final char CENTROID_LINKAGE
public static final char AVERAGE_LINKAGE
Constructor Detail |
---|
public HierarchicalClustering(IMatrix expressionMatrix)
Creates a new HierarchicalClustering object to perform hierarchical clustering over the set of gene expression data contained in expressionMatrix.
Parameters:
expressionMatrix - IMatrix
public HierarchicalClustering(int numberOfGenes, int numberOfConditions, double[] genesWeights, double[] conditionsWeights, double[] genesOrder, double[] conditionsOrder, int[] genesIndexes, int[] conditionsIndexes, java.lang.String uniqueID, java.lang.String[] genesUniqueIDs, java.lang.String[] genesNames, java.lang.String[] conditionsNames, double[][] expressionData, int[][] missingsMask) throws java.lang.Exception
Creates a new HierarchicalClustering object to perform hierarchical clustering over the given set of expression data.
Parameters:
numberOfGenes - int: the number of genes in the experiment
numberOfConditions - int: the number of conditions in the experiment
genesWeights - double[]: the set of genes' weights
conditionsWeights - double[]: the set of conditions' weights
genesOrder - double[]: the order of the genes
conditionsOrder - double[]: the order of the conditions
genesIndexes - int[]: the set of genes' indexes
conditionsIndexes - int[]: the set of conditions' indexes
uniqueID - String: the unique ID type name
genesUniqueIDs - String[]: the set of genes' unique IDs
genesNames - String[]: the set of genes' names
conditionsNames - String[]: the set of conditions' names
expressionData - double[][]: the set of expression values measured for the genes in the experimental conditions
missingsMask - int[][]: the mask which states whether each element contained in expressionData is valid (1) or missing (0)
Throws:
java.lang.Exception - if the array and/or matrix dimensions do not match numberOfGenes and/or numberOfConditions
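The shape requirement and the mask convention (1 = valid, 0 = missing) can be pictured with a minimal sketch. The class `DimensionCheckSketch` and its methods are hypothetical and not part of this API; the sketch also throws an unchecked `IllegalArgumentException` for brevity, whereas the constructor above declares `java.lang.Exception`.

```java
// Hypothetical helper for illustration; it is not part of biggests.clustering.
final class DimensionCheckSketch {

    /** Verifies that both matrices are numberOfGenes x numberOfConditions. */
    static void check(int numberOfGenes, int numberOfConditions,
                      double[][] expressionData, int[][] missingsMask) {
        if (expressionData.length != numberOfGenes || missingsMask.length != numberOfGenes) {
            throw new IllegalArgumentException("row count does not match numberOfGenes");
        }
        for (int g = 0; g < numberOfGenes; g++) {
            if (expressionData[g].length != numberOfConditions
                    || missingsMask[g].length != numberOfConditions) {
                throw new IllegalArgumentException("row " + g + " does not match numberOfConditions");
            }
        }
    }

    /** Counts the entries flagged as valid (1) in the mask. */
    static int countValid(int[][] missingsMask) {
        int valid = 0;
        for (int[] row : missingsMask) {
            for (int flag : row) {
                if (flag == 1) {
                    valid++;
                }
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}};
        int[][] mask = {{1, 0}, {1, 1}};   // data[0][1] is flagged as missing
        check(2, 2, data, mask);
        System.out.println(countValid(mask) + " valid values");
    }
}
```

Validating shapes up front keeps later row/column loops free of bounds checks; whether the library performs the check in exactly this order is not stated on this page.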
Method Detail |
---|
public void hierarchicalClusteringWithFileOutput(char genesMetric, char conditionsMetric, char clusteringMethod, java.lang.String jobName) throws java.io.IOException
Parameters:
genesMetric - char: the distance metric to use in genes' clustering; available distance metrics are:
  NO_CLUSTERING - with this option no clustering will be performed on the genes' dimension
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
conditionsMetric - char: the distance metric to use in conditions' clustering; available distance metrics are:
  NO_CLUSTERING - with this option no clustering will be performed on the conditions' dimension
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
clusteringMethod - char: the clustering method to apply; available clustering methods are:
  COMPLETE_LINKAGE
  SINGLE_LINKAGE
  AVERAGE_LINKAGE
  CENTROID_LINKAGE
jobName - String: the name to appear in all output filenames, before the file extension
Throws:
java.io.IOException - if files could not be opened for writing or some problem occurred while writing to them
public void sortClusteringTree(boolean sortingDimension, double[] clusteringDimensionOrder, double[] nodesOrder, int[] nodesCounts, ClusterTreeNode[] tree) throws java.lang.Exception
Sorts the nodes of the tree based on the tree's structure.
Parameters:
sortingDimension - boolean: states in which dimension clustering was performed, which is also the dimension in which the sorting operation is performed; the available dimensions are:
  GENES_DIMENSION
  CONDITIONS_DIMENSION
clusteringDimensionOrder - double[]: the order of the genes or conditions, according to the dimension in which the tree will be sorted
nodesOrder - double[]: the order of the nodes
nodesCounts - int[]: the number of elements for each node
tree - ClusterTreeNode[]: the set of nodes that make up the tree
Throws:
java.lang.Exception
public ClusterTreeNode[] hierarchicalClusteringWithTreeOutput(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, boolean clusteringDimension, char distanceMetric, char clusteringMethod, double[][] distanceMatrix) throws java.lang.ArrayIndexOutOfBoundsException, java.lang.Exception
Performs hierarchical clustering using pairwise single-, complete- (maximum-), centroid-, or average-linkage, as defined by clusteringMethod, on a given set of gene expression data, using the distance metric given by distanceMetric.
If successful, the method returns a newly allocated array of ClusterTreeNode objects containing the hierarchical clustering solution.
Parameters:
expressionData - double[][]: the data that contains the single-dimension expression arrays (rows/genes or columns/conditions) to be clustered; the first dimension contains genes/rows, the second dimension contains conditions/columns
missingsMask - int[][]: shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[]: the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
clusteringDimension - boolean: the clustering dimension; available dimensions:
  GENES_DIMENSION - the genes/rows of expressionData are clustered
  CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
distanceMetric - char: the distance measure to use; valid distance metrics are:
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
  for any other value, EUCLIDEAN_DISTANCE will be used
clusteringMethod - char: the clustering algorithm/method to use:
  COMPLETE_LINKAGE
  SINGLE_LINKAGE
  AVERAGE_LINKAGE
  CENTROID_LINKAGE
distanceMatrix - double[][]: the distance matrix; if the distance matrix is null initially, it will be calculated from the data; if the given distance matrix is not null, its contents will be modified as part of the clustering algorithm
Returns:
an array of ClusterTreeNode objects describing the hierarchical clustering solution, consisting of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
public static double[][] computeDistanceMatrix(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, char distanceMetric, boolean clusteringDimension) throws java.lang.Exception
Parameters:
expressionData - double[][]: the data that contains the single-dimension expression arrays (rows/genes or columns/conditions) to be clustered; the first dimension contains genes/rows, the second dimension contains conditions/columns
missingsMask - int[][]: shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[]: the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
distanceMetric - char: the distance measure to use; valid distance metrics are:
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
  for any other value, EUCLIDEAN_DISTANCE will be used
clusteringDimension - boolean: the clustering dimension; available dimensions:
  GENES_DIMENSION - the genes/rows of expressionData are clustered
  CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
Throws:
java.lang.Exception
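As an illustration of how a distance matrix over genes/rows might be built from masked data, here is a minimal, self-contained sketch. It is not this library's implementation: the class `RaggedDistanceSketch` and its method names are hypothetical, the metric is a weighted squared Euclidean distance normalized by the weight actually used (an assumption), and the result uses the ragged lower-triangular layout that this page describes for the linkage methods.

```java
// Illustrative sketch only; names are hypothetical and not taken from this library.
final class RaggedDistanceSketch {

    /** Weighted mean squared difference over the entries valid in both rows. */
    static double maskedEuclid(double[] a, double[] b, int[] maskA, int[] maskB, double[] w) {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < a.length; k++) {
            if (maskA[k] == 1 && maskB[k] == 1) {   // skip entries missing in either row
                double diff = a[k] - b[k];
                sum += w[k] * diff * diff;
                totalWeight += w[k];
            }
        }
        // Assumption: return 0 when the two rows share no valid entries.
        return totalWeight == 0.0 ? 0.0 : sum / totalWeight;
    }

    /** Row i of the result has i entries: the distances from row i to rows 0..i-1. */
    static double[][] rowDistances(double[][] data, int[][] mask, double[] w) {
        int n = data.length;
        double[][] dist = new double[n][];
        for (int i = 0; i < n; i++) {
            dist[i] = new double[i];
            for (int j = 0; j < i; j++) {
                dist[i][j] = maskedEuclid(data[i], data[j], mask[i], mask[j], w);
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 0.0}, {3.0, 4.0}, {1.0, 99.0}};
        int[][] mask = {{1, 1}, {1, 1}, {1, 0}};   // data[2][1] is missing
        double[] weights = {1.0, 1.0};
        double[][] dist = rowDistances(data, mask, weights);
        System.out.println(java.util.Arrays.deepToString(dist));
    }
}
```

Storing only the lower triangle halves the memory needed for a symmetric matrix with a zero diagonal, at the cost of always indexing with the larger index first.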
public ClusterTreeNode[] singleLinkageClustering(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, double[][] distanceMatrix, char distanceMetric, boolean clusteringDimension) throws java.lang.Exception
Parameters:
expressionData - double[][]: the data that contains the single-dimension expression arrays (rows/genes or columns/conditions) to be clustered; the first dimension contains genes/rows, the second dimension contains conditions/columns
missingsMask - int[][]: shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[]: the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
distanceMatrix - double[][]: the distance matrix; if the distance matrix is not null, it is used to speed up the clustering calculation and the original expression data (specified by expressionData and missingsMask) are not needed and, therefore, ignored; the contents of the distance matrix are not modified; if distanceMatrix is null, the pairwise distances are calculated from the gene expression data (the expressionData and missingsMask bidimensional arrays) and stored in temporary arrays
distanceMetric - char: the distance measure to use; valid distance metrics are:
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
  for any other value, EUCLIDEAN_DISTANCE will be used
clusteringDimension - boolean: the clustering dimension; available dimensions:
  GENES_DIMENSION - the genes/rows of expressionData are clustered
  CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
Returns:
an array of ClusterTreeNode objects describing the hierarchical clustering solution, consisting of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.Exception
public ClusterTreeNode[] completeLinkageClustering(double[][] distanceMatrix) throws java.lang.ArrayIndexOutOfBoundsException, java.lang.Exception
Parameters:
distanceMatrix - double[][]: the distance matrix; the number of rows (first dimension of this bidimensional array) is equal to the number of elements to be clustered; this is a ragged array containing the distances between the elements to be clustered (genes/rows or conditions/columns); as the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved and used; the distance matrix is modified by this method
Returns:
an array of ClusterTreeNode objects describing the hierarchical clustering solution, consisting of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
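To make the "(clusteredElements-1) nodes" result and the in-place modification of the ragged matrix concrete, the following sketch runs complete-linkage clustering on a small lower-triangular matrix. It is a simplified illustration, not this library's code: the `Node` class stands in for ClusterTreeNode, whose fields are not documented here, and the labelling convention (original elements as 0, 1, 2, ... and merged clusters as -1, -2, ...) is an assumption borrowed from the C Clustering Library on which this class says it is based.

```java
// Illustrative sketch only; Node is a hypothetical stand-in for ClusterTreeNode.
final class CompleteLinkageSketch {

    static final class Node {
        final int left;
        final int right;
        final double distance;

        Node(int left, int right, double distance) {
            this.left = left;
            this.right = right;
            this.distance = distance;
        }
    }

    /** Clusters n elements into n-1 nodes; dist is ragged (row i has i entries) and is overwritten. */
    static Node[] cluster(double[][] dist) {
        int n = dist.length;
        Node[] tree = new Node[Math.max(n - 1, 0)];
        int[] label = new int[n];               // label of the cluster currently stored in each row
        for (int i = 0; i < n; i++) {
            label[i] = i;
        }
        for (int active = n; active > 1; active--) {
            // Find the closest pair (is, js), with js < is, among the active rows.
            int is = 1;
            int js = 0;
            for (int i = 1; i < active; i++) {
                for (int j = 0; j < i; j++) {
                    if (dist[i][j] < dist[is][js]) {
                        is = i;
                        js = j;
                    }
                }
            }
            double d = dist[is][js];
            // Complete linkage: the merged cluster keeps the larger distance; store it in slot js.
            for (int j = 0; j < js; j++) {
                dist[js][j] = Math.max(dist[is][j], dist[js][j]);
            }
            for (int j = js + 1; j < is; j++) {
                dist[j][js] = Math.max(dist[is][j], dist[j][js]);
            }
            for (int j = is + 1; j < active; j++) {
                dist[j][js] = Math.max(dist[j][is], dist[j][js]);
            }
            // Move the last active row into the freed slot is.
            int last = active - 1;
            for (int j = 0; j < is; j++) {
                dist[is][j] = dist[last][j];
            }
            for (int j = is + 1; j < last; j++) {
                dist[j][is] = dist[last][j];
            }
            int step = n - active;
            tree[step] = new Node(label[is], label[js], d);
            label[js] = -(step + 1);            // merged clusters are labelled -1, -2, ...
            label[is] = label[last];
        }
        return tree;
    }
}
```

For example, passing the ragged matrix `{{}, {1}, {5, 4}, {7, 6, 2}}` (four points on a line at 0, 1, 5 and 7) yields three nodes with merge distances 1, 2 and 7.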
public ClusterTreeNode[] averageLinkageClustering(double[][] distanceMatrix) throws java.lang.ArrayIndexOutOfBoundsException, java.lang.Exception
Parameters:
distanceMatrix - double[][]: the distance matrix; the number of rows (first dimension of this bidimensional array) is equal to the number of elements to be clustered; this is a ragged array containing the distances between the elements to be clustered (genes/rows or conditions/columns); as the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved and used; the distance matrix is modified by this method
Returns:
an array of ClusterTreeNode objects describing the hierarchical clustering solution, consisting of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
public ClusterTreeNode[] centroidLinkageClustering(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, double[][] distanceMatrix, char distanceMetric, boolean clusteringDimension) throws java.lang.ArrayIndexOutOfBoundsException, java.lang.Exception
Performs clustering using pairwise centroid-linkage on a given set of gene expression data (specified by expressionData and missingsMask), using the distance matrix given by distanceMatrix.
Parameters:
expressionData - double[][]: the data that contains the single-dimension expression arrays (rows/genes or columns/conditions) to be clustered; the first dimension contains genes/rows, the second dimension contains conditions/columns
missingsMask - int[][]: shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[]: the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
distanceMatrix - double[][]: the distance matrix; if the distance matrix is not null, it is used to speed up the clustering calculation and the original expression data (specified by expressionData and missingsMask) are not needed and, therefore, ignored; the contents of the distance matrix are not modified; if distanceMatrix is null, the pairwise distances are calculated from the gene expression data (the expressionData and missingsMask bidimensional arrays) and stored in temporary arrays
distanceMetric - char: the distance measure to use; valid distance metrics are:
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
  for any other value, EUCLIDEAN_DISTANCE will be used
clusteringDimension - boolean: the clustering dimension; available dimensions:
  GENES_DIMENSION - the genes/rows of expressionData are clustered
  CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
Returns:
an array of ClusterTreeNode objects describing the hierarchical clustering solution, consisting of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
public static double computeDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension, char distanceMetric) throws java.lang.Exception
Calculates the distance between two genes/rows or two conditions/columns. The expression values are given in expressionData1 and expressionData2, and the corresponding masks in missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.
Parameters:
expressionData1 - double[][]: the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][]: the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][]: the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][]: the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[]: the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int: the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int: the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean: the dimension in which distance is being measured; available dimensions:
  GENES_DIMENSION - the distance is measured between two genes/rows
  CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
distanceMetric - char: the distance measure to use; valid distance metrics are:
  UNCENTERED_CORRELATION
  PEARSON_CORRELATION
  UNCENTERED_ABS_CORRELATION
  PEARSON_ABS_CORRELATION
  SPEARMAN_CORRELATION
  KENDALL_TAU_CORRELATION
  EUCLIDEAN_DISTANCE
  CITYBLOCK_DISTANCE
  for any other value, EUCLIDEAN_DISTANCE will be used
Throws:
java.lang.Exception
public static double findClosestPair(int maxRowLimit, double[][] distanceMatrix, int[] ipointer, int[] jpointer) throws java.lang.Exception
Searches the distance matrix to find the pair with the shortest distance between them. The indexes of the pair are returned in ipointer and jpointer (accessible at ipointer[0] and jpointer[0], to be precise), respectively, and the distance itself is the return value.
Parameters:
maxRowLimit - int: the maximum number of elements of the distance matrix where this method should look for the pair with the shortest distance; this value should not exceed the number of rows in the distanceMatrix
distanceMatrix - double[][]: the distance matrix; this is a ragged array containing the distances between elements (genes/rows or conditions/columns); as the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved and used
ipointer - int[]: an array with at least one element to receive the first index of the pair with the shortest distance
jpointer - int[]: an array with at least one element to receive the second index of the pair with the shortest distance
Throws:
java.lang.Exception
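The one-element arrays act as output parameters, since Java has no pointer arguments. A minimal sketch of that calling convention follows; the class `ClosestPairSketch` is hypothetical, mirrors the documented signature only loosely, and throws an unchecked exception for brevity rather than `java.lang.Exception`.

```java
// Illustrative sketch only; not the library's implementation.
final class ClosestPairSketch {

    /**
     * Scans the first maxRowLimit rows of a ragged lower-triangular matrix.
     * Writes the indexes of the closest pair to ipointer[0] and jpointer[0]
     * and returns their distance.
     */
    static double findClosestPair(int maxRowLimit, double[][] distanceMatrix,
                                  int[] ipointer, int[] jpointer) {
        if (maxRowLimit < 2 || maxRowLimit > distanceMatrix.length) {
            throw new IllegalArgumentException("maxRowLimit must be in [2, number of rows]");
        }
        double best = distanceMatrix[1][0];
        ipointer[0] = 1;
        jpointer[0] = 0;
        for (int i = 1; i < maxRowLimit; i++) {
            for (int j = 0; j < i; j++) {       // row i holds distances to elements 0..i-1
                if (distanceMatrix[i][j] < best) {
                    best = distanceMatrix[i][j];
                    ipointer[0] = i;
                    jpointer[0] = j;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] dist = {{}, {5.0}, {4.0, 0.5}};
        int[] ip = new int[1];
        int[] jp = new int[1];
        double d = findClosestPair(3, dist, ip, jp);
        System.out.println("closest pair: (" + ip[0] + ", " + jp[0] + ") at distance " + d);
    }
}
```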
public static double[] calculateWeights(double[][] expressionData, int[][] missingsMask, double[] originalWeights, boolean dimension, char distanceMetric, double cutOff, double exponent) throws java.lang.Exception
Parameters:
expressionData - double[][]: the bidimensional array that contains the expression data
missingsMask - int[][]: the mask that indicates which expression values are missing in expressionData
originalWeights - double[]: the original weights used to calculate the distances
dimension - boolean: the dimension of the computation (either GENES_DIMENSION or CONDITIONS_DIMENSION)
distanceMetric - char: the distance metric used to calculate distances
cutOff - double: the cutoff to use in weight calculation
exponent - double: the exponent to use in weight calculation
Throws:
java.lang.Exception
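The roles of cutOff and exponent can be sketched as follows. This example assumes the weighting formula documented for the C Clustering Library, w[i] = 1.0 / sum over j with d[i][j] < cutOff of (1 - d[i][j]/cutOff)^exponent, with the j = i term (distance 0) always contributing 1; whether this class follows it exactly is an assumption. The class `EisenWeightsSketch` is hypothetical and, unlike the method above, takes a precomputed ragged lower-triangular distance matrix and expects cutOff > 0.

```java
// Illustrative sketch only; the input is a precomputed ragged distance matrix.
final class EisenWeightsSketch {

    /** Elements with many close neighbours (distance below cutOff) receive smaller weights. */
    static double[] weights(double[][] dist, double cutOff, double exponent) {
        int n = dist.length;
        double[] sum = new double[n];
        for (int i = 0; i < n; i++) {
            sum[i] += 1.0;                      // the j == i term: (1 - 0/cutOff)^exponent = 1
            for (int j = 0; j < i; j++) {
                double d = dist[i][j];
                if (d < cutOff) {
                    double term = Math.pow(1.0 - d / cutOff, exponent);
                    sum[i] += term;             // the matrix is symmetric, so the term
                    sum[j] += term;             // counts for both elements of the pair
                }
            }
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 1.0 / sum[i];
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] dist = {{}, {0.5}, {2.0, 2.0}};
        System.out.println(java.util.Arrays.toString(weights(dist, 1.0, 1.0)));
    }
}
```

In this scheme an isolated element keeps weight 1, while elements inside a dense group share their influence, which reduces the impact of near-duplicate rows or columns on later distance calculations.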