biggests.clustering
Class HierarchicalClustering

java.lang.Object
  extended by TimeConsumingTasks
      extended by biggests.clustering.AbstractMetricClustering
          extended by biggests.clustering.HierarchicalClustering

public class HierarchicalClustering
extends AbstractMetricClustering

Title: Hierarchical Clustering

Description: This class implements hierarchical clustering over expression data. Most of the methods are based on the C source code of Cluster 3.0 (version 1.36), written by Michiel de Hoon while at the Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, University of Tokyo, Japan.
Cluster 3.0 is an enhanced version of Cluster, which was originally developed by Michael Eisen while at Stanford University.

Copyright: Copyright (C) 2007 Joana Gonçalves. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License also complies with the terms of the original Cluster/TreeView license.


Field Summary
static char AVERAGE_LINKAGE
          Pairwise average-linkage (clustering method).
static char CENTROID_LINKAGE
          Pairwise centroid-linkage (clustering method).
static char COMPLETE_LINKAGE
          Pairwise complete-linkage (clustering method).
static char SINGLE_LINKAGE
          Pairwise single-linkage (clustering method).
 
Fields inherited from class biggests.clustering.AbstractMetricClustering
CITYBLOCK_DISTANCE, CONDITIONS_DIMENSION, conditionsIndexes, conditionsNames, conditionsOrder, conditionsWeights, EUCLIDEAN_DISTANCE, expressionData, GENES_DIMENSION, genesIndexes, genesNames, genesOrder, genesUniqueIDs, genesWeights, KENDALL_TAU_CORRELATION, missingsMask, NO_CLUSTERING, numberOfConditions, numberOfGenes, PEARSON_ABS_CORRELATION, PEARSON_CORRELATION, SPEARMAN_RANK_CORRELATION, UNCENTERED_ABS_CORRELATION, UNCENTERED_CORRELATION, uniqueID
 
Constructor Summary
HierarchicalClustering(IMatrix expressionMatrix)
          Creates a new HierarchicalClustering object to perform hierarchical clustering over the set of gene expression data contained in expressionMatrix.
HierarchicalClustering(int numberOfGenes, int numberOfConditions, double[] genesWeights, double[] conditionsWeights, double[] genesOrder, double[] conditionsOrder, int[] genesIndexes, int[] conditionsIndexes, java.lang.String uniqueID, java.lang.String[] genesUniqueIDs, java.lang.String[] genesNames, java.lang.String[] conditionsNames, double[][] expressionData, int[][] missingsMask)
          Creates a new HierarchicalClustering object to perform hierarchical clustering over the given set of expression data.
 
Method Summary
 ClusterTreeNode[] averageLinkageClustering(double[][] distanceMatrix)
          Performs clustering using pairwise average-linkage on the given distance matrix.
static double[] calculateWeights(double[][] expressionData, int[][] missingsMask, double[] originalWeights, boolean dimension, char distanceMetric, double cutOff, double exponent)
          This function calculates the weights using the weighting scheme proposed by Michael Eisen: w[i] = 1.0 / sum_{j where d[i][j] < cutOff} (1 - d[i][j]/cutOff)^exponent
 ClusterTreeNode[] centroidLinkageClustering(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, double[][] distanceMatrix, char distanceMetric, boolean clusteringDimension)
          Performs clustering using pairwise centroid-linkage on a given set of gene expression data (specified by expressionData and missingsMask), using the distance matrix given by distanceMatrix.
 ClusterTreeNode[] completeLinkageClustering(double[][] distanceMatrix)
          Performs clustering using pairwise complete-linkage on the given distance matrix.
static double computeDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension, char distanceMetric)
          Calculates the distance between two genes/rows or two conditions/columns.
static double[][] computeDistanceMatrix(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, char distanceMetric, boolean clusteringDimension)
          Calculates the distance matrix between genes/rows or conditions/columns using their measured gene expression data.
static double findClosestPair(int maxRowLimit, double[][] distanceMatrix, int[] ipointer, int[] jpointer)
          Searches the distance matrix to find the pair with the shortest distance between them.
 void hierarchicalClusteringWithFileOutput(char genesMetric, char conditionsMetric, char clusteringMethod, java.lang.String jobName)
          Performs hierarchical clustering over this set of data and writes output information in corresponding files (.GTR, .ATR, .CDT).
 ClusterTreeNode[] hierarchicalClusteringWithTreeOutput(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, boolean clusteringDimension, char distanceMetric, char clusteringMethod, double[][] distanceMatrix)
          Performs hierarchical clustering using pairwise single-, complete-, centroid-, or average-linkage, as defined by clusteringMethod, on a given set of gene expression data, using the distance metric given by distanceMetric.
 ClusterTreeNode[] singleLinkageClustering(double[][] expressionData, int[][] missingsMask, double[] elementsWeights, double[][] distanceMatrix, char distanceMetric, boolean clusteringDimension)
          Performs single-linkage hierarchical clustering, using either the distance matrix directly, if available, or by calculating the distances from the expression data matrix.
 void sortClusteringTree(boolean sortingDimension, double[] clusteringDimensionOrder, double[] nodesOrder, int[] nodesCounts, ClusterTreeNode[] tree)
          Sorts the nodes of the tree based on the tree's structure.
 
Methods inherited from class biggests.clustering.AbstractMetricClustering
cityblockDistance, computeRanks, computeRanks, euclideanDistance, kendallsTauCorrelation, pearsonsAbsoluteCorrelation, pearsonsCorrelation, sortIndexes, spearmansRankCorrelation, uncenteredAbsoluteCorrelation, uncenteredCorrelation
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMPLETE_LINKAGE

public static final char COMPLETE_LINKAGE
Pairwise complete-linkage (clustering method).

See Also:
Constant Field Values

SINGLE_LINKAGE

public static final char SINGLE_LINKAGE
Pairwise single-linkage (clustering method).

See Also:
Constant Field Values

CENTROID_LINKAGE

public static final char CENTROID_LINKAGE
Pairwise centroid-linkage (clustering method).

See Also:
Constant Field Values

AVERAGE_LINKAGE

public static final char AVERAGE_LINKAGE
Pairwise average-linkage (clustering method).

See Also:
Constant Field Values
Constructor Detail

HierarchicalClustering

public HierarchicalClustering(IMatrix expressionMatrix)
Creates a new HierarchicalClustering object to perform hierarchical clustering over the set of gene expression data contained in expressionMatrix.

Parameters:
expressionMatrix - IMatrix

HierarchicalClustering

public HierarchicalClustering(int numberOfGenes,
                              int numberOfConditions,
                              double[] genesWeights,
                              double[] conditionsWeights,
                              double[] genesOrder,
                              double[] conditionsOrder,
                              int[] genesIndexes,
                              int[] conditionsIndexes,
                              java.lang.String uniqueID,
                              java.lang.String[] genesUniqueIDs,
                              java.lang.String[] genesNames,
                              java.lang.String[] conditionsNames,
                              double[][] expressionData,
                              int[][] missingsMask)
                       throws java.lang.Exception
Creates a new HierarchicalClustering object to perform hierarchical clustering over the given set of expression data.

Parameters:
numberOfGenes - int the number of genes in the experiment
numberOfConditions - int the number of conditions in the experiment
genesWeights - double[] the set of genes' weights
conditionsWeights - double[] the set of conditions' weights
genesOrder - double[] the order of the genes
conditionsOrder - double[] the order of the conditions
genesIndexes - int[] the set of genes' indexes
conditionsIndexes - int[] the set of conditions' indexes
uniqueID - String the unique ID type name
genesUniqueIDs - String[] the set of genes' unique IDs
genesNames - String[] the set of genes' names
conditionsNames - String[] the set of conditions' names
expressionData - double[][] the set of expression values measured for the genes in the experimental conditions
missingsMask - int[][] the mask which states if each element contained in expressionData is valid (1) or missing (0)
Throws:
java.lang.Exception - if the array and/or matrix dimensions do not match numberOfGenes and/or numberOfConditions
Method Detail

hierarchicalClusteringWithFileOutput

public void hierarchicalClusteringWithFileOutput(char genesMetric,
                                                 char conditionsMetric,
                                                 char clusteringMethod,
                                                 java.lang.String jobName)
                                          throws java.io.IOException
Performs hierarchical clustering over this set of data and writes output information in corresponding files (.GTR, .ATR, .CDT).

Parameters:
genesMetric - char the distance metric to use in genes' clustering; available distance metrics are:
  • NO_CLUSTERING - with this option no clustering will be performed on the genes' dimension
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_RANK_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
conditionsMetric - char the distance metric to use in conditions' clustering; available distance metrics are:
  • NO_CLUSTERING - with this option no clustering will be performed on the conditions' dimension
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
clusteringMethod - char the clustering method to apply; available clustering methods are:
  • COMPLETE_LINKAGE
  • SINGLE_LINKAGE
  • AVERAGE_LINKAGE
  • CENTROID_LINKAGE
jobName - String the name to appear in all output filenames, before the file extension
Throws:
java.io.IOException - if the files could not be opened for writing or a problem occurred while writing to them

sortClusteringTree

public void sortClusteringTree(boolean sortingDimension,
                               double[] clusteringDimensionOrder,
                               double[] nodesOrder,
                               int[] nodesCounts,
                               ClusterTreeNode[] tree)
                        throws java.lang.Exception
Sorts the nodes of the tree based on the tree's structure.

Parameters:
sortingDimension - boolean the dimension in which clustering was performed, which is also the dimension in which the sorting is performed; the available dimensions are:
  • GENES_DIMENSION
  • CONDITIONS_DIMENSION
clusteringDimensionOrder - double[] the order of the genes or conditions, according to the dimension in which the tree will be sorted
nodesOrder - double[] the order of the nodes
nodesCounts - int[] the number of elements for each node
tree - ClusterTreeNode[] the set of nodes that make up the tree
Throws:
java.lang.Exception

hierarchicalClusteringWithTreeOutput

public ClusterTreeNode[] hierarchicalClusteringWithTreeOutput(double[][] expressionData,
                                                              int[][] missingsMask,
                                                              double[] elementsWeights,
                                                              boolean clusteringDimension,
                                                              char distanceMetric,
                                                              char clusteringMethod,
                                                              double[][] distanceMatrix)
                                                       throws java.lang.ArrayIndexOutOfBoundsException,
                                                              java.lang.Exception
Performs hierarchical clustering using pairwise single-, complete-, centroid-, or average-linkage, as defined by clusteringMethod, on a given set of gene expression data, using the distance metric given by distanceMetric. If successful, the method returns a newly allocated array of ClusterTreeNode objects containing the hierarchical clustering solution.

Parameters:
expressionData - double[][] the data that contains the single dimension expression arrays (rows/genes or columns/conditions) to be clustered; first dimension contains genes/rows, second dimension contains conditions/columns
missingsMask - int[][] shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
clusteringDimension - boolean the clustering dimension; available dimensions:
  • GENES_DIMENSION - the genes/rows of expressionData are clustered
  • CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
distanceMetric - char the distance measure to use; valid distance metrics are:
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_RANK_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE will be used
clusteringMethod - char the clustering algorithm/method to use:
  • COMPLETE_LINKAGE
  • SINGLE_LINKAGE
  • AVERAGE_LINKAGE
  • CENTROID_LINKAGE
for the first three, either the distance matrix or the gene expression data is sufficient to perform the clustering algorithm; for pairwise centroid-linkage clustering, however, the gene expression data are always needed, even if the distance matrix itself is available
distanceMatrix - double[][] the distance matrix; if the distance matrix is null initially, it will be calculated from the data; if the given distance matrix is not null, its contents will be modified as part of the clustering algorithm
Returns:
ClusterTreeNode[] the set of ClusterTreeNode objects describing the hierarchical clustering solution, which consists of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception

computeDistanceMatrix

public static double[][] computeDistanceMatrix(double[][] expressionData,
                                               int[][] missingsMask,
                                               double[] elementsWeights,
                                               char distanceMetric,
                                               boolean clusteringDimension)
                                        throws java.lang.Exception
Calculates the distance matrix between genes/rows or conditions/columns using their measured gene expression data. Several distance measures can be used. Returns a ragged array containing the distances between the elements (genes/rows or conditions/columns). As the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved.

Parameters:
expressionData - double[][] the data that contains the single dimension expression arrays (rows/genes or columns/conditions) to be clustered; first dimension contains genes/rows, second dimension contains conditions/columns
missingsMask - int[][] shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
distanceMetric - char the distance measure to use; valid distance metrics are:
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_RANK_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE will be used
clusteringDimension - boolean the clustering dimension; available dimensions:
  • GENES_DIMENSION - the genes/rows of expressionData are clustered
  • CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
Returns:
double[][] the computed distance matrix
Throws:
java.lang.Exception
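
The ragged lower-triangular layout described above can be illustrated with a minimal sketch. This is not the code of this class: the class name RaggedDistanceSketch, the helper rowDistance and the choice of a mean absolute difference as the metric are all assumptions made for illustration. Only rows are compared, and positions flagged as missing in either row are skipped.

```java
import java.util.Arrays;

class RaggedDistanceSketch {
    // Hypothetical metric: mean absolute difference over the positions
    // that are valid (mask != 0) in both rows.
    static double rowDistance(double[][] data, int[][] mask, int a, int b) {
        double sum = 0.0;
        int valid = 0;
        for (int k = 0; k < data[a].length; k++) {
            if (mask[a][k] != 0 && mask[b][k] != 0) {
                sum += Math.abs(data[a][k] - data[b][k]);
                valid++;
            }
        }
        return valid == 0 ? 0.0 : sum / valid;
    }

    // Row i of the result holds exactly i entries: the distances from
    // element i to elements 0..i-1. The zero diagonal and the symmetric
    // upper half are never stored.
    static double[][] lowerTriangle(double[][] data, int[][] mask) {
        int n = data.length;
        double[][] dist = new double[n][];
        for (int i = 0; i < n; i++) {
            dist[i] = new double[i];
            for (int j = 0; j < i; j++) {
                dist[i][j] = rowDistance(data, mask, i, j);
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {3, 4}, {6, 8}};
        int[][] mask = {{1, 1}, {1, 1}, {1, 0}}; // data[2][1] is missing
        System.out.println(Arrays.deepToString(lowerTriangle(data, mask)));
    }
}
```

With this layout, the distance between elements i and j (i > j) is read as dist[i][j], and row 0 is an empty array.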

singleLinkageClustering

public ClusterTreeNode[] singleLinkageClustering(double[][] expressionData,
                                                 int[][] missingsMask,
                                                 double[] elementsWeights,
                                                 double[][] distanceMatrix,
                                                 char distanceMetric,
                                                 boolean clusteringDimension)
                                          throws java.lang.Exception
Performs single-linkage hierarchical clustering, using either the distance matrix directly, if available, or by calculating the distances from the expression data matrix. This implementation is based on the SLINK algorithm, described in: Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1): 30-34. The output of this algorithm is identical to conventional single-linkage hierarchical clustering, but is much more memory-efficient and faster. Hence, it can be applied to large data sets, for which the conventional single-linkage algorithm fails due to lack of memory.

Parameters:
expressionData - double[][] the data that contains the single dimension expression arrays (rows/genes or columns/conditions) to be clustered; first dimension contains genes/rows, second dimension contains conditions/columns
missingsMask - int[][] shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
distanceMatrix - double[][] the distance matrix; if the distance matrix is not null, it is used to speed up the clustering calculation and the original expression data (specified by expressionData and missingsMask) are not needed and, therefore, ignored; the contents of the distance matrix are not modified; if distanceMatrix is null, the pairwise distances are calculated from the gene expression data (the expressionData and missingsMask bidimensional arrays) and stored in temporary arrays
distanceMetric - char the distance measure to use; valid distance metrics are:
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_RANK_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE will be used
clusteringDimension - boolean the clustering dimension; available dimensions:
  • GENES_DIMENSION - the genes/rows of expressionData are clustered
  • CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
Returns:
ClusterTreeNode[] the set of ClusterTreeNode objects describing the hierarchical clustering solution, which consists of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.Exception
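
For readers unfamiliar with SLINK, the following is a minimal sketch of the pointer-representation update from Sibson's paper, working on a ragged lower-triangular distance matrix. It is an illustration only and is not taken from this class: the class name SlinkSketch and the method slink are hypothetical, and the conversion of the pointer representation into tree nodes is left out.

```java
import java.util.Arrays;

class SlinkSketch {
    /**
     * Computes the SLINK pointer representation. After the call, element j
     * joins a cluster containing the higher-indexed element pi[j] at height
     * lambda[j]; the last element has lambda = +infinity.
     * dist is ragged: dist[i][j] is defined only for j < i.
     */
    static double[] slink(double[][] dist, int[] pi) {
        int n = dist.length;
        double[] lambda = new double[n];
        double[] m = new double[n]; // scratch distances from the new element
        for (int i = 0; i < n; i++) {
            pi[i] = i;
            lambda[i] = Double.POSITIVE_INFINITY;
            for (int j = 0; j < i; j++) {
                m[j] = dist[i][j];
            }
            for (int j = 0; j < i; j++) {
                if (lambda[j] >= m[j]) {
                    // Element i is closer to j than j's current merge height.
                    m[pi[j]] = Math.min(m[pi[j]], lambda[j]);
                    lambda[j] = m[j];
                    pi[j] = i;
                } else {
                    m[pi[j]] = Math.min(m[pi[j]], m[j]);
                }
            }
            // Relabel so that every pointer targets the latest representative.
            for (int j = 0; j < i; j++) {
                if (lambda[j] >= lambda[pi[j]]) {
                    pi[j] = i;
                }
            }
        }
        return lambda;
    }

    public static void main(String[] args) {
        // Distances between the points 0, 1, 3 and 7 on a line.
        double[][] dist = {{}, {1}, {3, 2}, {7, 6, 4}};
        int[] pi = new int[dist.length];
        double[] lambda = slink(dist, pi);
        System.out.println(Arrays.toString(lambda) + " " + Arrays.toString(pi));
    }
}
```

Only two arrays of length n are kept besides the scratch array, which is where the memory saving over a full pairwise merge loop comes from. The finite values of lambda, sorted in ascending order, are the merge heights of the single-linkage tree.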

completeLinkageClustering

public ClusterTreeNode[] completeLinkageClustering(double[][] distanceMatrix)
                                            throws java.lang.ArrayIndexOutOfBoundsException,
                                                   java.lang.Exception
Performs clustering using pairwise complete-linkage on the given distance matrix.

Parameters:
distanceMatrix - double[][] the distance matrix; the number of rows (first dimension of this bidimensional array) is equal to the number of elements to be clustered; this is a ragged array containing the distances between the elements to be clustered (genes/rows or conditions/columns); as the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved and used; the distance matrix is modified by this method
Returns:
ClusterTreeNode[] the set of ClusterTreeNode objects describing the hierarchical clustering solution, which consists of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
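
As an illustration of how complete-linkage can run in place on a ragged distance matrix (and why the matrix is modified), here is a minimal sketch. It is not the implementation of this class. The names CompleteLinkageSketch and Merge are hypothetical stand-ins, and the labelling of merged clusters with negative numbers (-1 for the first node, -2 for the second, and so on) is a convention chosen for this sketch only.

```java
class CompleteLinkageSketch {
    /** Hypothetical stand-in for a tree node: two children and the joining distance. */
    static final class Merge {
        final int left, right;
        final double distance;
        Merge(int left, int right, double distance) {
            this.left = left;
            this.right = right;
            this.distance = distance;
        }
    }

    // Symmetric accessors for the ragged lower triangle.
    static double get(double[][] d, int a, int b) { return a > b ? d[a][b] : d[b][a]; }
    static void set(double[][] d, int a, int b, double v) {
        if (a > b) d[a][b] = v; else d[b][a] = v;
    }

    /** Clusters n elements into n-1 merges; dist is overwritten. */
    static Merge[] cluster(double[][] dist) {
        int n = dist.length;
        int[] id = new int[n]; // label of the cluster currently stored in each row
        for (int i = 0; i < n; i++) id[i] = i;
        Merge[] tree = new Merge[Math.max(n - 1, 0)];
        for (int active = n; active > 1; active--) {
            // Find the closest pair (is > js) among the active rows.
            int is = 1, js = 0;
            for (int i = 1; i < active; i++)
                for (int j = 0; j < i; j++)
                    if (dist[i][j] < dist[is][js]) { is = i; js = j; }
            int node = n - active;
            tree[node] = new Merge(id[is], id[js], dist[is][js]);
            // Complete linkage: the merged cluster (kept in row js) is as far
            // from k as the farther of its two parts.
            for (int k = 0; k < active; k++) {
                if (k == is || k == js) continue;
                set(dist, js, k, Math.max(get(dist, is, k), get(dist, js, k)));
            }
            // Move the last active row into the freed slot is.
            int last = active - 1;
            for (int k = 0; k < last; k++) {
                if (k != is) set(dist, is, k, get(dist, last, k));
            }
            id[js] = -(node + 1);
            id[is] = id[last];
        }
        return tree;
    }

    public static void main(String[] args) {
        double[][] dist = {{}, {1}, {3, 2}, {7, 6, 4}};
        for (Merge m : cluster(dist)) {
            System.out.println(m.left + " + " + m.right + " at " + m.distance);
        }
    }
}
```

Because rows are overwritten and compacted as clusters merge, a caller that still needs the original distances should pass a copy.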

averageLinkageClustering

public ClusterTreeNode[] averageLinkageClustering(double[][] distanceMatrix)
                                           throws java.lang.ArrayIndexOutOfBoundsException,
                                                  java.lang.Exception
Performs clustering using pairwise average-linkage on the given distance matrix.

Parameters:
distanceMatrix - double[][] the distance matrix; the number of rows (first dimension of this bidimensional array) is equal to the number of elements to be clustered; this is a ragged array containing the distances between the elements to be clustered (genes/rows or conditions/columns); as the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved and used; the distance matrix is modified by this method
Returns:
ClusterTreeNode[] the set of ClusterTreeNode objects describing the hierarchical clustering solution, which consists of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception
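
In pairwise average-linkage, the distance from a newly merged cluster to any other cluster is the size-weighted mean of the distances from its two parts, which equals the mean over all pairs of elements. The sketch below shows only this update step; the class name AverageLinkageUpdateSketch and the method mergedDistance are hypothetical, and whether this class applies the update in exactly this form is an assumption.

```java
class AverageLinkageUpdateSketch {
    /**
     * Distance from the union of clusters A and B to a third cluster C,
     * given d(A,C), d(B,C) and the number of elements in A and in B.
     */
    static double mergedDistance(double dAC, int sizeA, double dBC, int sizeB) {
        return (sizeA * dAC + sizeB * dBC) / (sizeA + sizeB);
    }

    public static void main(String[] args) {
        // Points on a line: A = {0, 1}, B = {3}, C = {7}.
        double dAC = (7.0 + 6.0) / 2.0; // mean of |0-7| and |1-7|
        double dBC = 4.0;               // |3-7|
        double merged = mergedDistance(dAC, 2, dBC, 1);
        double direct = (7.0 + 6.0 + 4.0) / 3.0; // mean over all three pairs
        System.out.println(merged + " vs " + direct);
    }
}
```

Tracking the element count of each cluster is what distinguishes this update from a plain (unweighted) mean of the two distances.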

centroidLinkageClustering

public ClusterTreeNode[] centroidLinkageClustering(double[][] expressionData,
                                                   int[][] missingsMask,
                                                   double[] elementsWeights,
                                                   double[][] distanceMatrix,
                                                   char distanceMetric,
                                                   boolean clusteringDimension)
                                            throws java.lang.ArrayIndexOutOfBoundsException,
                                                   java.lang.Exception
Performs clustering using pairwise centroid-linkage on a given set of gene expression data (specified by expressionData and missingsMask), using the distance matrix given by distanceMatrix.

Parameters:
expressionData - double[][] the data that contains the single dimension expression arrays (rows/genes or columns/conditions) to be clustered; first dimension contains genes/rows, second dimension contains conditions/columns
missingsMask - int[][] shows which expression values are missing; if missingsMask[i][j] == 0, then expressionData[i][j] is missing; otherwise, if missingsMask[i][j] == 1, then expressionData[i][j] is valid
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
distanceMatrix - double[][] the distance matrix; if the distance matrix is not null, it is used to speed up the clustering calculation and the original expression data (specified by expressionData and missingsMask) are not needed and, therefore, ignored; the contents of the distance matrix are not modified; if distanceMatrix is null, the pairwise distances are calculated from the gene expression data (the expressionData and missingsMask bidimensional arrays) and stored in temporary arrays
distanceMetric - char the distance measure to use; valid distance metrics are:
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_RANK_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE will be used
clusteringDimension - boolean the clustering dimension; available dimensions:
  • GENES_DIMENSION - the genes/rows of expressionData are clustered
  • CONDITIONS_DIMENSION - the conditions/columns of expressionData are clustered
Returns:
ClusterTreeNode[] the set of ClusterTreeNode objects describing the hierarchical clustering solution, which consists of (clusteredElements-1) nodes; depending on whether genes (rows) or conditions (columns) were clustered, clusteredElements is equal to the number of genes/rows or the number of conditions/columns, respectively
Throws:
java.lang.ArrayIndexOutOfBoundsException
java.lang.Exception

computeDistance

public static double computeDistance(double[][] expressionData1,
                                     double[][] expressionData2,
                                     int[][] missingsMask1,
                                     int[][] missingsMask2,
                                     double[] elementsWeights,
                                     int index1,
                                     int index2,
                                     boolean dimension,
                                     char distanceMetric)
                              throws java.lang.Exception
Calculates the distance between two genes/rows or two conditions/columns. To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
distanceMetric - char the distance measure to use; valid distance metrics are:
  • UNCENTERED_CORRELATION
  • PEARSON_CORRELATION
  • UNCENTERED_ABS_CORRELATION
  • PEARSON_ABS_CORRELATION
  • SPEARMAN_RANK_CORRELATION
  • KENDALL_TAU_CORRELATION
  • EUCLIDEAN_DISTANCE
  • CITYBLOCK_DISTANCE
for any other value, EUCLIDEAN_DISTANCE will be used
Returns:
double the computed distance
Throws:
java.lang.Exception
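
The interplay of the two data matrices, the two masks, the weights and the dimension flag can be sketched as follows. This is an illustration under stated assumptions rather than the method's actual code: the class WeightedDistanceSketch and the method distance are hypothetical, the boolean byRows stands in for the dimension parameter, and the metric shown is a weighted mean of squared differences, which is one common convention for a weighted Euclidean measure.

```java
class WeightedDistanceSketch {
    /**
     * Distance between row/column index1 of data1 and row/column index2 of
     * data2. Positions missing in either input (mask == 0) are skipped.
     * weights has one entry per position along the compared vectors:
     * per column when rows are compared, per row when columns are compared.
     */
    static double distance(double[][] data1, double[][] data2,
                           int[][] mask1, int[][] mask2, double[] weights,
                           int index1, int index2, boolean byRows) {
        int length = byRows ? data1[0].length : data1.length;
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < length; k++) {
            boolean valid1 = (byRows ? mask1[index1][k] : mask1[k][index1]) != 0;
            boolean valid2 = (byRows ? mask2[index2][k] : mask2[k][index2]) != 0;
            if (valid1 && valid2) {
                double x = byRows ? data1[index1][k] : data1[k][index1];
                double y = byRows ? data2[index2][k] : data2[k][index2];
                sum += weights[k] * (x - y) * (x - y);
                totalWeight += weights[k];
            }
        }
        // No shared valid positions: report zero rather than dividing by zero.
        return totalWeight == 0.0 ? 0.0 : sum / totalWeight;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2, 3}, {2, 4, 9}};
        int[][] mask = {{1, 1, 1}, {1, 1, 0}}; // data[1][2] is missing
        // Same matrix passed twice: distance between two rows of one data set.
        System.out.println(distance(data, data, mask, mask,
                new double[]{1, 3, 5}, 0, 1, true));
        // Distance between columns 0 and 2, with one weight per row.
        System.out.println(distance(data, data, mask, mask,
                new double[]{1, 1}, 0, 2, false));
    }
}
```

Passing the same matrix and mask for both inputs gives the within-data-set case described above; passing two different matrices of equal dimensions compares elements across data sets.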

findClosestPair

public static double findClosestPair(int maxRowLimit,
                                     double[][] distanceMatrix,
                                     int[] ipointer,
                                     int[] jpointer)
                              throws java.lang.Exception
Searches the distance matrix to find the pair with the shortest distance between them. The row and column indexes of the pair are returned in ipointer and jpointer, respectively (more precisely, in ipointer[0] and jpointer[0]), and the distance itself is the return value.

Parameters:
maxRowLimit - int the maximum number of elements of the distance matrix where this method should look for the pair with the shortest distance; this value should not exceed the number of rows in the distanceMatrix
distanceMatrix - double[][] the distance matrix; this is a ragged array containing the distances between elements (genes/rows or conditions/columns); as the distance matrix is symmetric, with zeros on the diagonal, only the lower triangular half of the distance matrix is saved and used
ipointer - int[] an array with at least one element, which receives the first index of the pair with the shortest distance
jpointer - int[] an array with at least one element, which receives the second index of the pair with the shortest distance
Returns:
double the shortest distance between two elements
Throws:
java.lang.Exception
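
Since Java has no output parameters for primitives, the two indexes are handed back through one-element arrays. A minimal sketch of that pattern over a ragged lower-triangular matrix follows; the class ClosestPairSketch and the method findClosest are hypothetical names, and the argument check is an addition of this sketch.

```java
class ClosestPairSketch {
    /**
     * Scans rows 1..n-1 of the ragged lower triangle and returns the smallest
     * distance; the indexes of that entry are written to ip[0] (row) and
     * jp[0] (column), so ip[0] > jp[0] always holds.
     */
    static double findClosest(int n, double[][] dist, int[] ip, int[] jp) {
        if (n < 2 || n > dist.length) {
            throw new IllegalArgumentException("need 2 <= n <= number of rows");
        }
        double best = dist[1][0];
        ip[0] = 1;
        jp[0] = 0;
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < i; j++) {
                if (dist[i][j] < best) {
                    best = dist[i][j];
                    ip[0] = i;
                    jp[0] = j;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] dist = {{}, {5}, {3, 2}, {7, 6, 4}};
        int[] ip = new int[1];
        int[] jp = new int[1];
        double d = findClosest(4, dist, ip, jp);
        System.out.println("closest pair (" + ip[0] + ", " + jp[0] + ") at " + d);
    }
}
```

Limiting the scan to the first n rows lets a clustering loop shrink the active part of the matrix after each merge without reallocating it.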

calculateWeights

public static double[] calculateWeights(double[][] expressionData,
                                        int[][] missingsMask,
                                        double[] originalWeights,
                                        boolean dimension,
                                        char distanceMetric,
                                        double cutOff,
                                        double exponent)
                                 throws java.lang.Exception
This function calculates the weights using the weighting scheme proposed by Michael Eisen: w[i] = 1.0 / sum_{j where d[i][j] < cutOff} (1 - d[i][j]/cutOff)^exponent

Parameters:
expressionData - double[][] the bidimensional array that contains the expression data
missingsMask - int[][] the mask that indicates which expression values are missing in expressionData
originalWeights - double[] the original weights used to calculate the distances
dimension - boolean the dimension of the computation (either GENES_DIMENSION or CONDITIONS_DIMENSION)
distanceMetric - char the distance metric used to calculate distances
cutOff - double the cutoff to use in weight calculation
exponent - double the exponent to use in weight calculation
Returns:
double[] the computed genes or conditions weights
Throws:
java.lang.Exception
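
A minimal sketch of an Eisen-style weighting follows, assuming the weight of element i is the reciprocal of the sum, over all elements j closer than cutOff, of (1 - d[i][j]/cutOff) raised to exponent. To stay self-contained it starts from a precomputed ragged distance matrix instead of expression data, and it counts the self term (j = i, distance 0, contribution 1); both choices, as well as the names EisenWeightsSketch and weights, are assumptions of this sketch and not a description of this class.

```java
import java.util.Arrays;

class EisenWeightsSketch {
    /** dist is a ragged lower triangle: dist[i][j] is defined for j < i. */
    static double[] weights(double[][] dist, double cutOff, double exponent) {
        int n = dist.length;
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] += 1.0; // self term: d[i][i] = 0 gives (1 - 0)^exponent = 1
            for (int j = 0; j < i; j++) {
                double d = dist[i][j];
                if (d < cutOff) {
                    double term = Math.pow(1.0 - d / cutOff, exponent);
                    w[i] += term; // the pair contributes to both elements
                    w[j] += term;
                }
            }
        }
        for (int i = 0; i < n; i++) {
            w[i] = 1.0 / w[i];
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] dist = {{}, {0.5}, {2.0, 1.0}};
        System.out.println(Arrays.toString(weights(dist, 2.0, 2.0)));
    }
}
```

Elements with many close neighbours accumulate a large sum and therefore receive a small weight, which is the purpose of the scheme: it down-weights groups of near-duplicate genes or conditions.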