biggests.clustering
Class AbstractMetricClustering

java.lang.Object
  extended by smadeira.utils.TimeConsumingTasks
      extended by biggests.clustering.AbstractMetricClustering
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
HierarchicalClustering

public abstract class AbstractMetricClustering
extends smadeira.utils.TimeConsumingTasks

Title: Abstract Metric Clustering

Description: Defines an abstract class for clustering methods based on distance metrics.

Copyright: Copyright (C) 2008 Joana P. Gonçalves This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See Also:
Serialized Form

Field Summary
static char CITYBLOCK_DISTANCE
          Cityblock distance (distance metric).
static boolean CONDITIONS_DIMENSION
          Clustering on conditions (columns) dimension (clustering dimension).
protected  int[] conditionsIndexes
          The indexes of the conditions set by clustering method for file output.
protected  java.lang.String[] conditionsNames
          Conditions' names.
protected  double[] conditionsOrder
          The order of the conditions in the data file.
protected  double[] conditionsWeights
          The weights of the conditions.
static char EUCLIDEAN_DISTANCE
          Euclidean distance (distance metric).
protected  double[][] expressionData
          Expression data.
static boolean GENES_DIMENSION
          Clustering on genes (rows) dimension (clustering dimension).
protected  int[] genesIndexes
          The indexes of the genes set by clustering method for file output.
protected  java.lang.String[] genesNames
          Genes' names.
protected  double[] genesOrder
          The order of the genes in the data file.
protected  java.lang.String[] genesUniqueIDs
          Genes' unique IDs.
protected  double[] genesWeights
          The weights of the genes.
static char KENDALL_TAU_CORRELATION
          Kendall's tau correlation (distance metric).
protected  int[][] missingsMask
          Missing values mask.
static char NO_CLUSTERING
          No clustering (distance metric).
protected  int numberOfConditions
          The number of conditions (microarrays) in the experiment.
protected  int numberOfGenes
          The number of genes in the experiment.
static char PEARSON_ABS_CORRELATION
          Pearson's absolute correlation (distance metric).
static char PEARSON_CORRELATION
          Pearson's correlation (distance metric).
static char SPEARMAN_RANK_CORRELATION
          Spearman's rank correlation (distance metric).
static char UNCENTERED_ABS_CORRELATION
          Uncentered absolute correlation (distance metric).
static char UNCENTERED_CORRELATION
          Uncentered correlation (distance metric).
protected  java.lang.String uniqueID
          The name of the unique ID used to identify genes in the data file.
 
Fields inherited from class smadeira.utils.TimeConsumingTasks
progressBar, serialVersionUID, taskPercentages
 
Constructor Summary
AbstractMetricClustering(smadeira.biclustering.IMatrix expressionMatrix)
          Creates a new AbstractMetricClustering object to perform clustering over the set of gene expression data contained in expressionMatrix.
AbstractMetricClustering(int numberOfGenes, int numberOfConditions, double[] genesWeights, double[] conditionsWeights, double[] genesOrder, double[] conditionsOrder, int[] genesIndexes, int[] conditionsIndexes, java.lang.String uniqueID, java.lang.String[] genesUniqueIDs, java.lang.String[] genesNames, java.lang.String[] conditionsNames, double[][] expressionData, int[][] missingsMask)
          Creates a new AbstractMetricClustering object to perform clustering over the given set of expression data.
 
Method Summary
static double cityblockDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the weighted "City Block" distance between two genes/rows or conditions/columns in a matrix.
static double[] computeRanks(java.util.ArrayList<java.lang.Double> dataToRank)
          Calculates the ranks of the elements in the array dataToRank.
static double[] computeRanks(double[] dataToRank)
          Calculates the ranks of the elements in the array dataToRank.
static double euclideanDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the weighted Euclidean distance between two genes/rows or two conditions/columns.
static double kendallsTauCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Computes Kendall's Tau distance between two rows or two columns.
static double pearsonsAbsoluteCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the weighted Pearson distance between two genes/rows or conditions/columns, using the absolute value of the correlation.
static double pearsonsCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the weighted Pearson distance between two genes/rows or conditions/columns in a matrix.
protected static void sortIndexes(double[] data, int[] indexes)
          Sets up an index table given the data, such that data[indexes[i]], with i = 0, ..., data.length - 1, is in increasing order.
static double spearmansRankCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the Spearman distance between two genes/rows or conditions/columns.
static double uncenteredAbsoluteCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the weighted Pearson distance between two genes/rows or conditions/columns, using the absolute value of the uncentered version of the Pearson correlation.
static double uncenteredCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension)
          Calculates the weighted Pearson distance between two genes/rows or conditions/columns, using the uncentered version of the Pearson correlation.
 
Methods inherited from class smadeira.utils.TimeConsumingTasks
decrementPercentDone, getPercentDone, getProgressBar, getTaskPercentages, incrementPercentDone, incrementPercentDone, incrementSubtaskPercentDone, setPercentDone, setProgressBar, setSubtaskValues, setTaskPercentages, updateBiclusterPercentDone, updatePercentDone
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GENES_DIMENSION

public static final boolean GENES_DIMENSION
Clustering on genes (rows) dimension (clustering dimension).

See Also:
Constant Field Values

CONDITIONS_DIMENSION

public static final boolean CONDITIONS_DIMENSION
Clustering on conditions (columns) dimension (clustering dimension).

See Also:
Constant Field Values

NO_CLUSTERING

public static final char NO_CLUSTERING
No clustering (distance metric).

See Also:
Constant Field Values

UNCENTERED_CORRELATION

public static final char UNCENTERED_CORRELATION
Uncentered correlation (distance metric).

See Also:
Constant Field Values

PEARSON_CORRELATION

public static final char PEARSON_CORRELATION
Pearson's correlation (distance metric).

See Also:
Constant Field Values

UNCENTERED_ABS_CORRELATION

public static final char UNCENTERED_ABS_CORRELATION
Uncentered absolute correlation (distance metric).

See Also:
Constant Field Values

PEARSON_ABS_CORRELATION

public static final char PEARSON_ABS_CORRELATION
Pearson's absolute correlation (distance metric).

See Also:
Constant Field Values

SPEARMAN_RANK_CORRELATION

public static final char SPEARMAN_RANK_CORRELATION
Spearman's rank correlation (distance metric).

See Also:
Constant Field Values

KENDALL_TAU_CORRELATION

public static final char KENDALL_TAU_CORRELATION
Kendall's tau correlation (distance metric).

See Also:
Constant Field Values

EUCLIDEAN_DISTANCE

public static final char EUCLIDEAN_DISTANCE
Euclidean distance (distance metric).

See Also:
Constant Field Values

CITYBLOCK_DISTANCE

public static final char CITYBLOCK_DISTANCE
Cityblock distance (distance metric).

See Also:
Constant Field Values

numberOfGenes

protected int numberOfGenes
The number of genes in the experiment.


numberOfConditions

protected int numberOfConditions
The number of conditions (microarrays) in the experiment.


genesWeights

protected double[] genesWeights
The weights of the genes.


conditionsWeights

protected double[] conditionsWeights
The weights of the conditions.


genesOrder

protected double[] genesOrder
The order of the genes in the data file.


conditionsOrder

protected double[] conditionsOrder
The order of the conditions in the data file.


genesIndexes

protected int[] genesIndexes
The indexes of the genes set by clustering method for file output.


conditionsIndexes

protected int[] conditionsIndexes
The indexes of the conditions set by clustering method for file output.


uniqueID

protected java.lang.String uniqueID
The name of the unique ID used to identify genes in the data file.


genesUniqueIDs

protected java.lang.String[] genesUniqueIDs
Genes' unique IDs.


genesNames

protected java.lang.String[] genesNames
Genes' names.


conditionsNames

protected java.lang.String[] conditionsNames
Conditions' names.


expressionData

protected double[][] expressionData
Expression data. Contains the expression values measured for the genes in the experimental conditions.


missingsMask

protected int[][] missingsMask
Missing values mask.
missingsMask[i][j] == 1 -> valid expression value in expressionData[i][j]
missingsMask[i][j] == 0 -> missing expression value in expressionData[i][j] (the value in expressionData[i][j] should not be used for calculations)
The mask is composed of integer values, not booleans, because these values are sometimes used in calculations.
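The following is a minimal sketch of how a 0/1 integer mask can take part in arithmetic directly, which is one plausible reason for preferring int over boolean. The class MissingsMaskSketch and its methods are hypothetical and are not part of this API.

```java
// Hypothetical illustration; not part of biggests.clustering.
final class MissingsMaskSketch {

    /** Counts the valid entries of a row by summing the 0/1 mask directly. */
    static int validCount(int[][] mask, int row) {
        int count = 0;
        for (int j = 0; j < mask[row].length; j++) {
            count += mask[row][j]; // 1 = valid, 0 = missing
        }
        return count;
    }

    /** Mean of the valid entries of a row; missing entries are skipped. */
    static double maskedRowMean(double[][] data, int[][] mask, int row) {
        double sum = 0.0;
        for (int j = 0; j < data[row].length; j++) {
            if (mask[row][j] == 1) {
                sum += data[row][j];
            }
        }
        return sum / validCount(mask, row);
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, Double.NaN, 3.0}};
        int[][] mask = {{1, 0, 1}};
        System.out.println(maskedRowMean(data, mask, 0));
    }
}
```

The missing entry is skipped with a comparison rather than multiplied by 0, because a placeholder such as NaN would otherwise propagate into the sum.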

Constructor Detail

AbstractMetricClustering

public AbstractMetricClustering(smadeira.biclustering.IMatrix expressionMatrix)
Creates a new AbstractMetricClustering object to perform clustering over the set of gene expression data contained in expressionMatrix.

Parameters:
expressionMatrix - IMatrix the matrix containing the gene expression data to cluster

AbstractMetricClustering

public AbstractMetricClustering(int numberOfGenes,
                                int numberOfConditions,
                                double[] genesWeights,
                                double[] conditionsWeights,
                                double[] genesOrder,
                                double[] conditionsOrder,
                                int[] genesIndexes,
                                int[] conditionsIndexes,
                                java.lang.String uniqueID,
                                java.lang.String[] genesUniqueIDs,
                                java.lang.String[] genesNames,
                                java.lang.String[] conditionsNames,
                                double[][] expressionData,
                                int[][] missingsMask)
                         throws java.lang.Exception
Creates a new AbstractMetricClustering object to perform clustering over the given set of expression data.

Parameters:
numberOfGenes - int the number of genes in the experiment
numberOfConditions - int the number of conditions in the experiment
genesWeights - double[] the set of genes' weights
conditionsWeights - double[] the set of conditions' weights
genesOrder - double[] the order of the genes
conditionsOrder - double[] the order of the conditions
genesIndexes - int[] the set of genes' indexes
conditionsIndexes - int[] the set of conditions' indexes
uniqueID - String the unique ID type name
genesUniqueIDs - String[] the set of genes' unique IDs
genesNames - String[] the set of genes' names
conditionsNames - String[] the set of conditions' names
expressionData - double[][] the set of expression values measured for the genes in the experimental conditions
missingsMask - int[][] the mask which states if each element contained in expressionData is valid (1) or missing (0)
Throws:
java.lang.Exception - if the dimensions of the arrays and/or matrices do not match numberOfGenes and/or numberOfConditions
Method Detail

euclideanDistance

public static double euclideanDistance(double[][] expressionData1,
                                       double[][] expressionData2,
                                       int[][] missingsMask1,
                                       int[][] missingsMask2,
                                       double[] elementsWeights,
                                       int index1,
                                       int index2,
                                       boolean dimension)
                                throws java.lang.Exception
Calculates the weighted Euclidean distance between two genes/rows or two conditions/columns. To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
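As an illustration of the parameter layout described above, here is a minimal sketch of a weighted Euclidean distance that skips entries missing in either mask. The class EuclideanSketch, the byRows flag (standing in for the dimension constant, whose actual values are not documented here) and the choice to return the square root of the weighted sum without further normalization are all assumptions; the real method may scale the result differently.

```java
// Hypothetical illustration; not the implementation of euclideanDistance.
final class EuclideanSketch {

    /**
     * Weighted Euclidean distance between row/column index1 of data1 and
     * row/column index2 of data2, using only entries valid in both masks.
     */
    static double distance(double[][] data1, double[][] data2,
                           int[][] mask1, int[][] mask2,
                           double[] weights, int index1, int index2,
                           boolean byRows) {
        // Comparing rows iterates over columns, and vice versa.
        int n = byRows ? data1[0].length : data1.length;
        double sum = 0.0;
        for (int k = 0; k < n; k++) {
            int valid1 = byRows ? mask1[index1][k] : mask1[k][index1];
            int valid2 = byRows ? mask2[index2][k] : mask2[k][index2];
            if (valid1 != 1 || valid2 != 1) {
                continue; // value missing in at least one vector
            }
            double x = byRows ? data1[index1][k] : data1[k][index1];
            double y = byRows ? data2[index2][k] : data2[k][index2];
            sum += weights[k] * (x - y) * (x - y);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0, 5}, {3, 4, 9}};
        int[][] mask = {{1, 1, 0}, {1, 1, 1}};
        // Same matrix and mask passed twice: rows 0 and 1 of one data set.
        System.out.println(distance(data, data, mask, mask,
                new double[] {1, 1, 1}, 0, 1, true));
    }
}
```

Note that the weights array has one entry per column when rows are compared and one entry per row when columns are compared, matching the description of elementsWeights.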

cityblockDistance

public static double cityblockDistance(double[][] expressionData1,
                                       double[][] expressionData2,
                                       int[][] missingsMask1,
                                       int[][] missingsMask2,
                                       double[] elementsWeights,
                                       int index1,
                                       int index2,
                                       boolean dimension)
                                throws java.lang.Exception
Calculates the weighted "City Block" distance between two genes/rows or conditions/columns in a matrix. City Block distance is defined as the absolute value of X1-X2 plus the absolute value of Y1-Y2 plus..., which is equivalent to taking an "up and over" path.
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
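The "absolute value of X1-X2 plus the absolute value of Y1-Y2 plus..." definition can be sketched on two already extracted vectors. The class CityBlockSketch is hypothetical, and summing the weighted terms without normalization is an assumption rather than the documented behaviour.

```java
// Hypothetical illustration; not the implementation of cityblockDistance.
final class CityBlockSketch {

    /** Weighted sum of |x[k] - y[k]| over positions valid in both masks. */
    static double distance(double[] x, double[] y,
                           int[] maskX, int[] maskY, double[] weights) {
        double sum = 0.0;
        for (int k = 0; k < x.length; k++) {
            if (maskX[k] == 1 && maskY[k] == 1) {
                sum += weights[k] * Math.abs(x[k] - y[k]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {4, 0, 10};
        int[] maskX = {1, 1, 1};
        int[] maskY = {1, 1, 0}; // third value of y is missing
        double[] weights = {1, 2, 1};
        System.out.println(distance(x, y, maskX, maskY, weights));
    }
}
```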

pearsonsCorrelation

public static double pearsonsCorrelation(double[][] expressionData1,
                                         double[][] expressionData2,
                                         int[][] missingsMask1,
                                         int[][] missingsMask2,
                                         double[] elementsWeights,
                                         int index1,
                                         int index2,
                                         boolean dimension)
                                  throws java.lang.Exception
Calculates the weighted Pearson distance between two genes/rows or conditions/columns in a matrix. We define the Pearson distance as one minus the Pearson correlation. This definition yields a semi-metric: d(a,b) >= 0, and d(a,b) = 0 iff a = b, but the triangle inequality d(a,b) + d(b,c) >= d(a,c) does not hold (e.g., choose b = a + c).
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
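The remark about the triangle inequality can be checked numerically. The sketch below computes an unweighted Pearson distance (one minus the correlation) on plain vectors with no missing values, which is a simplification of the method above, and evaluates it for a, c and b = a + c. The class PearsonDistanceSketch is hypothetical.

```java
// Hypothetical illustration; unweighted and without missing-value handling.
final class PearsonDistanceSketch {

    /** One minus the Pearson correlation of x and y. */
    static double pearsonDistance(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int k = 0; k < n; k++) {
            meanX += x[k];
            meanY += y[k];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - meanX;
            double dy = y[k] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return 1.0 - sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] a = {1, -1, 0, 0};
        double[] c = {0, 0, 1, -1};
        double[] b = {1, -1, 1, -1}; // b = a + c
        double viaB = pearsonDistance(a, b) + pearsonDistance(b, c);
        double direct = pearsonDistance(a, c);
        // Going through b is "shorter" than the direct distance.
        System.out.println(viaB + " < " + direct);
    }
}
```

Here a and c are uncorrelated, so d(a,c) = 1, while b correlates equally with both and d(a,b) + d(b,c) is roughly 0.59.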

pearsonsAbsoluteCorrelation

public static double pearsonsAbsoluteCorrelation(double[][] expressionData1,
                                                 double[][] expressionData2,
                                                 int[][] missingsMask1,
                                                 int[][] missingsMask2,
                                                 double[] elementsWeights,
                                                 int index1,
                                                 int index2,
                                                 boolean dimension)
                                          throws java.lang.Exception
Calculates the weighted Pearson distance between two genes/rows or conditions/columns, using the absolute value of the correlation. This definition yields a semi-metric: d(a,b) >= 0, and d(a,b) = 0 iff a = b, but the triangle inequality d(a,b) + d(b,c) >= d(a,c) does not hold (e.g., choose b = a + c).
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
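A minimal sketch of the absolute variant follows, assuming by analogy with pearsonsCorrelation that the distance is one minus the absolute value of the correlation; the description above does not state the formula explicitly. The class AbsolutePearsonSketch is hypothetical, unweighted, and ignores missing values.

```java
// Hypothetical illustration; assumes distance = 1 - |r|.
final class AbsolutePearsonSketch {

    /** Pearson correlation of x and y. */
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int k = 0; k < n; k++) {
            meanX += x[k];
            meanY += y[k];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int k = 0; k < n; k++) {
            sxy += (x[k] - meanX) * (y[k] - meanY);
            sxx += (x[k] - meanX) * (x[k] - meanX);
            syy += (y[k] - meanY) * (y[k] - meanY);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** One minus the absolute correlation (assumed definition). */
    static double absoluteDistance(double[] x, double[] y) {
        return 1.0 - Math.abs(correlation(x, y));
    }

    public static void main(String[] args) {
        double[] up = {1, 2, 3};
        double[] down = {3, 2, 1};
        // Perfectly anti-correlated profiles are treated as identical.
        System.out.println(absoluteDistance(up, down));
    }
}
```

Under this assumption, anti-correlated profiles end up close together, whereas the plain Pearson distance would place them as far apart as possible.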

uncenteredCorrelation

public static double uncenteredCorrelation(double[][] expressionData1,
                                           double[][] expressionData2,
                                           int[][] missingsMask1,
                                           int[][] missingsMask2,
                                           double[] elementsWeights,
                                           int index1,
                                           int index2,
                                           boolean dimension)
                                    throws java.lang.Exception
Calculates the weighted Pearson distance between two genes/rows or conditions/columns, using the uncentered version of the Pearson correlation. In the uncentered Pearson correlation, a zero mean is used for both vectors even if the actual mean is nonzero. This definition yields a semi-metric: d(a,b) >= 0, and d(a,b) = 0 iff a = b, but the triangle inequality d(a,b) + d(b,c) >= d(a,c) does not hold (e.g., choose b = a + c).
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
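To make the "zero mean is used for both vectors" remark concrete, the sketch below computes an unweighted uncentered correlation, i.e. the sum of products divided by the square root of the product of the sums of squares, and the corresponding distance. The class UncenteredSketch is hypothetical, and taking the distance as one minus this correlation is an assumption made by analogy with pearsonsCorrelation.

```java
// Hypothetical illustration; unweighted and without missing-value handling.
final class UncenteredSketch {

    /** Uncentered correlation: the means are taken to be zero. */
    static double uncenteredCorrelation(double[] x, double[] y) {
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int k = 0; k < x.length; k++) {
            sxy += x[k] * y[k];
            sxx += x[k] * x[k];
            syy += y[k] * y[k];
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** One minus the uncentered correlation (assumed definition). */
    static double uncenteredDistance(double[] x, double[] y) {
        return 1.0 - uncenteredCorrelation(x, y);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {3, 2, 1};
        // The centered correlation of x and y is -1; the uncentered one is 10/14.
        System.out.println(uncenteredCorrelation(x, y));
    }
}
```

The two vectors in main are perfectly anti-correlated once centered, yet their uncentered correlation is positive because both have a large positive mean.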

uncenteredAbsoluteCorrelation

public static double uncenteredAbsoluteCorrelation(double[][] expressionData1,
                                                   double[][] expressionData2,
                                                   int[][] missingsMask1,
                                                   int[][] missingsMask2,
                                                   double[] elementsWeights,
                                                   int index1,
                                                   int index2,
                                                   boolean dimension)
                                            throws java.lang.Exception
Calculates the weighted Pearson distance between two genes/rows or conditions/columns, using the absolute value of the uncentered version of the Pearson correlation. In the uncentered Pearson correlation, a zero mean is used for both vectors even if the actual mean is nonzero. This definition yields a semi-metric: d(a,b) >= 0, and d(a,b) = 0 iff a = b, but the triangle inequality d(a,b) + d(b,c) >= d(a,c) does not hold (e.g., choose b = a + c).
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range

spearmansRankCorrelation

public static double spearmansRankCorrelation(double[][] expressionData1,
                                              double[][] expressionData2,
                                              int[][] missingsMask1,
                                              int[][] missingsMask2,
                                              double[] elementsWeights,
                                              int index1,
                                              int index2,
                                              boolean dimension)
                                       throws java.lang.Exception
Calculates the Spearman distance between two genes/rows or conditions/columns. The Spearman distance is defined as one minus the Spearman rank correlation.
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
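The definition "one minus the Spearman rank correlation" can be sketched as a Pearson correlation computed on ranks. The class SpearmanDistanceSketch is hypothetical; it ignores missing values, and its 1-based average ranks for ties are an assumption (the correlation itself does not depend on the rank offset).

```java
// Hypothetical illustration; not the implementation of spearmansRankCorrelation.
final class SpearmanDistanceSketch {

    /** 1-based ranks; tied values share the average of their ranks. */
    static double[] averageRanks(double[] values) {
        double[] ranks = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int less = 0, equal = 0;
            for (double other : values) {
                if (other < values[i]) {
                    less++;
                } else if (other == values[i]) {
                    equal++; // includes the element itself
                }
            }
            ranks[i] = 1.0 + less + (equal - 1) / 2.0;
        }
        return ranks;
    }

    /** Pearson correlation of x and y. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int k = 0; k < n; k++) {
            meanX += x[k];
            meanY += y[k];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int k = 0; k < n; k++) {
            sxy += (x[k] - meanX) * (y[k] - meanY);
            sxx += (x[k] - meanX) * (x[k] - meanX);
            syy += (y[k] - meanY) * (y[k] - meanY);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** One minus the Spearman rank correlation. */
    static double spearmanDistance(double[] x, double[] y) {
        return 1.0 - pearson(averageRanks(x), averageRanks(y));
    }

    public static void main(String[] args) {
        double[] x = {10, 20, 30, 40};
        double[] y = {1, 4, 9, 16}; // nonlinear but monotone in x
        System.out.println(spearmanDistance(x, y));
    }
}
```

Because only the ordering matters, any strictly increasing transformation of a profile leaves its Spearman distance to other profiles unchanged.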

kendallsTauCorrelation

public static double kendallsTauCorrelation(double[][] expressionData1,
                                            double[][] expressionData2,
                                            int[][] missingsMask1,
                                            int[][] missingsMask2,
                                            double[] elementsWeights,
                                            int index1,
                                            int index2,
                                            boolean dimension)
                                     throws java.lang.Exception
Computes Kendall's Tau distance between two rows or two columns. The Kendall distance is defined as one minus Kendall's tau.
To compute the distance between two genes/rows or conditions/columns of the same expression data, just call this method with the same expression matrix for expressionData1 and expressionData2 and the corresponding mask in both missingsMask1 and missingsMask2.
expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.

Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements to be clustered (genes/rows or conditions/columns); the length of this array is equal to the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the first gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which distance is being measured; available dimensions:
  • GENES_DIMENSION - the distance is measured between two genes/rows
  • CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Returns:
double the computed distance
Throws:
java.lang.Exception - if the dimensions of the several arrays do not comply or if the indexes are out of range
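As a rough illustration of "one minus Kendall's tau", the sketch below counts concordant and discordant pairs and uses the simplest variant of the statistic (often called tau-a), in which tied pairs count as neither. Which variant this class actually implements is not stated above, so the formula is an assumption; the class KendallDistanceSketch is hypothetical and ignores weights and missing values.

```java
// Hypothetical illustration; uses tau-a, which may differ from the real method.
final class KendallDistanceSketch {

    /** One minus Kendall's tau-a of x and y. */
    static double kendallDistance(double[] x, double[] y) {
        int n = x.length;
        int concordant = 0;
        int discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sign = (x[i] - x[j]) * (y[i] - y[j]);
                if (sign > 0) {
                    concordant++; // both vectors order the pair the same way
                } else if (sign < 0) {
                    discordant++; // the vectors disagree on the order
                }
            }
        }
        double pairs = n * (n - 1) / 2.0;
        double tau = (concordant - discordant) / pairs;
        return 1.0 - tau;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {1, 3, 2, 4}; // one swapped pair out of six
        System.out.println(kendallDistance(x, y));
    }
}
```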

computeRanks

public static double[] computeRanks(double[] dataToRank)
Calculates the ranks of the elements in the array dataToRank. Elements with the same value get the same rank, equal to the average of the ranks they would have received had their values been different.

Parameters:
dataToRank - double[] array that contains the elements to rank
Returns:
double[] the computed ranks for the given data
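The tie rule described above can be sketched by sorting an index table and assigning each run of equal values the average of the positions it occupies. The class ComputeRanksSketch is hypothetical, and the use of 1-based ranks is an assumption; the real method may number ranks from 0.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration; not the implementation of computeRanks.
final class ComputeRanksSketch {

    /** Returns 1-based ranks; tied values share the average of their ranks. */
    static double[] computeRanks(double[] data) {
        int n = data.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indexes by the values they point to; data is not modified.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> data[i]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Extend the run while the next sorted value equals the current one.
            int end = start;
            while (end + 1 < n && data[order[end + 1]] == data[order[start]]) {
                end++;
            }
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // The two 10s would occupy ranks 1 and 2, so each gets 1.5.
        System.out.println(Arrays.toString(
                computeRanks(new double[] {10, 20, 10, 30})));
    }
}
```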

computeRanks

public static double[] computeRanks(java.util.ArrayList<java.lang.Double> dataToRank)
Calculates the ranks of the elements in the list dataToRank. Elements with the same value get the same rank, equal to the average of the ranks they would have received had their values been different.

Parameters:
dataToRank - ArrayList<Double> list that contains the elements to rank
Returns:
double[] the computed ranks for the given data
See Also:
computeRanks(double[] dataToRank)

sortIndexes

protected static void sortIndexes(double[] data,
                                  int[] indexes)
                           throws java.lang.Exception
Sets up an index table given the data, such that data[indexes[i]], with i = 0, ..., data.length - 1, is in increasing order. Sorting is done on the indexes. The array data remains unchanged.

Parameters:
data - double[] the data to be sorted
indexes - int[] the indexes of the data to be sorted
Throws:
java.lang.Exception - if data or indexes is null, or if the length of data and the length of indexes are not equal
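A minimal sketch of such an index sort is shown below. The class SortIndexesSketch is hypothetical; it fills indexes with 0, ..., data.length - 1 before sorting (whether the real method expects a pre-filled array is not stated) and throws IllegalArgumentException instead of the checked java.lang.Exception declared above.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration; not the implementation of sortIndexes.
final class SortIndexesSketch {

    /** Fills indexes so that data[indexes[i]] is in increasing order. */
    static void sortIndexes(double[] data, int[] indexes) {
        if (data == null || indexes == null || data.length != indexes.length) {
            throw new IllegalArgumentException(
                    "data and indexes must be non-null and of equal length");
        }
        Integer[] boxed = new Integer[data.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Only the index table is reordered; data itself is left untouched.
        Arrays.sort(boxed, Comparator.comparingDouble((Integer i) -> data[i]));
        for (int i = 0; i < boxed.length; i++) {
            indexes[i] = boxed[i];
        }
    }

    public static void main(String[] args) {
        double[] data = {3.0, 1.0, 2.0};
        int[] indexes = new int[data.length];
        sortIndexes(data, indexes);
        System.out.println(Arrays.toString(indexes));
    }
}
```

Sorting the indexes rather than the values keeps the original positions available, which is what a rank computation needs in order to write each rank back to the right element.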