java.lang.Object
  extended by TimeConsumingTasks
      extended by biggests.clustering.AbstractMetricClustering
public abstract class AbstractMetricClustering
Title: Abstract Metric Clustering
Description: Defines an abstract class for clustering methods based on distance metrics.
Copyright: Copyright (C) 2008 Joana P. Gonçalves This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Field Summary | |
---|---|
static char | CITYBLOCK_DISTANCE - City block distance (distance metric). |
static boolean | CONDITIONS_DIMENSION - Clustering along the conditions (columns) dimension (clustering dimension). |
protected int[] | conditionsIndexes - The indexes of the conditions, set by the clustering method for file output. |
protected java.lang.String[] | conditionsNames - The conditions' names. |
protected double[] | conditionsOrder - The order of the conditions in the data file. |
protected double[] | conditionsWeights - The weights of the conditions. |
static char | EUCLIDEAN_DISTANCE - Euclidean distance (distance metric). |
protected double[][] | expressionData - The expression data. |
static boolean | GENES_DIMENSION - Clustering along the genes (rows) dimension (clustering dimension). |
protected int[] | genesIndexes - The indexes of the genes, set by the clustering method for file output. |
protected java.lang.String[] | genesNames - The genes' names. |
protected double[] | genesOrder - The order of the genes in the data file. |
protected java.lang.String[] | genesUniqueIDs - The genes' unique IDs. |
protected double[] | genesWeights - The weights of the genes. |
static char | KENDALL_TAU_CORRELATION - Kendall's tau correlation (distance metric). |
protected int[][] | missingsMask - The missing-values mask. |
static char | NO_CLUSTERING - No clustering (distance metric option). |
protected int | numberOfConditions - The number of conditions (microarrays) in the experiment. |
protected int | numberOfGenes - The number of genes in the experiment. |
static char | PEARSON_ABS_CORRELATION - Pearson's absolute correlation (distance metric). |
static char | PEARSON_CORRELATION - Pearson's correlation (distance metric). |
static char | SPEARMAN_RANK_CORRELATION - Spearman's rank correlation (distance metric). |
static char | UNCENTERED_ABS_CORRELATION - Uncentered absolute correlation (distance metric). |
static char | UNCENTERED_CORRELATION - Uncentered correlation (distance metric). |
protected java.lang.String | uniqueID - The name of the unique ID used to identify genes in the data file. |
Constructor Summary | |
---|---|
AbstractMetricClustering(IMatrix expressionMatrix) | Creates a new AbstractMetricClustering object to perform clustering over the set of gene expression data contained in expressionMatrix. |
AbstractMetricClustering(int numberOfGenes, int numberOfConditions, double[] genesWeights, double[] conditionsWeights, double[] genesOrder, double[] conditionsOrder, int[] genesIndexes, int[] conditionsIndexes, java.lang.String uniqueID, java.lang.String[] genesUniqueIDs, java.lang.String[] genesNames, java.lang.String[] conditionsNames, double[][] expressionData, int[][] missingsMask) | Creates a new AbstractMetricClustering object to perform clustering over the given set of expression data. |
Method Summary | |
---|---|
static double | cityblockDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the weighted "city block" distance between two genes/rows or two conditions/columns in a matrix. |
static double[] | computeRanks(java.util.ArrayList<java.lang.Double> dataToRank) - Calculates the ranks of the elements in the list dataToRank. |
static double[] | computeRanks(double[] dataToRank) - Calculates the ranks of the elements in the array dataToRank. |
static double | euclideanDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the weighted Euclidean distance between two genes/rows or two conditions/columns. |
static double | kendallsTauCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates Kendall's tau distance between two genes/rows or two conditions/columns. |
static double | pearsonsAbsoluteCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the weighted Pearson distance between two genes/rows or two conditions/columns, using the absolute value of the correlation. |
static double | pearsonsCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the weighted Pearson distance between two genes/rows or two conditions/columns in a matrix. |
protected static void | sortIndexes(double[] data, int[] indexes) - Sets up an index table given the data, such that data[indexes[i]], with i = 0, ..., data.length - 1, is in increasing order. |
static double | spearmansRankCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the Spearman rank distance between two genes/rows or two conditions/columns. |
static double | uncenteredAbsoluteCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the weighted Pearson distance between two genes/rows or two conditions/columns, using the absolute value of the uncentered version of the Pearson correlation. |
static double | uncenteredCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) - Calculates the weighted Pearson distance between two genes/rows or two conditions/columns, using the uncentered version of the Pearson correlation. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final boolean GENES_DIMENSION
public static final boolean CONDITIONS_DIMENSION
public static final char NO_CLUSTERING
public static final char UNCENTERED_CORRELATION
public static final char PEARSON_CORRELATION
public static final char UNCENTERED_ABS_CORRELATION
public static final char PEARSON_ABS_CORRELATION
public static final char SPEARMAN_RANK_CORRELATION
public static final char KENDALL_TAU_CORRELATION
public static final char EUCLIDEAN_DISTANCE
public static final char CITYBLOCK_DISTANCE
protected int numberOfGenes
protected int numberOfConditions
protected double[] genesWeights
protected double[] conditionsWeights
protected double[] genesOrder
protected double[] conditionsOrder
protected int[] genesIndexes
protected int[] conditionsIndexes
protected java.lang.String uniqueID
protected java.lang.String[] genesUniqueIDs
protected java.lang.String[] genesNames
protected java.lang.String[] conditionsNames
protected double[][] expressionData
protected int[][] missingsMask
missingsMask[i][j] == 1 -> valid expression value in expressionData[i][j]
missingsMask[i][j] == 0 -> missing expression value in expressionData[i][j] (the value in expressionData[i][j] should not be used for calculations)
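To make the mask convention concrete, here is a minimal sketch that derives such a mask from a matrix. It assumes, purely for illustration, that missing values arrive encoded as Double.NaN; neither that encoding nor the class MaskExample is part of this API.

```java
/** Hypothetical helper illustrating the missingsMask convention (1 = valid, 0 = missing). */
final class MaskExample {

    private MaskExample() {
    }

    /** Builds a mask, assuming (for illustration only) that missing values are Double.NaN. */
    static int[][] buildMask(double[][] expressionData) {
        int[][] mask = new int[expressionData.length][];
        for (int i = 0; i < expressionData.length; i++) {
            mask[i] = new int[expressionData[i].length];
            for (int j = 0; j < expressionData[i].length; j++) {
                // 1 marks a value usable in calculations, 0 marks a missing value
                mask[i][j] = Double.isNaN(expressionData[i][j]) ? 0 : 1;
            }
        }
        return mask;
    }
}
```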
Constructor Detail |
---|
public AbstractMetricClustering(IMatrix expressionMatrix)
Creates a new AbstractMetricClustering object to perform clustering over the set of gene expression data contained in expressionMatrix.
Parameters:
expressionMatrix - IMatrix the matrix containing the gene expression data
public AbstractMetricClustering(int numberOfGenes, int numberOfConditions, double[] genesWeights, double[] conditionsWeights, double[] genesOrder, double[] conditionsOrder, int[] genesIndexes, int[] conditionsIndexes, java.lang.String uniqueID, java.lang.String[] genesUniqueIDs, java.lang.String[] genesNames, java.lang.String[] conditionsNames, double[][] expressionData, int[][] missingsMask) throws java.lang.Exception
Creates a new AbstractMetricClustering object to perform clustering over the given set of expression data.
Parameters:
numberOfGenes - int the number of genes in the experiment
numberOfConditions - int the number of conditions in the experiment
genesWeights - double[] the set of genes' weights
conditionsWeights - double[] the set of conditions' weights
genesOrder - double[] the order of the genes
conditionsOrder - double[] the order of the conditions
genesIndexes - int[] the set of genes' indexes
conditionsIndexes - int[] the set of conditions' indexes
uniqueID - String the name of the unique ID type
genesUniqueIDs - String[] the set of genes' unique IDs
genesNames - String[] the set of genes' names
conditionsNames - String[] the set of conditions' names
expressionData - double[][] the set of expression values measured for the genes under the experimental conditions
missingsMask - int[][] the mask that states whether each element of expressionData is valid (1) or missing (0)
Throws:
java.lang.Exception - if the array and/or matrix dimensions do not match numberOfGenes and/or numberOfConditions
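The exception condition above can be sketched as a set of length checks. This sketch assumes genes index the rows and conditions the columns of expressionData and missingsMask, and it checks only a representative subset of the arrays; DimensionCheck and its methods are hypothetical, and the actual constructor may validate differently.

```java
/** Hypothetical sketch of the dimension checks described for the constructor. */
final class DimensionCheck {

    private DimensionCheck() {
    }

    static void validate(int numberOfGenes, int numberOfConditions,
                         double[] genesWeights, double[] conditionsWeights,
                         String[] genesNames, String[] conditionsNames,
                         double[][] expressionData, int[][] missingsMask) throws Exception {
        requireLength("genesWeights", genesWeights.length, numberOfGenes);
        requireLength("conditionsWeights", conditionsWeights.length, numberOfConditions);
        requireLength("genesNames", genesNames.length, numberOfGenes);
        requireLength("conditionsNames", conditionsNames.length, numberOfConditions);
        // assumption: one row per gene in both matrices
        requireLength("expressionData rows", expressionData.length, numberOfGenes);
        requireLength("missingsMask rows", missingsMask.length, numberOfGenes);
        for (int i = 0; i < numberOfGenes; i++) {
            // every row must hold one value (and one mask flag) per condition
            requireLength("expressionData[" + i + "]", expressionData[i].length, numberOfConditions);
            requireLength("missingsMask[" + i + "]", missingsMask[i].length, numberOfConditions);
        }
    }

    private static void requireLength(String name, int actual, int expected) throws Exception {
        if (actual != expected) {
            throw new Exception(name + " has length " + actual + ", expected " + expected);
        }
    }
}
```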
Method Detail |
---|
public static double euclideanDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
Calculates the weighted Euclidean distance between two genes/rows or two conditions/columns. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.
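A minimal sketch of a masked, weighted Euclidean distance with the same parameter shape follows. It is not this class's source. Assumptions: byRows == true plays the role of GENES_DIMENSION (the actual boolean values of the constants are not documented here); only coordinates valid (1) in both masks contribute; and the result is the weighted mean of squared differences over those coordinates, with no square root. Implementations differ on that last point, so treat it as one possible convention. WeightedEuclidean is a hypothetical name.

```java
/** Illustrative sketch of a masked, weighted Euclidean distance (not the library's source). */
final class WeightedEuclidean {

    private WeightedEuclidean() {
    }

    /** byRows: assumption, true compares rows (genes), false compares columns (conditions). */
    static double distance(double[][] data1, double[][] data2, int[][] mask1, int[][] mask2,
                           double[] weights, int index1, int index2, boolean byRows) {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < weights.length; k++) {
            // locate the k-th coordinate of each element along the chosen dimension
            int r1 = byRows ? index1 : k, c1 = byRows ? k : index1;
            int r2 = byRows ? index2 : k, c2 = byRows ? k : index2;
            if (mask1[r1][c1] == 1 && mask2[r2][c2] == 1) {
                double diff = data1[r1][c1] - data2[r2][c2];
                sum += weights[k] * diff * diff;
                totalWeight += weights[k];
            }
        }
        // no shared valid coordinates: this sketch returns 0 by convention
        return totalWeight == 0.0 ? 0.0 : sum / totalWeight;
    }
}
```

Normalizing by the total weight of the shared valid coordinates keeps distances comparable between pairs with different numbers of missing values.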
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double cityblockDistance(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
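Calculates the weighted "city block" distance between two genes/rows or two conditions/columns in a matrix. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.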
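The city block distance replaces squared differences with absolute ones. The sketch below is not this class's implementation; it assumes that byRows == true stands in for GENES_DIMENSION, that only coordinates valid in both masks count, and that the sum is normalized by the total weight of those coordinates. WeightedCityBlock is a hypothetical name.

```java
/** Illustrative sketch of a masked, weighted city block distance (not the library's source). */
final class WeightedCityBlock {

    private WeightedCityBlock() {
    }

    static double distance(double[][] data1, double[][] data2, int[][] mask1, int[][] mask2,
                           double[] weights, int index1, int index2, boolean byRows) {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < weights.length; k++) {
            // k-th coordinate of each element: a column when comparing rows, a row otherwise
            double a = byRows ? data1[index1][k] : data1[k][index1];
            double b = byRows ? data2[index2][k] : data2[k][index2];
            boolean valid = byRows
                    ? mask1[index1][k] == 1 && mask2[index2][k] == 1
                    : mask1[k][index1] == 1 && mask2[k][index2] == 1;
            if (valid) {
                sum += weights[k] * Math.abs(a - b);
                totalWeight += weights[k];
            }
        }
        return totalWeight == 0.0 ? 0.0 : sum / totalWeight;
    }
}
```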
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double pearsonsCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
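Calculates the weighted Pearson distance between two genes/rows or two conditions/columns in a matrix. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.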
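Correlation-based distances are commonly defined as 1 - r. The sketch below computes a weighted Pearson correlation over the coordinates valid in both profiles and returns 1 - r, so results range from 0 (perfectly correlated) to 2 (perfectly anti-correlated). It assumes the caller has already extracted the two profiles (one row or one column each) together with their masks. The 1 - r form and the NaN returned for constant profiles are assumptions, not documented behavior of pearsonsCorrelation; WeightedPearson is hypothetical.

```java
/** Illustrative sketch of a masked, weighted Pearson distance 1 - r (not the library's source). */
final class WeightedPearson {

    private WeightedPearson() {
    }

    /** x, y: the two profiles; maskX, maskY: 1/0 validity flags; w: one weight per coordinate. */
    static double distance(double[] x, int[] maskX, double[] y, int[] maskY, double[] w) {
        double sw = 0, sx = 0, sy = 0;
        for (int k = 0; k < x.length; k++) {
            if (maskX[k] == 1 && maskY[k] == 1) {
                sw += w[k];
                sx += w[k] * x[k];
                sy += w[k] * y[k];
            }
        }
        if (sw == 0) {
            return Double.NaN; // no shared valid coordinates
        }
        double mx = sx / sw, my = sy / sw; // weighted means
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < x.length; k++) {
            if (maskX[k] == 1 && maskY[k] == 1) {
                double dx = x[k] - mx, dy = y[k] - my;
                sxy += w[k] * dx * dy;
                sxx += w[k] * dx * dx;
                syy += w[k] * dy * dy;
            }
        }
        if (sxx == 0 || syy == 0) {
            return Double.NaN; // a constant profile has no defined correlation
        }
        return 1.0 - sxy / Math.sqrt(sxx * syy);
    }
}
```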
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double pearsonsAbsoluteCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
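Calculates the weighted Pearson distance between two genes/rows or two conditions/columns, using the absolute value of the correlation. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.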
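Taking the absolute value of r makes the distance insensitive to the sign of the relationship, so perfectly anti-correlated profiles end up at distance 0. A minimal sketch, assuming the 1 - |r| form and inputs already restricted to the coordinates valid in both masks (AbsolutePearson is a hypothetical name, not this API):

```java
/** Illustrative sketch of 1 - |r| for a weighted Pearson correlation (not the library's source). */
final class AbsolutePearson {

    private AbsolutePearson() {
    }

    /** x and y are assumed already filtered to coordinates valid in both masks. */
    static double distance(double[] x, double[] y, double[] w) {
        double sw = 0, sx = 0, sy = 0;
        for (int k = 0; k < x.length; k++) {
            sw += w[k];
            sx += w[k] * x[k];
            sy += w[k] * y[k];
        }
        double mx = sx / sw, my = sy / sw;
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < x.length; k++) {
            double dx = x[k] - mx, dy = y[k] - my;
            sxy += w[k] * dx * dy;
            sxx += w[k] * dx * dx;
            syy += w[k] * dy * dy;
        }
        // r = -1 and r = +1 both give distance 0
        return 1.0 - Math.abs(sxy / Math.sqrt(sxx * syy));
    }
}
```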
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double uncenteredCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
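Calculates the weighted Pearson distance between two genes/rows or two conditions/columns, using the uncentered version of the Pearson correlation. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.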
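The uncentered correlation omits the mean subtraction, so it compares the profiles as vectors from the origin (a weighted cosine similarity). The sketch assumes the 1 - r form and inputs already filtered by the masks; UncenteredCorrelation is a hypothetical class, not this API.

```java
/** Illustrative sketch of the uncentered correlation distance (not the library's source). */
final class UncenteredCorrelation {

    private UncenteredCorrelation() {
    }

    /** 1 - sum(w*x*y) / sqrt(sum(w*x*x) * sum(w*y*y)); inputs assumed pre-filtered by the masks. */
    static double distance(double[] x, double[] y, double[] w) {
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < x.length; k++) {
            // no mean subtraction: profiles are compared as vectors from the origin
            sxy += w[k] * x[k] * y[k];
            sxx += w[k] * x[k] * x[k];
            syy += w[k] * y[k] * y[k];
        }
        return 1.0 - sxy / Math.sqrt(sxx * syy);
    }
}
```

Unlike the centered Pearson correlation, a constant offset changes the result: {1, 2, 3} and {6, 7, 8} are perfectly Pearson-correlated but have a nonzero uncentered distance.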
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double uncenteredAbsoluteCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
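Calculates the weighted Pearson distance between two genes/rows or two conditions/columns, using the absolute value of the uncentered version of the Pearson correlation. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.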
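With the absolute value, a profile and any negative multiple of it are at distance 0, while orthogonal profiles are at distance 1. A sketch assuming the 1 - |r| form and inputs pre-filtered by the masks (UncenteredAbsolute is a hypothetical name):

```java
/** Illustrative sketch of the absolute uncentered correlation distance (not the library's source). */
final class UncenteredAbsolute {

    private UncenteredAbsolute() {
    }

    static double distance(double[] x, double[] y, double[] w) {
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < x.length; k++) {
            sxy += w[k] * x[k] * y[k];
            sxx += w[k] * x[k] * x[k];
            syy += w[k] * y[k] * y[k];
        }
        // |cosine| ignores the direction of the relationship
        return 1.0 - Math.abs(sxy) / Math.sqrt(sxx * syy);
    }
}
```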
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double spearmansRankCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
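Calculates the Spearman rank distance between two genes/rows or two conditions/columns. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.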
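Spearman's rank correlation is Pearson's correlation applied to the ranks of the values, with tied values given average ranks. The sketch below returns 1 - rho; it ignores elementsWeights and the masks, which is an assumption since the documentation does not say how this method uses them. SpearmanSketch is hypothetical.

```java
/** Illustrative Spearman distance sketch (not the library's source). */
final class SpearmanSketch {

    private SpearmanSketch() {
    }

    /** Average 1-based ranks: 1 + (number smaller) + (number equal - 1) / 2; O(n^2) for brevity. */
    static double[] ranks(double[] v) {
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            int smaller = 0;
            int equal = 0;
            for (double other : v) {
                if (other < v[i]) {
                    smaller++;
                } else if (other == v[i]) {
                    equal++;
                }
            }
            r[i] = 1 + smaller + (equal - 1) / 2.0;
        }
        return r;
    }

    /** Returns 1 - rho, where rho is the Pearson correlation of the two rank vectors. */
    static double distance(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        int n = rx.length;
        double mx = 0, my = 0;
        for (int k = 0; k < n; k++) {
            mx += rx[k];
            my += ry[k];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = rx[k] - mx, dy = ry[k] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return 1.0 - sxy / Math.sqrt(sxx * syy);
    }
}
```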
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double kendallsTauCorrelation(double[][] expressionData1, double[][] expressionData2, int[][] missingsMask1, int[][] missingsMask2, double[] elementsWeights, int index1, int index2, boolean dimension) throws java.lang.Exception
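Calculates Kendall's tau distance between two genes/rows or two conditions/columns. The two elements are taken from expressionData1 and expressionData2, respectively, and their missing-value masks from missingsMask1 and missingsMask2. expressionData1, expressionData2, missingsMask1 and missingsMask2 must have the same dimensions.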
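Kendall's tau compares the orderings of all pairs of coordinates: a pair is concordant when both profiles order it the same way and discordant when they order it oppositely. The sketch below returns 1 - tau using the tau-b variant, which adjusts for ties; that choice, and ignoring weights and masks, are assumptions rather than documented behavior. The pair loop is simple but O(n^2). KendallSketch is hypothetical.

```java
/** Illustrative Kendall's tau-b distance sketch (not the library's source). */
final class KendallSketch {

    private KendallSketch() {
    }

    /** Returns 1 - tau_b; the result is NaN if either profile is entirely constant. */
    static double distance(double[] x, double[] y) {
        long concordant = 0, discordant = 0, untiedX = 0, untiedY = 0;
        for (int i = 0; i < x.length; i++) {
            for (int j = i + 1; j < x.length; j++) {
                double sx = Math.signum(x[i] - x[j]);
                double sy = Math.signum(y[i] - y[j]);
                if (sx != 0) {
                    untiedX++;
                }
                if (sy != 0) {
                    untiedY++;
                }
                if (sx * sy > 0) {
                    concordant++; // both profiles order the pair the same way
                } else if (sx * sy < 0) {
                    discordant++; // the profiles order the pair oppositely
                }
            }
        }
        double tau = (concordant - discordant) / Math.sqrt((double) untiedX * untiedY);
        return 1.0 - tau;
    }
}
```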
Parameters:
expressionData1 - double[][] the expression data matrix containing the first gene/row or condition/column
expressionData2 - double[][] the expression data matrix containing the second gene/row or condition/column
missingsMask1 - int[][] the mask that indicates which expression values are missing in expressionData1
missingsMask2 - int[][] the mask that indicates which expression values are missing in expressionData2
elementsWeights - double[] the weights of the elements along which the distance is computed (conditions/columns when comparing genes/rows, genes/rows when comparing conditions/columns); the length of this array is therefore the number of conditions/columns if the distances between genes/rows are calculated, or the number of genes/rows if the distances between conditions/columns are calculated
index1 - int the index of the first gene/row or condition/column in expressionData1 and missingsMask1
index2 - int the index of the second gene/row or condition/column in expressionData2 and missingsMask2
dimension - boolean the dimension in which the distance is measured; available dimensions:
GENES_DIMENSION - the distance is measured between two genes/rows
CONDITIONS_DIMENSION - the distance is measured between two conditions/columns
Throws:
java.lang.Exception - if the dimensions of the arrays are inconsistent or if the indexes are out of range
public static double[] computeRanks(double[] dataToRank)
Calculates the ranks of the elements in the array dataToRank. Elements with the same value receive the same rank, equal to the average of the ranks they would have received had their values been different.
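The averaging rule for ties can be sketched by sorting positions by value and giving each run of equal values the mean of the ranks it spans. This sketch assumes 1-based ranks (the documentation does not state the base) and is not the class's implementation; AverageRanks is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative average-rank sketch (not the library's source). */
final class AverageRanks {

    private AverageRanks() {
    }

    /** Returns 1-based ranks; tied values share the mean of the ranks they span. */
    static double[] ranks(double[] data) {
        int n = data.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // sort positions by the value stored at each position
        Arrays.sort(order, Comparator.comparingDouble(i -> data[i]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            // extend the run while the next value ties with the current one
            while (end + 1 < n && data[order[end + 1]] == data[order[start]]) {
                end++;
            }
            // sorted positions start..end hold ranks start+1..end+1; use their mean
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }
}
```

For {10, 20, 20, 5} the two 20s occupy ranks 3 and 4, so both receive 3.5.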
Parameters:
dataToRank - double[] the array that contains the elements to rank
Returns:
double[] the computed ranks for the given data
public static double[] computeRanks(java.util.ArrayList<java.lang.Double> dataToRank)
Calculates the ranks of the elements in the list dataToRank. Elements with the same value receive the same rank, equal to the average of the ranks they would have received had their values been different.
Parameters:
dataToRank - ArrayList<Double> the list that contains the elements to rank
Returns:
double[] the computed ranks for the given data
See Also:
computeRanks(double[] dataToRank)
protected static void sortIndexes(double[] data, int[] indexes) throws java.lang.Exception
Sets up an index table given the data, such that data[indexes[i]], with i = 0, ..., data.length - 1, is in increasing order.
Parameters:
data - double[] the data to be sorted
indexes - int[] the indexes of the data to be sorted
Throws:
java.lang.Exception - if data or indexes is null, or if the lengths of data and indexes are not equal
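An index table leaves data untouched and orders its positions instead. The sketch assumes indexes is an output array that receives the positions of data in increasing order of value; IndexTable is hypothetical, and it throws IllegalArgumentException (a subclass of the documented java.lang.Exception) for the documented error cases.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative index-table sketch (not the library's source). */
final class IndexTable {

    private IndexTable() {
    }

    /** Fills indexes so that data[indexes[0]] <= data[indexes[1]] <= ... ; data is not modified. */
    static void sortIndexes(double[] data, int[] indexes) {
        if (data == null || indexes == null) {
            throw new IllegalArgumentException("data and indexes must not be null");
        }
        if (data.length != indexes.length) {
            throw new IllegalArgumentException("data and indexes must have the same length");
        }
        Integer[] boxed = new Integer[data.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // order positions by the value they point to
        Arrays.sort(boxed, Comparator.comparingDouble(i -> data[i]));
        for (int i = 0; i < boxed.length; i++) {
            indexes[i] = boxed[i];
        }
    }
}
```

For data = {3.0, 1.0, 2.0} the resulting table is {1, 2, 0}.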