Julia Handl, Joshua Knowles and Douglas B. Kell
Computational cluster validation in post-genomic data analysis
Bioinformatics 2005 21(15):3201-3212
1.Compactness : k-means, average-link agglomerative clustering, SOMs, model-based clustering
2.Connectedness : density-based methods, single-link agglomerative clustering
3.Spatial separation : simulated annealing, tabu search, evolutionary algorithms
[A]External measures
1.Unary measures : F-measure, 'enrichment'
2.Binary measures : Rand Index, Jaccard coefficient, Minkowski Score
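The binary external measures listed above compare a clustering against a known labelling by counting object pairs. A minimal sketch of the Rand Index and the Jaccard coefficient under the usual pair-counting definitions follows; the function names (`pair_counts`, `rand_index`, `jaccard_coefficient`) are illustrative and not taken from the paper.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Classify every pair of objects by agreement between two partitions.

    a: same class and same cluster
    b: same class, different cluster
    c: different class, same cluster
    d: different class and different cluster
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(labels_true, labels_pred)
    return (a + d) / (a + b + c + d)


def jaccard_coefficient(labels_true, labels_pred):
    """Like the Rand Index, but ignoring pairs separated in both partitions."""
    a, b, c, _ = pair_counts(labels_true, labels_pred)
    # Assumption: return 1.0 when no pair is co-assigned in either partition.
    return a / (a + b + c) if (a + b + c) else 1.0
```

For example, `rand_index([0, 0, 1, 1], [0, 0, 1, 2])` counts 5 agreeing pairs out of 6, while the Jaccard coefficient for the same input drops the 4 pairs that are separated in both partitions.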
[B]Internal measures
1.Compactness : graph-based approaches
2.Connectedness : k-nearest neighbor consistency, connectivity
3.Separation : average weighted inter-cluster distance
4.Combinations : SD-validity Index, Dunn Index, Dunn-like Indices, Davies-Bouldin Index, Silhouette Width
5.Predictive power/stability
6.Compliance between a partitioning and distance information : Pearson correlation, Spearman rank correlation
7.Specialized measures for highly correlated data : figure of merit, jackknife approach, figure of merit of Yeung
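As an illustration of the "Combinations" group of internal measures, here is a minimal sketch of the Dunn Index in its common form: the smallest distance between objects in different clusters divided by the largest cluster diameter. It assumes Euclidean distance and points given as coordinate tuples; the function name `dunn_index` is illustrative, and the paper may use a different variant.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum cluster diameter.

    Higher values indicate compact, well-separated clusters.
    """
    min_between = float("inf")  # closest pair lying in different clusters
    max_diameter = 0.0          # farthest pair lying in the same cluster
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            max_diameter = max(max_diameter, d)
        else:
            min_between = min(min_between, d)
    if min_between == float("inf"):
        raise ValueError("need at least two clusters")
    if max_diameter == 0.0:
        raise ValueError("undefined when every cluster has zero diameter")
    return min_between / max_diameter
```

Because the index depends on a single minimum and a single maximum, one outlier can change it substantially, which is worth keeping in mind when comparing it with averaged measures.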
1.Artificial data : 'Long', 'Square'
2.Leukemia data [Golub]
2.Adjusted F-measure
3.Silhouette Width
4.Dunn Index
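Of the measures listed above, the Silhouette Width is easy to state in code. The sketch below assumes the standard definition: for each object, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the object's silhouette is (b - a) / max(a, b). Euclidean distance, the zero score for singleton clusters, and the function name `silhouette_width` are assumptions of this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_width(points, labels):
    """Mean silhouette value over all objects, in the range [-1, 1]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # assumption: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Values near 1 indicate that objects sit much closer to their own cluster than to the nearest other one; negative values suggest that many objects would fit better elsewhere.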
・Purpose: "In particular, the paper attempts to familiarize researchers with some of the fundamental concepts behind cluster-validation techniques, and to assist them in making more informed choices of the measures to be used."
・Problem: "There are several valid properties that may be ascribed to a good partitioning, but these are partly in conflict and are generally difficult to express in terms of objective functions."
・Problem: "However, there is hardly any consensus on the best distance function, clustering method or method of feature selection to be used for the different types of post-genomic data."