Julia Handl, Joshua Knowles and Douglas B. Kell
Computational cluster validation in post-genomic data analysis
Bioinformatics 2005 21(15):3201-3212
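・Among the many analysis methods available for microarray data, the paper uses numerous examples to describe their properties and differences, and offers guidance on which method to choose.
<Classification of clustering methods>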
1.Compactness : k-means, average-link agglomerative clustering, SOMs, model-based clustering
2.Connectedness : density-based methods, single-link agglomerative clustering
3.Spatial separation : simulated annealing, tabu search, evolutionary algorithms
<Classification of validation measures for clustering results>
[A]External measures
1.Unary measures : F-measure, 'enrichment'
2.Binary measures : Rand Index, Jaccard coefficient, Minkowski Score
[B]Internal measures
1.Compactness : graph-based approaches
2.Connectedness : k-nearest neighbor consistency, connectivity
3.Separation : average weighted inter-cluster distance
4.Combinations : SD-validity Index, Dunn Index, Dunn-like Indices, Davies-Bouldin Index, Silhouette Width
5.Predictive power/stability
6.Compliance between a partitioning and distance information : Pearson correlation, Spearman rank correlation
7.Specialized measures for highly correlated data : figure of merit, jackknife approach, figure of merit of Yeung
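The binary external measures listed under [A] (Rand Index, Jaccard coefficient) both compare a clustering with a known reference labelling by counting pairs of items. The following is a minimal sketch of that pair-counting idea using the standard textbook definitions, not code from the paper. The function names, and the choice to return 1.0 from the Jaccard coefficient when no pair is co-clustered in either partition, are assumptions made for illustration.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Count item pairs by agreement between two partitions.

    a: same cluster in both partitions
    b: same in the reference, different in the prediction
    c: different in the reference, same in the prediction
    d: different in both partitions
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(labels_true, labels_pred)
    total = a + b + c + d
    if total == 0:
        raise ValueError("need at least two items")
    return (a + d) / total


def jaccard_coefficient(labels_true, labels_pred):
    """Like the Rand Index, but ignoring pairs separated in both partitions."""
    a, b, c, _ = pair_counts(labels_true, labels_pred)
    if a + b + c == 0:
        # Assumption: no pair is co-clustered anywhere, treated as full agreement.
        return 1.0
    return a / (a + b + c)


reference = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]  # the second reference class is split in two
print(rand_index(reference, predicted))
print(jaccard_coefficient(reference, predicted))
```

Because the Jaccard coefficient drops the pairs that are separated in both partitions, it is less inflated than the Rand Index when there are many small clusters.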
・Data
1.Synthetic data: 'Long', 'Square'
2.Leukemia data [Golub]
・Clustering methods
1.K-means
2.Average-link
3.Single-link
4.SOM
5.SOTA
・Validation measures (vertical axis)
1.F-measure
2.Adjusted F-measure
3.Silhouette Width
4.Dunn Index
5.Variance
6.Connectivity
7.Stability
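Two of the internal measures in this list, Silhouette Width and the Dunn Index, combine compactness and separation and need only the data and the cluster labels. The following is a minimal sketch under stated assumptions: Euclidean distance, the common textbook definitions (the Dunn Index as smallest between-cluster distance over largest cluster diameter), and hypothetical function names. It is not the implementation used in the paper.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def _group(points, labels):
    """Map each cluster label to the list of its points."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return groups


def silhouette_width(points, labels):
    """Mean silhouette value over all points (range -1 to 1, higher is better)."""
    groups = _group(points, labels)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for point, own in zip(points, labels):
        members = groups[own]
        if len(members) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in members) / (len(members) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for label, other in groups.items()
            if label != own
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    pairs = list(combinations(zip(points, labels), 2))
    between = [dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq]
    within = [dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq]
    if not between or not within or max(within) == 0:
        raise ValueError("Dunn Index is undefined for this partitioning")
    return min(between) / max(within)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]  # groups the two nearby pairs
bad = [0, 1, 0, 1]   # pairs each point with a distant one
print(silhouette_width(points, good), dunn_index(points, good))
print(silhouette_width(points, bad), dunn_index(points, bad))
```

On this toy data both measures rank the intuitive partitioning above the poor one. The Dunn Index depends on a single minimum and a single maximum distance, so one outlier can change it sharply.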
・Aim: "In particular, the paper attempts to familiarize researchers with some of the fundamental concepts behind cluster-validation techniques, and to assist them in making more informed choices of the measures to be used."
・Problem: "There are several valid properties that may be ascribed to a good partitioning, but these are partly in conflict and are generally difficult to express in terms of objective functions."
・Problem: "However, there is hardly any consensus on the best distance function, clustering method or method of feature selection to be used for the different types of post-genomic data."