Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL.
Model-based clustering and data transformations for gene expression data.
Bioinformatics. 2001 Oct;17(10):977-87.
・On methods for obtaining the best clustering results.
・Data: two real data sets and three synthetic data sets
Gene expression data sets: 1. Ovary data [Schummer], 2. Yeast cell cycle data [Cho]
Synthetic data sets: 1. Mixture of normal distributions based only on the ovary data, 2. Randomly resampled ovary data, 3. Cyclic data
・Clustering methods:
1. EI (equal volume spherical model)
2. VI (unequal volume spherical model)
3. VVV (unconstrained model)
4. EEE (elliptical model)
Methods 1-4 are model-based clustering; the rest are the conventional methods used for comparison.
5. Diagonal
6. CAST
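The four model-based variants differ mainly in how many covariance parameters they estimate, which is exactly what the BIC penalty weighs against goodness of fit. A minimal sketch counting free parameters for a K-cluster, d-dimensional Gaussian mixture, assuming the standard MCLUST-style parameterisation of EI, VI, EEE and VVV; `n_params` is a hypothetical helper, not part of any library, and the K and d values are purely illustrative.

```python
def n_params(model: str, k: int, d: int) -> int:
    """Free parameters of a K-component, d-dimensional Gaussian mixture."""
    weights = k - 1                # mixing proportions must sum to 1
    means = k * d                  # one mean vector per cluster
    full_cov = d * (d + 1) // 2    # entries of one symmetric d x d matrix
    cov = {
        "EI": 1,                   # one shared variance, spherical clusters
        "VI": k,                   # one variance per cluster, spherical
        "EEE": full_cov,           # one full covariance shared by all clusters
        "VVV": k * full_cov,       # a separate full covariance per cluster
    }[model]
    return weights + means + cov


if __name__ == "__main__":
    for m in ("EI", "VI", "EEE", "VVV"):
        print(m, n_params(m, k=4, d=17))
```

With K = 4 and d = 17 this gives 72, 75, 224 and 683 parameters respectively, so the unconstrained VVV model pays a much larger BIC penalty than the spherical ones.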
・Evaluation of clustering results: 1. Average adjusted Rand indices, 2. Average BIC scores
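The adjusted Rand index compares a clustering result with a known reference partition and corrects the plain Rand index for chance agreement: it is 1 for identical partitions (up to relabelling) and close to 0 for random ones. A minimal pure-Python sketch of the Hubert-Arabie formula; the function name is hypothetical, and this is not claimed to be the paper's own implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same >= 2 items")
    n = len(labels_a)
    # Pair counts from the contingency table: cells, row sums, column sums.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = same_a * same_b / total     # expected agreement under chance
    maximum = (same_a + same_b) / 2
    if maximum == expected:                # degenerate case, e.g. both trivial
        return 1.0
    return (same_both - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0]))  # relabelled: 1.0
    print(adjusted_rand_index(truth, [0, 0, 1, 1, 2, 2]))  # partial agreement
```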
・Problem: "Most of the proposed clustering algorithms are largely heuristically motivated, and the issues of determining the 'correct' number of clusters and choosing a 'good' clustering algorithm are not yet rigorously solved."
・Summary: "Our contributions include demonstrations of the potential usefulness of the model-based approach by testing the Gaussian mixture assumption for different transformations of expression data, applying existing model-based clustering implementations to both real expression data and synthetic data sets, and comparing the performance of the model-based approach to a leading heuristic-based algorithm."
・"One of the key advantages of the model-based approach is the availability of a variety of models that distinguish between these scenarios (and others)."
・"Another key advantage of model-based clustering is that there is a principled, data-driven way to approach the model selection problem,"
・Result: "Our results show that the BIC analysis not only selects the right model, but also determines the correct number of clusters."
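The paper applies existing multivariate model-based clustering software; as a much smaller, self-contained illustration of the same BIC logic, the sketch below fits 1-D Gaussian mixtures by EM with an equal-variance ("E") or unequal-variance ("V") model for K = 1..4 and keeps the combination with the highest BIC. It assumes the MCLUST sign convention BIC = 2 log L - p log n (larger is better); the simulated data, function names and the 1-D simplification are all hypothetical.

```python
import math
import random


def fit_gmm_1d(xs, k, equal_var, iters=100):
    """EM for a 1-D Gaussian mixture; returns the log-likelihood at convergence."""
    n = len(xs)
    srt = sorted(xs)
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialisation
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibilities via log-sum-exp for numerical stability
        loglik, resp = 0.0, []
        for x in xs:
            logs = [math.log(weights[j]) - 0.5 * math.log(2 * math.pi * variances[j])
                    - (x - means[j]) ** 2 / (2 * variances[j]) for j in range(k)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += lse
            resp.append([math.exp(v - lse) for v in logs])
        if loglik - prev < 1e-8:
            break
        prev = loglik
        # M-step: weights, means, and shared ("E") or per-cluster ("V") variances
        nk = [max(sum(r[j] for r in resp), 1e-10) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        sq = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) for j in range(k)]
        if equal_var:
            variances = [max(sum(sq) / n, 1e-6)] * k
        else:
            variances = [max(sq[j] / nk[j], 1e-6) for j in range(k)]
    return loglik


def bic(loglik, n_params, n):
    """MCLUST sign convention: 2 log L - p log n, so larger is better."""
    return 2 * loglik - n_params * math.log(n)


def select_model(xs, max_k=4):
    """Return (BIC, model name, K) with the highest BIC over E/V models and K."""
    best = None
    for name, equal_var in (("E", True), ("V", False)):
        for k in range(1, max_k + 1):
            ll = fit_gmm_1d(xs, k, equal_var)
            p = (k - 1) + k + (1 if equal_var else k)  # weights + means + variances
            score = bic(ll, p, len(xs))
            if best is None or score > best[0]:
                best = (score, name, k)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
    score, model, k = select_model(data)
    print(f"best model: {model}, K = {k}, BIC = {score:.1f}")
```

Both the model (E or V) and the number of clusters K are chosen by the same BIC comparison, which mirrors the note's point that BIC selects the model and the cluster count together.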
・Lesson: "In particular, if the goal is to capture the general patterns across experiments without considering the absolute expression levels, data transformations such as standardization are helpful."
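Standardization here means rescaling each gene's profile across experiments to mean 0 and standard deviation 1, so genes with the same shape but different absolute levels become identical. A minimal sketch, assuming rows are genes, columns are experiments, and the population standard deviation is used; the paper's exact convention may differ, and `standardize_rows` is a hypothetical name.

```python
import statistics


def standardize_rows(matrix):
    """Rescale each gene (row) to mean 0 and standard deviation 1 across experiments."""
    result = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)            # assumption: population SD
        if sd == 0:
            result.append([0.0] * len(row))    # flat profile: nothing to rescale
        else:
            result.append([(x - mean) / sd for x in row])
    return result


if __name__ == "__main__":
    genes = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]   # same shape, 10x expression level
    for row in standardize_rows(genes):
        print([round(v, 4) for v in row])          # both rows: [-1.2247, 0.0, 1.2247]
```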
・The paper is clearly structured and easy to read, although I still don't really understand the theoretical parts.
《Papers to check》
[1] Yeung KY, Haynor DR, Ruzzo WL. Validating clustering for gene expression data. Bioinformatics. 2001;17(4):309-18.
[2] Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999 Jul;22(3):281-5.
[3] Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal. 1998;41:578-88. (Technical report, February 1998.)