Stuart G Baker and Barnett S Kramer
Identifying genes that contribute most to good classification in microarrays
BMC Bioinformatics 2006, 7:407
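・The paper proposes multiple random validation as a way to fairly evaluate genes selected for disease classification. Rather than dividing the samples into training and test sets in a fixed proportion, as conventional N-fold cross-validation does, it splits the samples at random many times and examines how classification performance and the selected genes vary across the splits.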
・Data
1. Colon cancer [Alon]
2. Leukemia [Golub]
3. Medulloblastoma [Pomeroy]
4. Breast cancer [West]
・Classification method: Filter with a nearest centroid rule
・Evaluation of classification: Receiver operating characteristic (ROC) curves, Estimated area under the ROC curve (AUC)
・Problem「The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance.」
・Method「In multiple random validation, the data are randomly split into training and test samples many times. Unlike cross-validation, the goal is not to average performance over test samples but to investigate the variability of performance over test samples and the frequencies of genes selected on random splits [2].」
・Goal「Therefore our goal was to identify classification rules that perform well with the fewest genes, and so may be more "robust" than rules with more genes.」
・Future prospects「Future research using a wrapper would be of great interest because of the potential of the wrapper to identify genes that have good classification when considered together but poor classification when considered separately.」
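
The procedure summarized in these notes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The filter score (a standardized mean difference), the 70/30 stratified split, the number of splits, and all function names are assumptions chosen for the example; the notes specify only a filter with a nearest centroid rule, evaluation by AUC, and many random training/test splits.

```python
import numpy as np
from collections import Counter

def auc(scores, labels):
    """AUC as the fraction of (class-1, class-0) pairs ranked correctly; ties count 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def filter_genes(X, y, k):
    """Filter step (assumed score): keep the k genes with the largest
    absolute standardized difference between class means."""
    m1, m0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    s = np.sqrt(X[y == 1].var(axis=0) + X[y == 0].var(axis=0)) + 1e-12
    return np.argsort(-np.abs(m1 - m0) / s)[:k]

def centroid_score(X_train, y_train, X_test, genes):
    """Nearest centroid rule: larger score means closer to the class-1 centroid."""
    c1 = X_train[y_train == 1][:, genes].mean(axis=0)
    c0 = X_train[y_train == 0][:, genes].mean(axis=0)
    Z = X_test[:, genes]
    return ((Z - c0) ** 2).sum(axis=1) - ((Z - c1) ** 2).sum(axis=1)

def stratified_split(y, train_frac, rng):
    """Random split that keeps at least one sample of each class on both sides."""
    train = []
    for c in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == c))
        n_train = max(1, min(len(idx) - 1, int(round(train_frac * len(idx)))))
        train.append(idx[:n_train])
    train = np.sort(np.concatenate(train))
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test

def multiple_random_validation(X, y, k=5, n_splits=200, train_frac=0.7, seed=0):
    """Repeat random splits; return the test AUC of every split and how often
    each gene was selected by the filter on the training samples."""
    rng = np.random.default_rng(seed)
    aucs, counts = [], Counter()
    for _ in range(n_splits):
        tr, te = stratified_split(y, train_frac, rng)
        genes = filter_genes(X[tr], y[tr], k)  # selection uses training data only
        aucs.append(auc(centroid_score(X[tr], y[tr], X[te], genes), y[te]))
        counts.update(genes.tolist())
    return np.array(aucs), counts
```

Here `X` is a samples-by-genes expression matrix and `y` a 0/1 class label vector with at least two samples per class. The spread of the returned AUC values (for example, their percentiles) reflects the variability of performance over test samples, and `counts.most_common()` lists the genes that are selected most frequently across splits. Repeating the run for several values of `k` gives a way to look for rules that perform well with few genes.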