Christophe Ambroise and Geoffrey J. McLachlan
Selection bias in gene extraction on the basis of microarray gene-expression data
Proc Natl Acad Sci U S A. 2002 May 14;99(10):6562-6
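・A study of the "selection bias" problem that arises when classifying samples from microarray data. Error rates obtained with several estimators are compared for the resulting prediction rules.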
・Error-rate estimates compared
1. AE : Apparent error rate of the rule R
2. CV1IE : CV (leave-one-out) internal error
3. B.632+ : B.632+ which puts relatively more weight on the leave-one-out bootstrap error B1 [Efron and Tibshirani]
4. CV10E : CV 10-fold error
5. TE : Test error
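As a rough illustration of how B.632+ "puts relatively more weight" on B1, here is a minimal sketch of the estimator as described by Efron and Tibshirani. The function name `b632_plus` and the argument `gamma` (the no-information error rate) are names chosen for this note, not taken from the paper, and the clipping rules are written from the general description of the .632+ estimator rather than from the authors' own code.

```python
def b632_plus(ae, b1, gamma):
    """Sketch of the .632+ error estimate.

    ae    : apparent (resubstitution) error rate
    b1    : leave-one-out bootstrap error rate
    gamma : no-information error rate (assumed to be supplied by the caller)
    """
    # B1 is not allowed to exceed the no-information rate.
    b1 = min(b1, gamma)
    # Relative overfitting rate r in [0, 1]; set to 0 when it is not meaningful.
    if b1 > ae and gamma > ae:
        r = (b1 - ae) / (gamma - ae)
    else:
        r = 0.0
    # The weight w grows from 0.632 (no overfitting) to 1 (maximal overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * ae + w * b1


# Example: AE = 0 (rule fits the training data perfectly), B1 = 0.2.
estimate = b632_plus(0.0, 0.2, 0.5)
print(round(estimate, 4))
```

When AE is close to zero but B1 is not, r grows and the estimate moves toward B1, which is the behaviour the list entry above refers to.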
・Data
1. Colon data, 62 tissue samples (40 tumors/22 normal tissues), 2000 human gene expressions, Affymetrix. [Alon]
2. Leukemia, 72 tissue samples (47 ALL/25 AML), 7129 genes, Affymetrix. [Golub]
・Problem: "However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias."
・Summary: "We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process."
・Property of LOOCV: "The leave-one-out CV error is nearly unbiased, but it can be highly variable."
・Conclusion: "Hence it seems that the selection method and the number of selected genes are more important than the classification method for constructing a reliable prediction rule."
・So the takeaway is that an error rate stuck flat at zero is a bad sign?
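To make "cross-validation external to the selection process" concrete, here is a small sketch on pure-noise data, where no gene carries any class information. It is not the paper's setup: the gene filter (absolute difference of class means) and the nearest-centroid classifier are stand-ins chosen to keep the example dependency-free, and all function names and sizes (`n`, `p`, `k`) are assumptions made for this note. "Internal" selects genes once on all samples and then cross-validates; "external" repeats the selection inside every leave-one-out fold.

```python
import random


def top_genes(X, y, k):
    """Return indices of the k genes with the largest |mean(class 0) - mean(class 1)|."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(X, y, genes, x_new):
    """Predict the class whose centroid (on the selected genes) is closest to x_new."""
    dist = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist[lab] = sum((x_new[g] - c) ** 2 for g, c in zip(genes, centroid))
    return 0 if dist[0] <= dist[1] else 1


def loo_error(X, y, k, external):
    """Leave-one-out error; gene selection is redone per fold only if external=True."""
    genes_all = None if external else top_genes(X, y, k)  # internal: uses the held-out sample too
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_tr, y_tr, k) if external else genes_all
        errors += nearest_centroid_predict(X_tr, y_tr, genes, X[i]) != y[i]
    return errors / len(X)


random.seed(0)
n, p, k = 20, 1000, 10  # samples, genes, genes kept (illustrative sizes)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0] * (n // 2) + [1] * (n // 2)  # labels are unrelated to X by construction

internal = loo_error(X, y, k, external=False)
external = loo_error(X, y, k, external=True)
print(f"internal LOO error: {internal:.2f}")
print(f"external LOO error: {external:.2f}")
```

Because the labels are independent of the data, an honest estimate should sit near chance level. The internal estimate is typically far lower, since the held-out sample already influenced which genes were kept; that gap is the selection bias the notes above describe.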