Momiao Xiong, Xiangzhong Fang, and Jinying Zhao
Biomarker Identification by Feature Wrappers
Genome Research, 11(11):1878-1887, 2001
[PDF]
・A performance comparison of the three methods used in "Feature Wrappers" for gene selection and sample classification.
・Classification methods
1. Linear discriminant analysis (Fisher's LDA)
2. Logistic regression (LR)
3. Support Vector Machines (SVMs)
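A minimal sketch of what a wrapper criterion looks like: a gene subset is scored by the leave-one-out accuracy of a classifier trained only on those genes. The nearest-centroid classifier below is a stand-in chosen to keep the example dependency-free; it is not one of the three classifiers compared in the paper. The names `nearest_centroid_predict` and `loo_accuracy` and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label in sorted(sums):  # fixed order, so ties go to the smaller label
        centroid = [s / counts[label] for s in sums[label]]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy using only the columns listed in `genes`."""
    if not genes:
        return 0.0
    sub = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(sub)):
        train_x = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, sub[i]) == y[i]
    return hits / len(sub)


# Toy expression matrix: 6 samples x 3 genes (values are made up).
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.2, 4.5, 0.3],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

print(loo_accuracy(X, y, [0]))  # gene 0 separates the two classes cleanly
print(loo_accuracy(X, y, [1]))  # gene 1 overlaps heavily between classes
```

Swapping in LDA, LR, or an SVM would only change the prediction function; the subset-scoring interface stays the same.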
・Search methods
1. Sequential forward search (SFS) algorithms
2. Sequential forward floating search (SFFS) algorithms
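The two search strategies can be sketched over an arbitrary subset-scoring function. This is an illustrative version under stated assumptions, not the paper's implementation: `sfs`, `sffs`, `toy_score`, and the score table are hypothetical, `k_max` is assumed not to exceed the number of features, and in this sketch the floating (backward) step only starts once more than two features are selected. In a wrapper setting, `score` would be the classification accuracy obtained with the given genes.

```python
def sfs(features, score, k_max):
    """Sequential forward search: greedily add the feature that maximizes score."""
    current, history = [], {}
    while len(current) < k_max:
        candidates = [f for f in features if f not in current]
        best_f = max(candidates, key=lambda f: score(current + [f]))
        current = current + [best_f]
        history[len(current)] = (score(current), list(current))
    return history


def sffs(features, score, k_max):
    """Sequential forward floating search: after each forward step, keep dropping
    features while that beats the best subset recorded at the smaller size."""
    current, best = [], {}
    while len(current) < k_max:
        # Forward step (same rule as SFS).
        candidates = [f for f in features if f not in current]
        add_f = max(candidates, key=lambda f: score(current + [f]))
        current = current + [add_f]
        s = score(current)
        if len(current) not in best or s > best[len(current)][0]:
            best[len(current)] = (s, list(current))
        # Conditional backward steps.
        while len(current) > 2:
            drop_f = max(current, key=lambda f: score([g for g in current if g != f]))
            reduced = [g for g in current if g != drop_f]
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                current = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best


# Made-up scores: "a" is the best single feature, but "b"+"c" is the best pair.
TOY_SCORES = {
    frozenset("a"): 0.6, frozenset("b"): 0.5, frozenset("c"): 0.5,
    frozenset("ab"): 0.7, frozenset("ac"): 0.7, frozenset("bc"): 0.9,
    frozenset("abc"): 0.8,
}


def toy_score(subset):
    return TOY_SCORES[frozenset(subset)]


print(sfs(["a", "b", "c"], toy_score, 3)[2])   # (0.7, ['a', 'b'])
print(sffs(["a", "b", "c"], toy_score, 3)[2])  # (0.9, ['b', 'c'])
```

SFS can never undo its first choice of "a", so it misses the best pair; SFFS recovers it by dropping "a" after reaching three features.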
・Data
1. Colon cancer [Alon]
2. Breast tumor [Hedenfalk]
3. Doxorubicin [Perou]
・Problems:
"First, although the calculation of fold changes or t-test and F test can identify highly differentially expressed genes, the classification accuracy of identified biomarkers by these methods is, in general, not very high. Second, most scoring methods do not use classification accuracy to measure a gene's ability to discriminate tissue samples."
・Problems:
"Third, to improve accuracy, several authors (Moler et al. 2000; Chow et al. 2001) used a combination of genes in the top of the list of ranked genes as a composite classifier. However, a simple combination of highly ranked markers according to their scores or discrimination ability may not be efficient for classification."
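For contrast with the wrapper approach, here is a small sketch of the score-and-rank style the quoted passages criticize: each gene gets a two-sample t statistic on its own, and the top of the list is taken as the marker set. The helper names `welch_t` and `rank_genes` and the toy matrix are hypothetical. In the made-up data, genes 0 and 1 are near-duplicates, so the top two by score add little to each other, which is one way a simple combination of top-ranked genes can be inefficient.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for one gene."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(X, y):
    """Return gene indices sorted by |t|, most differentially expressed first."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)]


# Toy data: 6 samples x 3 genes; gene 1 is gene 0 shifted by 0.1 (values made up).
X = [
    [1.0, 1.1, 2.0],
    [1.2, 1.3, 3.0],
    [0.9, 1.0, 2.5],
    [3.0, 3.1, 3.5],
    [3.2, 3.3, 2.8],
    [2.9, 3.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, y)
print(ranking)  # genes 0 and 1 occupy the top two places, gene 2 comes last
```

Note that nothing in this ranking measures classification accuracy, and nothing accounts for how the selected genes behave together.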
・Objective:
"The goal of this research was to use feature (gene) selection incorporated into pattern recognition as a general framework for biomarker identification and optimal classifier generation."
・Results:
"First, the classification accuracy of the optimal subsets of genes searched by SFFS algorithm is greater than or equal to that obtained by SFS algorithm. Second, the accuracy increased when sizes of subsets of selected genes increased and quickly reached 100% accuracy for the SFFS algorithm, but suddenly dropped to 50% when the size of selected subsets of genes was >60 (which is close to total sample size of 62)."
・Results:
"Third, it is interesting to note that the classification accuracy of optimal subsets of genes with size 4 searched by SFFS algorithm is 100%."
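One plausible reading of the sudden drop beyond 60 genes, assuming the curve in question was produced with Fisher's LDA (these notes do not record which classifier it was): Fisher's direction requires inverting the within-class scatter matrix, and for two classes that matrix has rank at most n - 2. The symbols below are standard LDA notation, not taken from the notes.

```latex
\[
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w},
\qquad
w^{*} \propto S_W^{-1}(m_1 - m_2),
\qquad
S_W = \sum_{c=1}^{2} \sum_{i \in C_c} (x_i - m_c)(x_i - m_c)^{\top}
\]
\[
\operatorname{rank}(S_W) \le n - 2 = 60 \quad \text{for } n = 62 \text{ samples}
\]
```

With more than 60 selected genes, S_W is a p × p matrix with p > 60 and therefore singular, so the plain Fisher solution is no longer defined. This matches the threshold in the quote, but it remains an interpretation rather than something stated here.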
・Overview:
"In this paper, we formulated the problem of biomarker identification as feature selection incorporated into pattern recognition (i.e., we formulated it into an optimization problem)."
・Issue:
"Now the question is how many genes are required and which genes are selected to ensure the required classification accuracy."
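The optimization view and the question of how many genes are needed can be written compactly. The notation is hypothetical and introduced only for illustration: G is the full set of genes, J(S) is the estimated classification accuracy of the chosen classifier trained on the genes in S, and α is the required accuracy.

```latex
\[
S_k^{*} = \arg\max_{S \subseteq G,\ |S| = k} J(S),
\qquad
k^{*} = \min\{\, k : J(S_k^{*}) \ge \alpha \,\}
\]
```

Solving for S_k^* exactly would require evaluating \binom{|G|}{k} subsets, which is why heuristic searches such as SFS and SFFS are used; they return good subsets of each size, not guaranteed optima. In this reading, "pattern recognition" enters through J: the classifier supplies the objective that the gene search optimizes.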
・How "pattern recognition" ties into all this is still not quite clear to me.