Andrew Y. Ng
On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples
Proc. 15th International Conf. on Machine Learning (1998)
[PDF] [Web Site]
・In the context of gene selection from microarray data, this paper evaluates the performance of algorithms based on the wrapper model. Classification accuracy is measured on synthetic data generated under a variety of settings.
・Methods compared: 1. No feature selection, 2. STANDARD-WRAP, 3. ORDERED-FS. (A sketch of my reading of ORDERED-FS follows below.)
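As I understand it, ORDERED-FS avoids searching all 2^n feature subsets by first ranking features with a relevance score and then only letting the hold-out set pick how many of the top-ranked features to keep, i.e. it searches just n nested subsets. The scoring function (absolute correlation) and the classifier (logistic regression) below are my own stand-ins, not necessarily what the paper uses:

```python
# Hedged sketch of ORDERED-FS as I read it: rank features by a relevance
# score, then choose the best prefix length k via hold-out accuracy.
# Correlation scoring and logistic regression are assumptions, not
# necessarily the paper's choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ordered_fs(X_tr, y_tr, X_ho, y_ho):
    # Score each feature by |correlation| with the label.
    scores = np.abs([np.corrcoef(X_tr[:, j], y_tr)[0, 1]
                     for j in range(X_tr.shape[1])])
    order = np.argsort(-scores)            # best-scoring feature first
    best_k, best_acc = 1, -1.0
    for k in range(1, len(order) + 1):     # nested subsets: top-1, top-2, ...
        cols = order[:k]
        clf = LogisticRegression().fit(X_tr[:, cols], y_tr)
        acc = clf.score(X_ho[:, cols], y_ho)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k]
```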
・Problem identified: 「the main source of error in wrapper model feature selection is from "overfitting" hold-out or cross-validation data.」
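This effect is easy to demonstrate. Below is a small illustration (my own, not from the paper): on pure-noise data where labels are independent of all features, the *best* hold-out accuracy across many candidate feature subsets drifts well above the 0.5 chance level purely through selection bias. All sizes and counts here are arbitrary choices:

```python
# Illustration of "overfitting the hold-out set": every candidate subset
# is equally useless on pure-noise data, yet the maximum hold-out
# accuracy across many candidates is optimistically biased above 0.5.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_tr, y_tr = rng.standard_normal((40, 100)), rng.integers(0, 2, 40)
X_ho, y_ho = rng.standard_normal((40, 100)), rng.integers(0, 2, 40)

best = 0.0
for _ in range(200):                          # 200 random candidate subsets
    cols = rng.choice(100, size=5, replace=False)
    clf = LogisticRegression().fit(X_tr[:, cols], y_tr)
    best = max(best, clf.score(X_ho[:, cols], y_ho))
print(best)  # typically well above 0.5, by selection bias alone
```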
・「In view of these significant empirical successes, one central question is: What theoretical justification is there for feature selection?」
・「another central question is: How does the performance of feature selection scale with the number of irrelevant features?」
・Definition of the filter model: 「The filter model relies on general characteristics of the training data to select some feature subset, doing so without reference to the learning algorithm.」
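A minimal hypothetical instance of a filter: score features purely from training-data statistics and keep the top k, never consulting the downstream learner. The correlation criterion is one common choice of mine, not something the paper prescribes:

```python
# Hypothetical filter-model example: features are selected from
# training-data statistics alone (here |correlation| with the label),
# with no reference to the learning algorithm.
import numpy as np

def filter_select(X, y, k):
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-scores)[:k]   # indices of the k highest-scoring features
```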
・Definition of the wrapper model: 「In the wrapper model, one generates sets of candidate features, runs them through the learning algorithm, and uses the performance of the resulting hypothesis to evaluate the feature set.」
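By contrast, a wrapper scores each candidate subset by actually training the learner on it. A sketch under my own assumptions (greedy forward search as the candidate generator, logistic regression as the learner, 5-fold cross-validation as the evaluator):

```python
# Hypothetical wrapper-model example: each candidate subset is evaluated
# by training the learner on it and measuring cross-validated accuracy.
# Greedy forward search is one common way to generate candidates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def wrapper_forward_select(X, y, max_feats):
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining and len(chosen) < max_feats:
        def cv_acc(j):
            cols = chosen + [j]
            return cross_val_score(LogisticRegression(), X[:, cols], y, cv=5).mean()
        best = max(remaining, key=cv_acc)     # feature whose addition helps most
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Every candidate costs a full training run, which is why wrappers are expensive, and every evaluation against the validation data is another chance to overfit it, which is exactly the error source quoted above.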
・Data generation: 「Training examples were corrupted at a noise rate η = 0.3 and all input features were i.i.d. zero-mean unit variance normally distributed random variables.」
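A sketch of that quoted setup: i.i.d. N(0, 1) features, a label from some target concept, then labels flipped with probability η = 0.3. The target concept used here (sign of the sum of the first r features) is my own placeholder; the paper's actual target concepts differ:

```python
# Sketch of the quoted data setup: m examples, n i.i.d. N(0,1) features,
# only the first r relevant, labels corrupted at noise rate eta.
# The sign-of-sum target concept is a placeholder assumption.
import numpy as np

def make_data(m, n, r, eta=0.3, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, n))              # all features i.i.d. N(0, 1)
    y = (X[:, :r].sum(axis=1) > 0).astype(int)   # only first r features matter
    flip = rng.random(m) < eta                   # corrupt labels at rate eta
    y[flip] = 1 - y[flip]
    return X, y
```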
・The theoretical parts went completely over my head.
~~~~~~~
・And with that, I've hit 100 papers logged. No particular progress or breakthroughs to show for it. I thought things would be settled before I got this far... Here's hoping they're settled by paper 200.