Inside Pikarin's Head

Mainly a record of eating out. Living in Muroran, Hokkaido.

[Paper] Furlanello, 2003, Entropy-based gene ranking ~

January 24, 2007, 20:08:57 | Paper Notes
Cesare Furlanello, Maria Serafini, Stefano Merler, and Giuseppe Jurman
Entropy-based gene ranking without selection bias for the predictive classification of microarray data
BMC Bioinformatics. 2003; 4: 54.
[PDF][Web Site]

・On a new gene-ranking method: E-RFE, an improvement on the standard RFE. The genes used in the computation are filtered by an entropy value (or is it a form of dimensionality reduction?), cutting the amount of computation and speeding up processing.
・Data (two classes)
1. Colon cancer, 2000 genes, 62 tissues (22 normal/40 tumor, Affy.) [Alon]
2. Lymphoma, 4026 genes, 96 samples (72 cancer/24 non cancer, cDNA) [Alizadeh]
3. Tumor vs. metastases, 16063 genes, 76 samples (64 primary adeno-carcinomas/12 metastatic adeno-carcinomas, Affy) [Ramaswamy]

・Method: "We have developed the entropy-based recursive feature elimination (E-RFE) as a non-parametric procedure for gene ranking, which accelerates - without reducing accuracy - the standard recursive feature elimination (RFE) method for SVMs[6]."
・Method: "In our E-RFE method, we cautiously discard, according to the entropy of the weight distribution, several (possibly many) genes at each step to drive the weight distribution in a high entropy structure of few equally important variables"
・Synthetic data 1-1: "We considered first the dataset f1000-5000, structured as follows: 100 samples described by 5000 features, in which 1000 of them are significant (i.e. generated by 1000 Gaussian distribution centered in 1 and -1, with standard deviation uniformly ranging between 1 and 5), and the remaining are uniform noise in the range [-2, 2]."
・Synthetic data 1-2: "We set up a second data set of 100 samples described by 5000 uniform noise features in the range [-2, 2]."
・"Surprisingly, when the procedure was applied to the same data after a label randomization, a very similar result was obtained without any class information"
・"We have analyzed the results obtained by applying an optimal number of features (ONF) procedure designed to compute an approximate estimate of the optimal number of features n* for microarray data sets."
・Synthetic data 2: "We considered two synthetic data sets, each of 100 cases (50 labeled 1 and 50 labeled -1) described by 1000 features: the 1000 features in U1 were all uniformly distributed in the interval [-2, 2] and thus not discriminating the classes. The second data set U2 was derived from U1 by keeping unvaried 995 features and introducing 5 features normally distributed with mean 1 or -1 according to class, and variance 1.5."
・Conclusion: "Also considering the results of the experiments with no-information data, we may conclude that several promising results on microarray data may be descriptive of the shattering properties of classifiers on the given microarray data sets [18,16,17]."
・Note: "While attempting to reproduce results from other authors, we noticed the existence of a "preprocessing bias", also mentioned in [16]."
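The E-RFE idea quoted above (at each step, discard several low-weight genes at once, with the chunk size guided by the entropy of the weight distribution) can be sketched roughly as below. This is my own minimal sketch, not the paper's exact procedure: for simplicity the linear-SVM weights are replaced by a class-mean-difference score, and the entropy-to-chunk-size rule is a made-up stand-in (low entropy, i.e. many near-zero weights, triggers a large discard; high entropy a cautious one).

```python
import numpy as np

def weight_entropy(w, bins=10):
    # Shannon entropy of the histogram of |weights|, normalized to [0, 1].
    hist, _ = np.histogram(np.abs(w), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(bins))

def e_rfe_ranking(X, y, min_chunk=1, max_frac=0.5):
    """Entropy-guided recursive feature elimination (sketch).

    X: (n_samples, n_genes) array; y: labels in {1, -1}.
    Returns gene indices ordered from most to least important.
    """
    active = list(range(X.shape[1]))
    ranking = []  # filled from least to most important
    while len(active) > 1:
        Xa = X[:, active]
        # Stand-in for the SVM weight vector: class-mean difference per gene.
        w = Xa[y == 1].mean(axis=0) - Xa[y == -1].mean(axis=0)
        h = weight_entropy(w)
        # Chunk size shrinks as the weight distribution's entropy rises.
        n_drop = max(min_chunk, int((1 - h) * max_frac * len(active)))
        n_drop = min(n_drop, len(active) - 1)
        order = np.argsort(np.abs(w))  # smallest-weight genes first
        for idx in order[:n_drop]:
            ranking.append(active[idx])
        dropped = {active[i] for i in order[:n_drop]}
        active = [g for g in active if g not in dropped]
    ranking.extend(active)
    return ranking[::-1]
```

As a usage example in the spirit of the paper's U2 data set: 100 samples of mostly uniform noise in [-2, 2], with a few features shifted by the class label; those informative features should come out at the top of the ranking.
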

・The paper is packed with content and involves many processing steps; a quick skim isn't enough to grasp it all.
・I still don't quite get the concept of "selection bias." Is it simply that the classification accuracy changes depending on which genes are selected?
~~~~~~~
・Somewhere past paper number 80, I finally hit one that made me go "This is it!" Far too slow. Partly that's because I've been reading at a leisurely pace, treating it as English practice on the side. In any case, part of this paper's content and what I'm currently thinking about turned out to be quite close in their basic framework. It struck me, oddly, that people end up thinking along the same lines even in completely different countries. Rather than the anxiety of "someone beat me to it!", the stronger feeling is the relief of confirming that what I'm working on is worth a paper. Have I finally caught up to the 2003 level!? Now I just have to push things forward to the 2007 level.
・Starting with this paper, I'm going to try to read aloud whenever possible.