ぴかりんの頭の中味 (What's Inside Pikarin's Head)

Mostly a record of eating out. Based in Muroran, Hokkaido.

[Paper] Jaeger, 2002, Improved gene selection for classi~

2008-01-24 07:58:45 | Paper Notes
J. Jaeger, R. Sengupta, W. Ruzzo
Improved Gene Selection For Classification Of Microarrays
Pacific Symposium on Biocomputing, Kauai, Hawaii, Jan., 2003.
[PDF][Web Site]

・Proposes new gene selection methods.
・Proposed methods (gene selection)
1. Correlation
2. Clustering
3. Masked out Clustering
・Comparison methods (gene selection)
1. Fisher
2. Golub
3. Park
4. TNoM
5. t-test
・Data
1. 40 adenocarcinoma and 22 normal samples [Alon]
2. 47 ALL and 25 AML [Golub]
3. 18 tumor and 18 normal samples [Notterman]
・Evaluation of gene selection: ROC computed with SVM + LOOCV
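As a rough illustration of the evaluation step, the sketch below runs leave-one-out cross-validation and summarizes the held-out scores as the area under the ROC curve. It is only a sketch under assumptions: the paper uses an SVM, whereas `nearest_centroid_score` here is a simple stand-in so the example stays dependency-free, and all function names are hypothetical. Any scorer with the same signature (for example one wrapping an SVM decision function) could be passed in instead.

```python
from statistics import mean


def nearest_centroid_score(train_x, train_y, test_x):
    """Stand-in scorer (NOT an SVM): higher means closer to the class-1 centroid."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    n_genes = len(test_x)
    c_pos = [mean(s[g] for s in pos) for g in range(n_genes)]
    c_neg = [mean(s[g] for s in neg) for g in range(n_genes)]
    d_pos = sum((a - b) ** 2 for a, b in zip(test_x, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(test_x, c_neg))
    return d_neg - d_pos


def loocv_scores(samples, labels, scorer=nearest_centroid_score):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    scores = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        scores.append(scorer(train_x, train_y, samples[i]))
    return scores


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With `samples` as a list of per-sample expression vectors (restricted to the selected genes) and `labels` as 0/1 classes, `roc_auc(loocv_scores(samples, labels), labels)` gives a single number for comparing gene sets.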

・Problem: "A problem with this approach is that many of these genes are highly correlated."
・Objective: "Given a series of microarray experiments for a specific tissue under different conditions we want to find the genes most likely differentially expressed under these conditions. In other words, we want to find the genes that best explain the effects of these conditions."
・Principle: "In order to increase the classification performance we propose to use more uncorrelated genes instead of just the top genes."
・"If many genes are highly correlated we could describe this pathway with fewer genes and reach the same precision. Additionally, we could replace correlated genes from this pathway by genes from other pathways and possibly increase the prediction accuracy."
・Principle of "Correlation": "A simple greedy algorithm accomplishes this selection: the k-th gene selected is the gene with highest p-value among all genes whose correlation to each of the first k-1 is below the specified threshold."
・Principle of "Clustering" and "Masked out Clustering": "If the cluster then has a bad quality we might pick a lot of genes from that cluster even though they are not informative. To counteract this problem we implemented the possibility to mask out and exclude clusters that have an average bad test statistic p-value"
・Results: "There is no clear winner between the three proposed methods and it depends largely on the dataset and parameters used."
・Outlook: "It is pretty expensive to try all possible numbers for clusters to find a setting that provides us with a good LOOCV performance. One direction for future work would be to estimate the number of clusters using a BIC (Bayesian Information Criterion) score or switching over to model based clustering."
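The greedy "Correlation" selection quoted above can be sketched roughly as follows. This is an interpretation, not the authors' code: the quote says "highest p-value", and the sketch assumes that means the best-ranked gene by the test statistic (smallest p-value). Using the absolute Pearson correlation, the default threshold of 0.8, and the names `pearson` and `greedy_uncorrelated` are likewise assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant profile: treated as uncorrelated (assumption)
    return cov / sqrt(va * vb)


def greedy_uncorrelated(expr, p_values, n_select, max_corr=0.8):
    """Pick up to n_select genes, best-ranked first, skipping any gene that is
    too correlated with a gene already chosen.

    expr:     dict gene -> list of expression values across samples
    p_values: dict gene -> p-value from some per-gene test (assumed given)
    """
    ranked = sorted(p_values, key=p_values.get)  # most significant first
    chosen = []
    for gene in ranked:
        if len(chosen) == n_select:
            break
        # Keep the gene only if it is weakly correlated with every chosen gene.
        if all(abs(pearson(expr[gene], expr[c])) < max_corr for c in chosen):
            chosen.append(gene)
    return chosen
```

Lowering `max_corr` trades a few top-ranked genes for more diverse ones, which is the effect the quoted principle is after.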

・I could not make out the finer details of how the proposed methods work.
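Since the details are not captured in these notes, the following is only one possible reading of the masking step from the "Masked out Clustering" quote. It assumes the cluster assignment already exists (the clustering algorithm itself is left out), that a cluster is "bad" when its mean p-value exceeds a cutoff, and that a fixed number of genes is taken from each surviving cluster. The function name, the `per_cluster` parameter, and the 0.05 cutoff are all hypothetical.

```python
from collections import defaultdict


def masked_cluster_selection(cluster_of, p_values, per_cluster=1, max_mean_p=0.05):
    """Select genes cluster by cluster, masking out clusters with a poor mean p-value.

    cluster_of: dict gene -> cluster id (clustering assumed done elsewhere)
    p_values:   dict gene -> p-value from some per-gene test
    """
    members = defaultdict(list)
    for gene, cluster in cluster_of.items():
        members[cluster].append(gene)

    chosen = []
    for cluster in sorted(members):
        genes = members[cluster]
        mean_p = sum(p_values[g] for g in genes) / len(genes)
        if mean_p > max_mean_p:
            continue  # mask out: the cluster as a whole looks uninformative
        # Within a surviving cluster, take the most significant genes first.
        best_first = sorted(genes, key=lambda g: p_values[g])
        chosen.extend(best_first[:per_cluster])
    return chosen
```

Setting `max_mean_p=1.0` disables masking, which would correspond to plain cluster-based selection under the same assumptions.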
