Inside Pikarin's Head

Mainly a record of eating out. Based in Muroran, Hokkaido.

【Paper】Niijima,2006,Recursive gene selection based on~

2008-03-28 08:06:27 | Paper Notes
Satoshi Niijima and Satoru Kuhara
Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE
BMC Bioinformatics 2006, 7:543
[PDF][Web Site]

・Proposes a recursive gene selection method (RFE: recursive feature elimination) for classifying samples from microarray data, built on the discriminant vector of the maximum margin criterion (MMC). It is an application of classical linear discriminant analysis.
・Data
(Binary-class)
1.Colon cancer [Alon]
2.Prostate cancer [Singh]
3.Leukemia [Golub]
4.Medulloblastoma [Pomeroy]
5.Breast cancer [van't Veer]
(Multi-class)
6.MLL [Armstrong]
7.SRBCT [Khan]
8.CNS [Pomeroy]
9.NCI60 [Ross]
・Compared against: SVM-RFE
・Evaluation of classification results: 3-fold cross-validation, repeated 100 times

・Motivation: 「Gene selection plays essential roles in classification tasks. It improves the prediction accuracy of classifiers by using only discriminative genes. It also saves computational costs by reducing dimensionality. More importantly, if it is possible to identify a small subset of biologically relevant genes, it may provide insights into understanding the underlying mechanism of a specific biological phenomenon. Also, such information can be useful for designing less expensive experiments by targeting only a handful of genes.」
・Note: 「In this study, we do not address the problem of finding the optimum number of genes that would yield highest classification accuracy. Instead, the number of genes was varied from 1 to 100, and the performances were compared for each number of genes.」
・Result: 「As our results indicate, the prediction of clinical outcome is generally more difficult than that of tissue or disease types.」
・Result: 「The results suggest that MMC-RFE is less sensitive to noise and outliers due to the use of average margin, while the performance of SVM-RFE can be easily affected by them when applied to noisy, small sample size microarray data. Another advantage of MMC-RFE over SVM-RFE is that MMC-RFE naturally extends to multi-class cases. Furthermore, MMC-RFE does not require the computation of the matrix inversion unlike LDA-RFE and MSE-RFE, and involves no parameters to be tuned.」
・What RFE is: 「The idea of recursive feature elimination (RFE) [6] is to recursively remove genes using the absolute weights of the discriminant vector or hyperplane, which reflect the significance of the genes for classification.」
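・A minimal sketch of the generic RFE loop quoted above, using a plain linear SVM from scikit-learn as the ranking model; the paper's MMC-based discriminant vector is not implemented here, and the function name and parameters are my own illustration:

import numpy as np
from sklearn.svm import LinearSVC

def rfe_ranking(X, y, step=1):
    """Order gene indices from least to most important by recursive elimination."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        clf = LinearSVC(max_iter=10000).fit(X[:, remaining], y)
        weights = np.abs(clf.coef_).sum(axis=0)   # |w| per remaining gene
        worst = np.argsort(weights)[:step]        # smallest weights are removed first
        for idx in sorted(worst, reverse=True):
            eliminated.append(remaining.pop(idx))
    return eliminated                             # tail = most informative genes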

【Paper】Nagamachi,2002,Kansei engineering as a powerfu~

2008-03-14 08:03:24 | Paper Notes
Mitsuo Nagamachi
Kansei engineering as a powerful consumer-oriented technology for product development
Applied Ergonomics (England), Vol. 33, 289-294 (2002)
[PDF][Web Site]

・An introduction to kansei engineering. Explained through examples such as cars and cosmetics developed on kansei-engineering principles and actually brought to market, as well as product-development systems that make use of "kansei".
・The point is to develop products that give consumers greater satisfaction: a set of techniques for weaving each individual consumer's wishes and preferences into the product.

・What kansei engineering is: 「Kansei engineering aims at translation of kansei into the product design field including product mechanical function.」
・What "kansei" is: 「When a customer wants to purchase something, for instance, to buy a car, TV set or clothing, he/she will have a kind of feeling such as "graceful and looks intelligent, but not so expensive...". This feeling is called "kansei" in Japanese. The kansei means the customer's psychological feeling as well as embracing physiological issues.」
・「Kansei engineering or kansei ergonomics was founded at Hiroshima University about 30 years ago (Nagamachi, 1989, 1991, 1995a,b, 1999).」
・「All products developed by the kansei engineering have had good sales in the market so far, because it aims to implement consumer's feelings and images (kansei) in new products and then the customers are willing to buy them.」
・Summary: 「If the sensing of the customers' kansei is anticipated accurately, the product development will be successful. Otherwise, the new product will be very difficult to fit to the market, even if kansei engineering is utilized.」

【Paper】Lee,2005,An extensive comparison of recent cla~

2008-03-07 08:03:39 | Paper Notes
Jae Won Lee, Jung Bok Lee, Mira Park and Seuck Heun Song
An extensive comparison of recent classification tools applied to microarray data
Computational Statistics & Data Analysis Volume 48, Issue 4, 1 April 2005, Pages 869-885
[PDF][Web Site]

・A comprehensive performance comparison for gene-expression analysis with DNA microarrays, covering 21 classification methods, 3 gene selection methods, and 7 datasets. An extended follow-up to the Dudoit (2002) paper.
・Classification methods
1. Fisher's linear discriminant analysis (FLDA)
2-3. Diagonal linear and quadratic discriminant analysis (DLDA, DQDA)
4. Logistic regression (LOGISTIC)
5. Generalized partial least squares (GPLS)
6. k nearest neighbor (kNN)
7-11. CART and aggregating classifiers (BAG, BOOST, LogitBOOST, RandomForest)
12-13. Single & multi layer neural network (NN-1, NN-3)
14-15. Support vector machine (SVM-linear, radial)
16-17. Flexible discriminant analysis (FDA-POL, FDA-MARS)
18. Penalized discriminant analysis (PDA)
19-20. Mixture discriminant analysis (MDA-Linear, MDA-MARS)
21. Shrunken centroids method (or Prediction Analysis of Microarrays (PAM))
・Gene selection methods (the BSS/WSS statistic is sketched in code after this list)
1. BSS/WSS [Dudoit]
2. Wilcoxon rank-based statistics
3. soft-thresholding method [Tibshirani]
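・A minimal sketch of the BSS/WSS ratio of [Dudoit], assuming X is a samples-by-genes numpy array and y an array of class labels; the function name is my own:

import numpy as np

def bss_wss(X, y):
    """Between-class / within-class sum-of-squares ratio for each gene."""
    overall = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        centroid = Xk.mean(axis=0)
        bss += len(Xk) * (centroid - overall) ** 2   # between-class spread
        wss += ((Xk - centroid) ** 2).sum(axis=0)    # within-class spread
    return bss / wss                                 # rank genes by this ratio, largest first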
・Data
1. Leukemia (LEU) [Golub]
2. Lymphoma (LYM) [Alizadeh]
3. NCI 60 (NCI60)
4. Colon cancer (COLON) [Alon]
5. Lung cancer (LUNG) [Garber]
6. Small round blue cell tumor (SRBCT) [Khan]
7. Yeast [Eisen]
・Classification evaluation: repeated 2:1 train/test splits (200 repetitions)
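・A rough sketch of that evaluation protocol: repeat a stratified 2:1 train/test split and average the test error of a given classifier. The helper name and defaults are my own placeholders:

import numpy as np
from sklearn.model_selection import train_test_split

def repeated_split_error(clf, X, y, n_repeats=200, seed=0):
    """Average test error over repeated 2:1 train/test splits."""
    errors = []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=1 / 3, stratify=y, random_state=seed + r)
        y_hat = clf.fit(X_tr, y_tr).predict(X_te)
        errors.append(np.mean(y_hat != y_te))
    return float(np.mean(errors))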

・Problem: 「no systematic comparison of statistical methods with different pre-processing strategies is available yet for finding the most appropriate classification tool once the specific type of data is given.」

・The upshot: the best-performing analysis method changes as the data change. The big problem is that no single method comes out as uniquely optimal.

【Paper】Martella,2006,Classification of Microarray Dat~

2008-02-29 08:04:26 | Paper Notes
Francesca Martella
Classification of microarray data with factor mixture models
Bioinformatics, Volume 22, Number 2, 15 January 2006, pp. 202-208
[PDF]

・A performance comparison of gene selection methods.
・Gene selection methods
1.Ghahramani and Hinton model (1996)
2.Rocci and Vichi model (2002)
3.McLachlan model (2002)
・Data: leukemia ALL/AML [Golub]
・Result: 44 "marker" genes selected

・Overview: 「we propose a generalization of the approach proposed by McLachlan et al., 2002 by advising to estimate the distribution of log LR statistic for testing one versus two component hypothesis in the mixture model for each gene considered individually, using a parametric bootstrap approach.」
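・A minimal sketch of that parametric bootstrap likelihood-ratio test for one vs two components, applied to a single gene's expression vector. scikit-learn's GaussianMixture stands in for the paper's factor mixture model, and the function names and bootstrap size are my own:

import numpy as np
from sklearn.mixture import GaussianMixture

def lr_statistic(x):
    """Likelihood-ratio statistic for 2 vs 1 mixture components on one gene."""
    x = x.reshape(-1, 1)
    ll1 = GaussianMixture(n_components=1).fit(x).score(x) * len(x)
    ll2 = GaussianMixture(n_components=2, n_init=5).fit(x).score(x) * len(x)
    return 2 * (ll2 - ll1)

def bootstrap_p_value(x, n_boot=99, seed=0):
    """Simulate the statistic under the fitted one-component (normal) null model."""
    rng = np.random.default_rng(seed)
    observed = lr_statistic(x)
    exceed = sum(lr_statistic(rng.normal(x.mean(), x.std(), size=len(x))) >= observed
                 for _ in range(n_boot))
    return (1 + exceed) / (n_boot + 1)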
・Outlook: 「Moreover, an interesting future task would be to deal with the block clustering methods in a mixture approach to allow simultaneously clustering of objects (samples) and variables (genes).」

・I couldn't make head or tail of this one.

【Paper】Friedman,2000,Using Bayesian Networks to Analy~

2008-02-22 08:02:01 | Paper Notes
Nir Friedman, Michal Linial, Iftach Nachman, Dana Pe'er.
Using Bayesian Networks to Analyze Expression Data
Journal of Computational Biology. August 1, 2000, 7(3-4): 601-620.
[PDF][Web Site]

・Goes beyond the plain gene selection of earlier work: applies Bayesian networks to gene-expression data obtained from microarrays, aiming to uncover the network of interactions among genes. Motivated by the fact that, as microarray technology has matured, experiments have become easier to run and the amount of data to analyze has grown.
・Data: S. cerevisiae (yeast) cell-cycle measurements, 76 samples, 6177 genes [Spellman, 1998]
・Probability models used in the Bayesian network (the linear Gaussian model is sketched below)
1.Multinomial model
2.Linear Gaussian model
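・A minimal sketch of the linear Gaussian local model (model 2 above): a gene's expression is treated as normal with a mean that is a linear function of its parents' expression, and a network score would sum such terms over all genes. The function name and least-squares fit are my own illustration:

import numpy as np

def linear_gaussian_loglik(child, parents):
    """Max log-likelihood of one gene (child) given its parent genes' expression.

    child: array of shape (n_samples,); parents: array of shape (n_samples, n_parents).
    """
    n = len(child)
    design = np.column_stack([np.ones(n), parents]) if parents.size else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, child, rcond=None)  # linear mean in the parents
    sigma2 = np.mean((child - design @ beta) ** 2)          # ML variance estimate
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)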

・Research trend: 「Early microarray experiments examined few samples, and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuses on performing a larger number of microarray assays ranging in size from a dozen to a few hundreds of samples. In the near future, data sets containing thousands of samples will become available.」
・Strength: 「It is important to note that our learning algorithm uses no prior biological knowledge nor constraints. All learned networks and relations are based solely on the information conveyed in the measurements themselves.」

【Paper】Xiong,2001,Biomarker Identification by Feature~

2008-02-15 08:01:07 | Paper Notes
Momiao Xiong, Xiangzhong Fang, and Jinying Zhao
Biomarker Identification by Feature Wrappers
Genome Research, 11(11):1878-1887, 2001
[PDF]

・A performance comparison of the three classification methods used within "feature wrappers" for gene selection and sample classification.
・Classification methods
1.Linear discriminant analysis (Fisher's LDA)
2.Logistic regression (LR)
3.Support Vector Machines (SVMs)
・Search methods (SFS is sketched in code after this list)
1.Sequential forward search (SFS) algorithms
2.Sequential forward floating search (SFFS) algorithms
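・A minimal sketch of sequential forward search (SFS, method 1 above) as a feature wrapper: at each step, add the gene that most improves the cross-validated accuracy of the wrapped classifier. A linear SVM and 3-fold CV are my own stand-ins for the paper's setup:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def sequential_forward_search(X, y, n_select=10):
    """Greedily grow a gene subset by cross-validated classification accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scored = [(cross_val_score(LinearSVC(max_iter=10000),
                                   X[:, selected + [j]], y, cv=3).mean(), j)
                  for j in remaining]
        best_score, best_gene = max(scored)   # gene giving the best accuracy
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected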
・Data
1.Colon cancer [Alon]
2.Breast tumor [Hedenfalk]
3.Doxorubicin [Perou]

・Problem: 「First, although the calculation of fold changes or t-test and F test can identify highly differentially expressed genes, the classification accuracy of identified biomarkers by these methods is, in general, not very high. Second, most scoring methods do not use classification accuracy to measure a gene's ability to discriminate tissue samples.」
・Problem: 「Third, to improve accuracy, several authors (Moler et al. 2000; Chow et al. 2001) used a combination of genes in the top of the list of ranked genes as a composite classifier. However, a simple combination of highly ranked markers according to their scores or discrimination ability may not be efficient for classification.」
・Aim: 「The goal of this research was to use feature (gene) selection incorporated into pattern recognition as a general framework for biomarker identification and optimal classifier generation.」
・Result: 「First, the classification accuracy of the optimal subsets of genes searched by SFFS algorithm is greater than or equal to that obtained by SFS algorithm. Second, the accuracy increased when sizes of subsets of selected genes increased and quickly reached 100% accuracy for the SFFS algorithm, but suddenly dropped to 50% when the size of selected subsets of genes was >60 (which is close to total sample size of 62).」
・Result: 「Third, it is interesting to note that the classification accuracy of optimal subsets of genes with size 4 searched by SFFS algorithm is 100%.」
・Overview: 「In this paper, we formulated the problem of biomarker identification as feature selection incorporated into pattern recognition (i.e., we formulated it into an optimization problem).」
・Problem: 「Now the question is how many genes are required and which genes are selected to ensure the required classification accuracy.」

・"pattern recognition" の絡み具合がどうもいまいちよく分からず。

【Paper】Park,2001,A nonparametric scoring algorithm~

2008-02-08 08:01:06 | Paper Notes
Park PJ, Pagano M, Bonetti M
A nonparametric scoring algorithm for identifying informative genes from microarray data
Pac Symp Biocomput. 2001:52-63.
[PDF][Web Site]

・Proposes a gene selection method, a "nonparametric scoring algorithm", built to exploit the strengths of nonparametric statistics.
・Algorithm (a code sketch follows this list)
1. Label each sample 0 or 1
2. For each gene, sort the samples in order of expression level
3. Score the gene by how mixed up the 0/1 labels are; the more cleanly the 0s and 1s separate, the higher the score
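・A rough sketch of the scoring idea in steps 1-3, a simplification rather than the paper's exact statistic: sort the samples by one gene's expression and count how many 0/1 label pairs end up on the "right" side, so a perfectly separated gene gets the maximum score. Names are my own:

import numpy as np

def separation_score(expression, labels):
    """Higher score = cleaner 0/1 separation after sorting by expression."""
    labels = np.asarray(labels)[np.argsort(expression)]  # step 2: order samples by expression
    ones_seen, one_before_zero = 0, 0
    for lab in labels:                                    # step 3: count (1, 0) pairs in order
        if lab == 1:
            ones_seen += 1
        else:
            one_before_zero += ones_seen
    n1 = int(labels.sum())
    n0 = len(labels) - n1
    return max(one_before_zero, n1 * n0 - one_before_zero)  # n0*n1 when perfectly separated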
・Data: Leukemia data [Golub]
・"Zyxin" came out as the gene with the best score.

・Problem: 「What genes are useful for classification and how many of them should be used for predicting the classes of new samples?」

Nonparametric methods: in statistics, nonparametric methods are those that make no assumptions whatsoever about the distribution of the population.

【Paper】Shevade,2003,A simple and efficient algorithm ~

2008-02-01 08:00:14 | Paper Notes
S. K. Shevade and S. S. Keerthi
A simple and efficient algorithm for gene selection using sparse logistic regression
Bioinformatics Vol. 19 no. 17 2003, pages 2246-2253
[PDF][Web Site]

・"Sparse logistic regression" に基づいた遺伝子抽出法の提案。Gauss-Seidel法の流れをくむ。
・データ
1.Colon cancer data set, 22 normal / 40 cancer tissues, 2000 futures [Alon,1999]
2.Breast cancer data set, 49 tumor samples, 7129 genes [West,2001]
・Classification evaluation: 3-fold cross-validation

・Method: 「The proposed algorithm for the sparse logistic regression problem. The proposed algorithm is very much in the spirit of the Gauss-Seidel method (Bertsekas and Tsitsiklis, 1989) for solving unconstrained optimization problems.」
・Strength: 「The main contribution of this paper is to utilize the special structure of (2.2) and devise a simple algorithm which is extremely easy to implement;」
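・A minimal sketch of the end result the paper is after, sparse gene selection via L1-penalized logistic regression; scikit-learn's solver stands in for the authors' Gauss-Seidel-style coordinate algorithm, and the regularization strength C is an arbitrary placeholder:

import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_logistic_genes(X, y, C=0.1):
    """Indices of genes with nonzero weights under an L1 (sparsity-inducing) penalty."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    return np.flatnonzero(model.coef_.ravel())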

・As usual, I couldn't follow the crucial algorithm itself.

【Paper】Jaeger,2002,Improved gene selection for classi~

2008-01-24 07:58:45 | Paper Notes
J. Jaeger, R. Sengupta, W. Ruzzo
Improved Gene Selection For Classification Of Microarrays
Pacific Symposium on Biocomputing, Kauai, Hawaii, Jan. 2003.
[PDF][Web Site]

・Proposes gene selection methods.
・Proposed methods (gene selection)
1.Correlation
2.Clustering
3.Masked out Clustering
・Compared methods (gene selection)
1.Fisher
2.Golub
3.Park
4.TNoM
5.t-test
・Data
1. 40 Adenocarcinoma and 22 normal samples [Alon]
2. 47 ALL and 25 AML [Golub]
3. 18 tumor and 18 normal samples [Notterman]
・Evaluation of gene selection: compute ROC curves via SVM + LOOCV

・Problem: 「A problem with this approach is that many of these genes are highly correlated.」
・Aim: 「Given a series of microarray experiments for a specific tissue under different conditions we want to find the genes most likely differentially expressed under these conditions. In other words, we want to find the genes that best explain the effects of these conditions.」
・Principle: 「In order to increase the classification performance we propose to use more uncorrelated genes instead of just the top genes.」
・「If many genes are highly correlated we could describe this pathway with fewer genes and reach the same precision. Additionally, we could replace correlated genes from this pathway by genes from other pathways and possibly increase the prediction accuracy.」
・Principle of "Correlation": 「A simple greedy algorithm accomplishes this selection: the k-th gene selected is the gene with highest p-value among all genes whose correlation to each of the first k-1 is below the specified threshold.」
・"Clustering","Masked out Clustering" 原理「If the cluster then has a bad quality we might pick a lot of genes from that cluster even though they are not informative. To counteract this problem we implemented the possibility to mask out and exclude clusters that have an average bad test statistic p-value
・結果「There is no clear winner between the three proposed methods and it depends largely on the dataset and parameters used.
・展望「It is pretty expensive to try all possible numbers for clusters to find a setting that provides us with a good LOOCV performance. One direction for future work would be to estimate the number of clusters using a BIC (Bayesian Information Criterion) score or switching over to model based clustering.

・I couldn't work out the finer details of how the proposed methods work.

【Paper】Diaz-Uriarte,2006,Gene selection and classifi~

2008-01-18 08:01:49 | Paper Notes
Ramon Diaz-Uriarte and Sara Alvarez de Andres
Gene selection and classification of microarray data using random forest
BMC Bioinformatics 2006, 7:3
[PDF][Web Site]

・Proposes a gene selection and classification method based on random forests.
・Simulated data: 2-4 classes, 1-3 types of data distribution, (5, 20, 100) genes per distribution, 25 samples per class
・Real data
1.Leukemia [Golub]
2.Breast [van't Veer]
3.Breast [van't Veer]
4.NCI60 [Ross]
5.Adenocarcinoma [Ramaswamy]
6.Brain [Pomeroy]
7.Colon [Alon]
8.Lymphoma [Alizadeh]
9.Prostate [Singh]
10.Srbct [Khan]
・Compared methods (all run in "R")
a) random forest (no variable selection)
b) without variable selection
1.Diagonal Linear Discriminant Analysis (DLDA) [Dudoit]
2.K nearest neighbor (KNN) [Romualdi]
3.Support Vector Machines (SVM) with linear kernel [Dettling]
c) with variable selection
1.Shrunken centroids (SC), SC.l, SC.s [Tibshirani]
2.Nearest neighbor + variable selection (NN.vs)
・Evaluation of classification
1. Comparing the different methods → 0.632+ bootstrap method [Ambroise, Efron]
2. Comparing random forest parameter settings → Out-of-Bag (OOB) error

・Problem: 「Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm.」
・Principle: 「Each of the classification trees is built using a bootstrap sample of the data, and at each split the candidate set of variables is a random subset of the variables.」
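・A minimal sketch of the building block behind the proposed selection: rank genes by random forest variable importance and read off the out-of-bag accuracy. scikit-learn's RandomForestClassifier stands in for the R packages used in the paper, and the backwards-elimination loop itself is omitted:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_gene_ranking(X, y, n_trees=500, seed=0):
    """Gene indices sorted from most to least important, plus the OOB accuracy."""
    forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                    random_state=seed).fit(X, y)
    ranking = np.argsort(forest.feature_importances_)[::-1]
    return ranking, forest.oob_score_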
・Problem: 「Unfortunately most "methods papers" in bioinformatics do not evaluate the stability of the results obtained, leading to a false sense of trust on the biological interpretability of the output obtained.」
・Strength: 「the main advantage of this method is that it returns very small sets of genes that retain a high predictive accuracy,」
・Outlook: 「In a broader context, further work is warranted on the stability properties and biological relevance of this and other gene-selection approaches, because the multiplicity problem casts doubts on the biological interpretability of most results based on a single run of one gene-selection approach.」