ぴかりんの頭の中味

Mostly a record of eating out. Based in Muroran, Hokkaido.

【Paper】Huang, 2007, Effective Gene Selection Method ~

2008-09-14 08:00:53 | Paper notes
D. Huang, T. W. S. Chow
Effective Gene Selection Method With Small Sample Sets Using Gradient-Based and Point Injection Techniques
Computational Biology and Bioinformatics, IEEE/ACM Transactions on Volume 4, Issue 3, July-Sept. 2007 Page(s):467-475
[PDF][WebSite]

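・Proposes two methods that further improve on sequential forward search (SFS), a gene selection method that recursively applies a Bayesian discriminant-based criterion (BD) as its evaluation function. The proposed methods use the Gradient-based Strategy or the Weighting-Sample Strategy, and the Point-Injection Strategy.
・Data
1.Artificial data
2.Real data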
a.Colon Tumor
b.Prostate cancer
c.Leukemia subtype
・Gene selection methods
1.SFS (sequential forward search)
2.MSFS (modified SFS) (proposed)
3.WMSFS (modified SFS with the maximal-probability-weighting-injected-point strategy) (proposed)
・Sample classification methods
1.Multilayer perceptron model (MLP)
2.Support vector machine with "Linear" kernel (SVM-L)
3.Support vector machine with "RBF" kernel (SVM-R)
4.3-nearest neighbor rule classifier (3-NN)

・Method: "In this model, the employed search engine is the sequential forward search (SFS). The evaluation criterion is based on Bayesian discriminant [13]."
・Method: "The first strategy is designed to enhance the effectiveness of searching. The second one addresses the problem of overfitting."
・Method: "A point injection approach is designed. The concept of the injection approach is to generate a number of points according to the distribution of given samples. Then, gene subsets can be assessed using the generated points and the original samples."
・Preprocessing: "In detail, for each given gene g, BD(g) is calculated based on (7), where w=1 for all samples. The genes with small values of BD are considered irrelevant and eliminated. In such a way, a huge gene set can be safely reduced"
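
As a rough illustration of the SFS search engine mentioned in the notes above, here is a minimal Python sketch of greedy forward selection. The `score` callback only stands in for the BD criterion, which is not reproduced here. The function name, the toy criterion, and its numbers are hypothetical and serve only to show the search loop.

```python
from typing import Callable, List, Sequence


def sequential_forward_search(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_size: int,
) -> List[int]:
    """Greedy SFS: keep adding the feature that most improves score(subset)."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_size:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every candidate when added to the current subset.
        scored = [(score(selected + [f]), f) for f in candidates]
        # Highest score wins; ties go to the lower feature index.
        cand_score, cand = max(scored, key=lambda t: (t[0], -t[1]))
        if cand_score <= best_score:
            break  # no candidate improves the criterion
        selected.append(cand)
        best_score = cand_score
    return selected


# Hypothetical toy criterion: per-gene relevance, minus a penalty when the
# redundant genes 0 and 1 are both in the subset.
RELEVANCE = [9, 8, 2, 5]


def toy_score(subset: Sequence[int]) -> float:
    penalty = 7 if 0 in subset and 1 in subset else 0
    return sum(RELEVANCE[f] for f in subset) - penalty


print(sequential_forward_search(4, toy_score, max_size=3))  # [0, 3, 2]
```

Gene 1 has the second-highest relevance on its own, but the search passes over it because it is redundant with gene 0. A purely univariate ranking would not capture that effect.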

【Paper】Chen, 2007, Gene selection with multiple orderin~

2008-08-22 08:03:24 | Paper notes
James J Chen, Chen-An Tsai, ShengLi Tzeng, and Chun-Houh Chen
Gene selection with multiple ordering criteria
BMC Bioinformatics. 2007; 8: 74.
[PDF][WebSite]

・Proposes the Layer ranking algorithm as a gene selection method. It layers the results of several (existing) ranking methods to produce its own ranking.
・Data
1.Colon Data set [Alon]
2.Ionizing Radiation Data set [Tusher]
3.Dilution Data set
・Proposed method: three settings of the Layer ranking algorithm
1.Point-admissible
2.Line-admissible (convex)
3.Pareto
・Ranking methods used
1.Fold-change
2.p-value
3.Frequency of selections by the SVM-RFE

・What a wrapper is: "The wrapper approach is an alternative gene selection method; the wrapper approach finds a subset of genes and evaluates its relevance while building the prediction model."
・Summary: "This paper proposes three layer ranking algorithms for gene ranking with multiple ranking criteria, where each individual criterion constitutes its ordering of preference for selection."
・Note: "Note that cross-validation performed after gene selection process is known as internal cross-validation (e.g., the SVM classifier), whereas cross-validation prior to gene selection is known as the external cross-validation [8]."
・"Recently, the MicroArray Quality Control consortium suggested: 'Fold-change ranking plus a non-stringent P-value cutoff can be used as a baseline practice for generating more reproducible signature gene lists' [18]."
・Open issue: "We are currently investigating different univariate selection criteria in conjunction with layer ranking algorithms to improve predictive accuracy."

・The crucial part, how the ranking results are actually layered, was completely beyond me.
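
For the Pareto setting at least, one generic construction is non-dominated sorting: genes that no other gene beats on every criterion form layer 1, they are removed, and the process repeats. The Python sketch below shows only that generic idea. It has not been checked against the paper's definitions, and it does not cover the point-admissible or line-admissible settings. It assumes every criterion has been oriented so that larger is better (for example, a p-value converted to -log10 p). The gene names and scores are made up.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better once."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(scores: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Peel off successive non-dominated fronts; layer 1 comes first."""
    remaining = dict(scores)
    layers: List[List[str]] = []
    while remaining:
        front = sorted(
            g
            for g, s in remaining.items()
            if not any(dominates(t, s) for h, t in remaining.items() if h != g)
        )
        layers.append(front)
        for g in front:
            del remaining[g]
    return layers


# Hypothetical scores: (fold-change, significance), larger is better.
toy = {
    "g1": (3.0, 0.9),
    "g2": (2.0, 0.95),
    "g3": (1.0, 0.5),
    "g4": (2.5, 0.8),
    "g5": (0.5, 0.4),
}
print(pareto_layers(toy))  # [['g1', 'g2'], ['g4'], ['g3'], ['g5']]
```

g1 and g2 share the first layer because neither beats the other on both criteria. That is how several orderings can be combined without collapsing them into a single weighted score.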

【Paper】Liu, 2006, Probe-level measurement error improve~

2008-08-16 08:07:19 | Paper notes
Xuejun Liu, Marta Milo, Neil D Lawrence and Magnus Rattray
Probe-level measurement error improves accuracy in detecting differential gene expression
Bioinformatics 2006 22(17):2107-2113
[PDF][WebSite]

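・The accuracy of gene selection is affected by probe-level measurement error (which is usually ignored), so the authors propose an original Bayesian hierarchical model that takes it into account. The method is based on PPLR (probability of positive log-ratio).
・Data
1.Golden spike-in dataset (artificial data) [Choe,2005]
2.A real mouse time-course dataset (real data) [Lin,2004]
・Compared methods (computationally related to one another)
1.MAP (maximum a posteriori) approximation
2.Variational method (the authors' original method)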
3.MCMC (Markov chain Monte Carlo)

・Problems: "(1) Microarray experiments are associated with low precision probe-level measurements, especially for weakly expressed genes (probe-level measurement error).
(2) The small number of replicates makes it difficult to obtain an accurate variance estimate for each gene across replicates (between-replicate variance)."

・Summary: "We have presented an approach using probe-level measurement error in order to improve the detection of differential gene expression and compared three different computation methods, MAP approximation, a variational method and MCMC, to solve the intractability in the model owing to the incorporation of probe-level measurement error."

【Paper】Sartor, 2006, Intensity-based hierarchical Bayes~

2008-08-09 08:01:25 | Paper notes
Maureen A Sartor, Craig R Tomlinson, Scott C Wesselkamper, Siva Sivaganesan, George D Leikauf and Mario Medvedovic
Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments
BMC Bioinformatics 2006, 7:538
[PDF][WebSite]

・Proposes IBMT (Intensity-based Moderated T-statistic) as a gene selection method.
・Artificial data: Single channel microarray, 6 samples
・Real data
1.Controlled spike-in dataset [Choe]
2.HG-U133 latin-square spike-in dataset
3.MEF Ahr dataset
4.Nickel exposure dataset
・Compared methods
1.t-test
2.Fold change
3.SMT (Smyth's moderated T-statistic)
4.IBMT
5.Fox (Fox's method)
6.Cyber-T
・Evaluation of the selected genes
1.FDR (False Discovery Rate) → evaluates sample discrimination
2.EASE (Expression Analysis Systematic Explorer) → evaluation using GO
・The R source code for IBMT is available on the authors' website.

・Problem: "However, it does not utilize the relationship between variances of expression level measurements and their magnitude."
・Result: "As expected, the empirical Bayes method that does not account for the relationship between the variance and the magnitude of expression measurements tends to underestimate the prior degrees of freedom, especially for larger d0 values."
・Result: "We demonstrated that incorporating information about the dependence of the variance of genes on expression intensity level can improve the efficiency of the Empirical Bayes moderated t-statistics, and that properly estimating the prior degrees of freedom is important in estimating the true proportion of false positives."

【Paper】Geman, 2004, Classifying Gene Expression Profile~

2008-07-18 22:20:02 | Paper notes
Donald Geman, Christian d'Avignon, Daniel Q. Naiman, and Raimond L. Winslow
Classifying Gene Expression Profiles from Pairwise mRNA Comparisons
Statist. Appl. in Genetics and Molecular Biology, 3, 2004.
[PDF][WebSite]

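・Proposes the Top-scoring pair(s) (TSP) classifier as a method for classifying gene expression samples.
・Score formula: Δij = |pij(1) - pij(2)|
pij(1) = (number of Class 1 samples with Xi < Xj (or Xi > Xj)) / (number of Class 1 samples)
pij(2) = (number of Class 2 samples with Xi < Xj (or Xi > Xj)) / (number of Class 2 samples)
 X: expression level; i, j: gene indices
※Idea → select the pair of genes (2 genes) whose expression levels most consistently keep the same ordering within each class, with that ordering reversed between the classes. The size of the expression difference itself is not considered.
・Data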
1.Breast [West]
2.Leukemia [Golub]
3.Prostate [Singh]
・Classification evaluation: Cross validation
・Because the method is strongly affected by sample size, the small-sample case is also examined and discussed.

・Advantages: "In contrast, the TSP classifier provides decision rules which i) involve very few genes and only relative expression values (e.g., comparing the mRNA counts within a single pair of genes); ii) are both accurate and transparent; and iii) provide specific hypotheses for follow-up studies."
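
The score Δij in the notes above can be computed by plain counting. Here is a minimal Python sketch, assuming the Xi < Xj orientation and class labels 1 and 2. The function names and the toy expression values are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pair_score(X: Sequence[Sequence[float]], y: Sequence[int], i: int, j: int) -> float:
    """Delta_ij = |p_ij(1) - p_ij(2)|, with p_ij(c) = P(X_i < X_j | class c)."""

    def frac(label: int) -> float:
        rows = [x for x, c in zip(X, y) if c == label]
        return sum(1 for x in rows if x[i] < x[j]) / len(rows)

    return abs(frac(1) - frac(2))


def top_scoring_pair(X: Sequence[Sequence[float]], y: Sequence[int]) -> Tuple[int, int]:
    """Return the gene pair with the largest score (first one found on ties)."""
    n_genes = len(X[0])
    return max(combinations(range(n_genes), 2), key=lambda p: pair_score(X, y, *p))


# Toy data: rows are samples, columns are genes 0..2.
X: List[List[float]] = [
    [1, 5, 3],  # class 1
    [2, 6, 1],  # class 1
    [5, 1, 3],  # class 2
    [7, 5, 4],  # class 2
]
y = [1, 1, 2, 2]

print(top_scoring_pair(X, y))  # (0, 1)
print(pair_score(X, y, 0, 1))  # 1.0
```

Only the ordering of the two values within each sample enters the score. Any monotone rescaling of a whole sample therefore leaves the result unchanged.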

【Paper】Huang, 2003, Linear regression and two-class cla~

2008-07-12 10:07:21 | Paper notes
Xiaohong Huang and Wei Pan
Linear regression and two-class classification with gene expression data
Bioinformatics Vol. 19 no. 16 2003 pages 2072-2078
[PDF][WebSite]

・Reviews and discusses three previously proposed classification methods based on the linear regression model, and newly proposes two methods, Partial least squares (PLS) and Penalized PLS (PPLS).
・Data
1.Leukemia data [Golub]
2.Colon data [Alon]
・Compared classification methods (Linear regression model)
1.Weighted voting method [Golub, Tukey]
2.Compound covariate method [Hedenfalk]
3.Shrunken centroids method [Tibshirani]

・Problem: "However, most of the existing variable selection schemes are based on univariate analyses, and they proceed in a sequential way because it is computationally too demanding to do best subset selection for large data sets, and hence may not be optimal."

【Paper】Xu, 2008, SDED: A novel filter method for cancer~

2008-07-04 08:02:08 | Paper notes
Wenlong Xu, Minghui Wang, Xianghua Zhang, Lirong Wang, Huanqing Feng
SDED: A novel filter method for cancer-related gene selection
Bioinformation 2(7): 301-303 (2008)
[PDF]

・Proposes standard deviation error distribution (SDED) as a gene selection method.
・Data
1.MLL dataset [KORSMEYER Laboratory]
2.ALL-AML dataset [BROAD Institute]
3.ALL dataset [St. Jude Children's Research Hospital]
・Compared methods (gene selection)
1.SDED
2.GS2 [Yang]
3.CHO
・Classification method: SVM
・Evaluation
1.LOO_CV
2.Biological evaluation of the selected genes, based on information from a gene database (OMIM)

・Problems with wrappers: "Thus, the wrappers are computationally intractable for high-dimensional gene data [1]. The inherent linear nature is their disadvantage and it makes it difficult to identify important genes in wrapper methods [11]."

・For some reason, the explanation of the SDED algorithm itself, the crucial part, is missing entirely.

【Paper】Wang, 2007, Improved centroids estimation for ~

2008-06-28 08:06:30 | Paper notes
Sijian Wang and Ji Zhu
Improved centroids estimation for the nearest shrunken centroid classifier
Bioinformatics 2007 23(8):972-979
[PDF][WebSite]

・Proposes two methods, ALP-NSC and AHP-NSC, that improve on the sample classification method NSC.
・Data
0.Artificial data
1.Leukemia [Golub]
2.SRBCT [Khan]
3.NCI-60 [Dudoit]
・Compared methods
1.NSC (Nearest shrunken centroid)
2.NSC-Ada (NSC with adaptive thresholds)
3.ALP-NSC (Adaptive L∞-norm penalized NSC)
4.AHP-NSC (Adaptive hierarchically penalized NSC)
・Evaluation
1.Cross-validation (10-fold, 8-fold)
2.Random split, repeated 100 times

・What NSC is: "The NSC uses 'shrunken' centroids as prototypes for each class and identifies subsets of genes that best characterize each class."
・Summary: "In this article, we re-derive the NSC method as a LASSO regression on gene expression profiles. This re-interpretation allows us to notice that the L1-norm penalty used by NSC may not be the most effective way in analyzing microarray data. [...] Enlightened by these observations, we consider two different penalty functions different from the L1-norm penalty to make use of natural grouping information within the data."
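
The link between "shrunken" centroids and an L1-norm penalty comes down to soft thresholding: each class centroid's offset from the overall centroid is pulled toward zero by a fixed amount, and offsets smaller than that amount become exactly zero. The Python sketch below is a simplified illustration of plain NSC shrinkage only. It leaves out the per-gene standardization used in practice and does not implement ALP-NSC or AHP-NSC. The function names and numbers are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(d: float, delta: float) -> float:
    """sign(d) * max(|d| - delta, 0): the shrinkage an L1 penalty induces."""
    magnitude = max(abs(d) - delta, 0.0)
    return magnitude if d >= 0 else -magnitude


def shrunken_centroid(
    overall: Sequence[float], class_centroid: Sequence[float], delta: float
) -> List[float]:
    """Shrink a class centroid toward the overall centroid, gene by gene."""
    return [o + soft_threshold(c - o, delta) for o, c in zip(overall, class_centroid)]


def active_genes(overall: Sequence[float], shrunk: Sequence[float]) -> List[int]:
    """Genes whose shrunken centroid still differs from the overall centroid."""
    return [g for g, (o, s) in enumerate(zip(overall, shrunk)) if s != o]


overall = [1.0, 2.0, 3.0]
class_a = [3.0, 2.25, 1.0]

shrunk = shrunken_centroid(overall, class_a, delta=0.5)
print(shrunk)                         # [2.5, 2.0, 1.5]
print(active_genes(overall, shrunk))  # [0, 2]
```

Gene 1 drops out because its offset (0.25) is below the threshold. This is how the method "identifies subsets of genes" while classifying.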

【Paper】Tan, 2005, Simple decision rules for classifying~

2008-06-20 08:00:39 | Paper notes
Aik Choon Tan, Daniel Q. Naiman, Lei Xu, Raimond L. Winslow and Donald Geman
Simple decision rules for classifying human cancers from gene expression profiles
Bioinformatics 2005 21(20):3896-3904
[PDF][WebSite]

・Proposes k-TSP (k-Top Scoring Pairs) as a sample classification method. It is an improved version of the previously proposed TSP [Geman,2004].
・Data
#Binary class
1.Colon [Alon]
2.Leukemia [Golub]
3.CNS [Pomeroy]
4.DLBCL [Shipp]
5.Lung [Gordon]
6.Prostate1 [Singh]
7.Prostate2 [Stuart]
8.Prostate3 [Welsh]
9.GCM [Ramaswamy]
#Multi-class
1.Leukemia1 [Golub]
2.Lung1 [Beer]
3.Leukemia2 [Armstrong]
4.SRBCT [Khan]
5.Breast [Perou]
6.Lung2 [Bhattacharjee]
7.DLBCL [Alizadeh]
8.Leukemia3 [Yeoh]
9.Cancers [Su]
10.GCM [Ramaswamy]
・Compared methods
1.TSP
2.k-TSP
3.DT (C4.5 decision tree)
4.NB (Naive Bayes)
5.k-NN (k-nearest neighbor)
6.SVM (Support Vector Machines)
7.PAM (Prediction analysis of microarrays)
・Classification evaluation
#Binary class: accuracy is computed by LOOCV
#Multi class: the following three experimental settings
1.One-vs-Others (1-vs-r)
2.One-vs-One (1-vs-1)
3.Hierarchical Classification (HC)

・Problem: "Current methods generate classifiers that are accurate but difficult to interpret. This is the trade-off between credibility and comprehensibility of the classifiers."
・What k-TSP is: "k-TSP, a refinement of the original TSP algorithm, which uses exactly k pairs of genes for classifying gene expression data. When k = 1, this algorithm, referred to simply as TSP necessarily selects a unique pair of genes. More generally, both TSP and k-TSP may be seen as special cases of a new classification methodology based on the concept of 'relative expression reversals.'"
・"This is accomplished by basing the classification on the k disjoint Top Scoring Pairs (k-TSP) of genes that achieve the best combined score."
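
To show how k disjoint pairs can be turned into a classifier, here is a minimal Python sketch. It ranks all gene pairs by the TSP score, greedily keeps the top k that share no gene, and classifies a new sample by unweighted majority vote. The greedy selection, the class labels 1 and 2, the function names, and the toy data are assumptions made for illustration. Any tie-breaking rule the paper applies is not reproduced.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int, bool]  # (i, j, True if X_i < X_j points to class 1)


def frac_less(X, y, label: int, i: int, j: int) -> float:
    """Fraction of samples in the given class with X_i < X_j."""
    rows = [x for x, c in zip(X, y) if c == label]
    return sum(1 for x in rows if x[i] < x[j]) / len(rows)


def select_disjoint_pairs(X, y, k: int) -> List[Pair]:
    """Greedily pick up to k top-scoring pairs that share no gene."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        p1, p2 = frac_less(X, y, 1, i, j), frac_less(X, y, 2, i, j)
        scored.append((abs(p1 - p2), i, j, p1 > p2))
    scored.sort(key=lambda t: -t[0])  # stable sort: ties keep index order
    chosen: List[Pair] = []
    used = set()
    for _, i, j, class1_if_less in scored:
        if i in used or j in used:
            continue
        chosen.append((i, j, class1_if_less))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict(pairs: Sequence[Pair], x: Sequence[float]) -> int:
    """Each pair casts one vote; class 1 wins only with a strict majority."""
    votes_for_1 = sum(1 for i, j, c1 in pairs if (x[i] < x[j]) == c1)
    return 1 if 2 * votes_for_1 > len(pairs) else 2


# Toy data: genes 0, 2, 4 are low in class 1 and high in class 2.
X = [
    [1, 9, 2, 8, 3, 7],  # class 1
    [2, 8, 1, 9, 2, 8],  # class 1
    [9, 1, 8, 2, 7, 3],  # class 2
    [8, 2, 9, 1, 8, 2],  # class 2
]
y = [1, 1, 2, 2]

pairs = select_disjoint_pairs(X, y, k=3)
print([(i, j) for i, j, _ in pairs])      # [(0, 1), (2, 3), (4, 5)]
print(predict(pairs, [1, 5, 6, 2, 1, 4]))  # 1
```

An odd k avoids tied votes. In the last call one of the three pairs votes for class 2 and is outvoted.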

【Paper】Baker, 2006, Identifying genes that contribute ~

2008-06-15 10:09:07 | Paper notes
Stuart G Baker and Barnett S Kramer
Identifying genes that contribute most to good classification in microarrays
BMC Bioinformatics 2006, 7:407
[PDF]

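・Proposes multiple random validation as a method for fairly evaluating genes selected for disease classification. Rather than dividing the samples into training and test sets at a fixed ratio as in conventional N-fold cross validation, the samples are split at random many times, and the variability of test performance and the frequency with which each gene is selected across the splits are examined.
・Data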
1.Colon cancer [Alon]
2.Leukemia [Golub]
3.Medulloblastoma [Pomeroy]
4.Breast cancer [West]
・Classification method: Filter with a nearest centroid rule
・Classification evaluation: Receiver operating characteristic (ROC) curves, estimated area under the ROC curve (AUC)

・Problem: "The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance."
・Method: "In multiple random validation, the data are randomly split into training and test samples many times. Unlike cross-validation, the goal is not to average performance over test samples but to investigate the variability of performance over test samples and the frequencies of genes selected on random splits [2]."
・Goal: "Therefore our goal was to identify classification rules that perform well with the fewest genes, and so may be more 'robust' than rules with more genes."
・Future outlook: "Future research using a wrapper would be of great interest because of the potential of the wrapper to identify genes that have good classification when considered together but poor classification when considered separately."
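
The procedure quoted above can be sketched in a few lines of Python: split at random many times, select genes on the training part only, and record both which genes were selected and how well the rule did on the test part. This is a minimal sketch under several assumptions. The filter (absolute difference of class means), the 30% test fraction, the 0/1 labels, and the use of accuracy instead of ROC/AUC are all simplifications introduced here. The function names and data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple


def multiple_random_validation(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_genes: int,
    n_splits: int,
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[Counter, List[float]]:
    """Return (how often each gene was selected, test accuracy per split)."""
    rng = random.Random(seed)
    gene_counts: Counter = Counter()
    accuracies: List[float] = []
    idx = list(range(len(X)))
    for _ in range(n_splits):
        rng.shuffle(idx)
        n_test = max(1, int(len(idx) * test_fraction))
        test, train = idx[:n_test], idx[n_test:]
        pos = [X[i] for i in train if y[i] == 1]
        neg = [X[i] for i in train if y[i] == 0]
        if not pos or not neg:
            continue  # skip splits whose training part lacks a class
        # Filter step, using training samples only.
        diffs = [
            abs(mean(r[g] for r in pos) - mean(r[g] for r in neg))
            for g in range(len(X[0]))
        ]
        genes = sorted(range(len(diffs)), key=lambda g: -diffs[g])[:n_genes]
        gene_counts.update(genes)
        # Nearest centroid rule on the selected genes.
        c_pos = [mean(r[g] for r in pos) for g in genes]
        c_neg = [mean(r[g] for r in neg) for g in genes]
        correct = 0
        for i in test:
            x = [X[i][g] for g in genes]
            d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
            d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
            correct += (1 if d_pos < d_neg else 0) == y[i]
        accuracies.append(correct / len(test))
    return gene_counts, accuracies


# Toy data: only gene 0 separates the classes; genes 1 and 2 are constant.
X = [[v, 5, 3] for v in (10, 11, 12, 10, 11, 0, 1, 2, 0, 1)]
y = [1] * 5 + [0] * 5

counts, accs = multiple_random_validation(X, y, n_genes=1, n_splits=20)
print(counts[0], min(accs), max(accs))  # 20 1.0 1.0
```

On real data the quantities of interest are the spread of `accs` and the shape of `counts`. A gene that is selected on only a few splits is a candidate for having been "included by chance".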