ぴかりんの頭の中味

Mainly a record of eating out. Based in Muroran, Hokkaido.

[Paper] Ntzani,2003,Predictive ability of DNA microarr~

2008-06-05 08:02:24 | Paper notes
E. Ntzani, J. Ioannidis
Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment.
The Lancet, Volume 362, Issue 9394, Pages 1439-1444, 2003
[PDF]

・Comprehensively collects microarray datasets on cancer diagnosis and examines how the experimental setting, such as the analysis method and the dataset size, relates to diagnostic accuracy.
・Data: 84 datasets.

・Conclusion: "DNA microarrays addressing cancer outcomes show variable prognostic performance. Larger studies with appropriate clinical design, adjustment for known predictors, and proper validation are essential for this highly promising technology."

[Paper] Pan,2006,Incorporating gene functions as prior~

2008-05-30 08:10:05 | Paper notes
Wei Pan
Incorporating gene functions as priors in model-based clustering of microarray gene expression data
Bioinformatics 2006 22(7):795-801
[PDF][Web Site]

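・Proposes a clustering method that incorporates knowledge from gene databases such as GO and MIPS.
・Data
1.Simulated data, 4 sets generated with different parameters
2.Yeast data, 300 samples [Hughes]
・Comparison methods: run with R functions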
1.NSC (nearest shrunken centroids) [Tibshirani]
2.LDA (linear discriminant analysis)
3.RF (random forests) [Breiman]
4.SVM (support vector machines) [Vapnik]

・Problem: "However, most existing methods, including model-based clustering, ignore known gene functions in clustering."
・Overview: "In this paper, we propose such an approach that uses gene functional annotations as priors for model-based clustering."
・Result: "Comparing Tables 11 and 3, we found that the two clustering methods, especially our proposed new one, worked well with results quite close to that of supervised learning methods,"
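
The idea of using functional annotations as priors can be illustrated with a small EM loop. This is only a minimal sketch, assuming a one-dimensional Gaussian mixture in which genes of the same functional group share their cluster prior probabilities. The function name `stratified_mixture_em`, the 1-D simplification, and the initialisation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def stratified_mixture_em(x, groups, n_clusters=2, n_iter=100):
    """EM for a 1-D Gaussian mixture whose mixing proportions depend on
    each gene's functional group (hypothetical simplification)."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    # Spread the initial means over the data range; start with a shared variance.
    mu = np.linspace(x.min(), x.max(), n_clusters)
    var = np.full(n_clusters, x.var() + 1e-6)
    prior = {g: np.full(n_clusters, 1.0 / n_clusters) for g in group_ids}
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities using group-specific priors.
        log_dens = -0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
        pri = np.vstack([prior[g] for g in groups])
        log_post = np.log(pri + 1e-300) + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: priors are updated per group, means and variances over all genes.
        for g in group_ids:
            prior[g] = resp[groups == g].mean(axis=0)
        weight = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / weight
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / weight + 1e-6
    return mu, var, prior, resp

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 0.5, 50), rng.normal(10, 0.5, 50)])
    groups = np.array([0] * 40 + [1] * 10 + [0] * 10 + [1] * 40)
    mu, var, prior, resp = stratified_mixture_em(x, groups)
    print("means:", mu)
    print("group priors:", prior)
```

A gene with an ambiguous expression value is pulled toward the cluster that dominates its functional group, which is the sense in which the annotation acts as a prior here.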

[Paper] Nagayama,2002,Genome-wide Analysis of Gene Exp~

2008-05-23 08:07:28 | Paper notes
Satoshi Nagayama, Toyomasa Katagiri, Tatsuhiko Tsunoda, Taisuke Hosaka, Yasuaki Nakashima, Nobuhito Araki, Katsuyuki Kusuzaki, Tomitaka Nakayama, Tadao Tsuboyama, Takashi Nakamura, Masayuki Imamura, Yusuke Nakamura and Junya Toguchida
Genome-wide Analysis of Gene Expression in Synovial Sarcomas Using a cDNA Microarray
Cancer Research 62, 5859-5866, October 15, 2002
[PDF][Web Site]

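・Uses microarrays to diagnose synovial sarcoma (SS), a malignant tumor that has been hard to diagnose because little information was available.
・Data: 47 samples (SS: 34 / other: 13)
・Experiments
1.Classify the samples into SS versus other tumors
2.Further split the SS samples into two groups
・Data analysis used the existing software "Cluster" and "TreeView" [Eisen].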

・Problem: "However, some sarcomas have no histological counterparts in normal tissues and therefore are grouped together as "miscellaneous soft tissue tumors" in the latest edition of the WHO Soft Tissue Tumor Classification (2)."

・The paper is mostly medical and biological description, and I could not follow the content at all.

[Paper] Ye,2005,Characterization of a Family of Algori~

2008-05-16 22:30:33 | Paper notes
Jieping Ye
Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems
Journal of Machine Learning Research. Vol. 6(Apr), pp. 483-502, 2005.
[PDF][Web Site]

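・Proposes OLDA (Orthogonal Linear Discriminant Analysis) as a method for reducing the dimensionality of multivariate data.
・Data
1.Text data tr41
2.Text data re0
3.Image data PIX
4.Image data AR
5.Gene expression data GCM [Ramaswamy]
6.Gene expression data ALL [Yeoh]
・Discriminant methods
1.Uncorrelated LDA
2.Orthogonal LDA (proposed method)
3.Regularized LDA
・Evaluation of discrimination: 3-fold cross-validation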

・Problem: "Many machine learning and data mining problems involve data in very high-dimensional spaces. We consider dimension reduction of high-dimensional, undersampled data, where the data dimension is much larger than the sample size."
・What LDA is: "LDA computes the optimal transformation (projection), which minimizes the within-class distance (of the data set) and maximizes the between-class distance simultaneously, thus achieving maximum discrimination."
・What OLDA is: "The key property of OLDA is that the discriminant vectors of OLDA are orthogonal to each other, i.e., the transformation matrix of OLDA is orthogonal."
・Result: "ULDA has the property that the features in the reduced space are uncorrelated, while OLDA has the property that the discriminant vectors obtained are orthogonal to each other."
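
The contrast between ULDA (uncorrelated features) and OLDA (orthogonal discriminant vectors) can be checked numerically. The following is a minimal sketch under my own assumptions: ULDA directions are obtained by whitening the total scatter matrix and diagonalising the between-class scatter, and OLDA directions by a QR orthogonalisation of the ULDA directions. The function names `scatter_matrices` and `ulda_olda` are hypothetical, and the code has not been checked against the paper's algorithm.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_b) and total (S_t) scatter; rows of X are samples."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        diff = (Xc.mean(axis=0) - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    centered = X - mean_all
    S_t = centered.T @ centered
    return S_b, S_t

def ulda_olda(X, y, tol=1e-10):
    """Return (G_ulda, G_olda, S_t) for undersampled data (assumed construction)."""
    S_b, S_t = scatter_matrices(X, y)
    # Whiten S_t on its non-null subspace, so no inverse of a singular matrix is needed.
    evals, evecs = np.linalg.eigh(S_t)
    keep = evals > tol * evals.max()
    W = evecs[:, keep] / np.sqrt(evals[keep])  # W.T @ S_t @ W = I
    vals, vecs = np.linalg.eigh(W.T @ S_b @ W)
    q = len(np.unique(y)) - 1                  # at most (classes - 1) directions
    top = vecs[:, np.argsort(vals)[::-1][:q]]
    G_ulda = W @ top                           # G_ulda.T @ S_t @ G_ulda = I
    G_olda, _ = np.linalg.qr(G_ulda)           # orthonormal columns, same span
    return G_ulda, G_olda, S_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 50))              # 12 samples, 50 dimensions
    y = np.repeat([0, 1, 2], 4)
    G_u, G_o, S_t = ulda_olda(X, y)
    print(np.round(G_u.T @ S_t @ G_u, 6))      # uncorrelated reduced features
    print(np.round(G_o.T @ G_o, 6))            # orthogonal discriminant vectors
```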

[Paper] Michiels,2005,Prediction of cancer outcome wit~

2008-05-09 08:09:39 | Paper notes
Stefan Michiels, Serge Koscielny and Catherine Hill
Prediction of cancer outcome with microarrays: a multiple random validation strategy
The Lancet, Volume 365, Issue 9458, 5 February 2005-11 February 2005, Pages 488-492
[PDF]

・Uses a multiple random validation strategy to check whether published studies on prognosis prediction from cancer microarray data evaluated their classifiers appropriately.
・Data
1.Non-Hodgkin lymphoma [Rosenwald]
2.Acute lymphocytic leukaemia [Yeoh]
3.Breast cancer [van't Veer]
4.Lung adenocarcinoma [Beer]
5.Lung adenocarcinoma [Bhattacharjee, Ramaswamy]
6.Medulloblastoma [Pomeroy]
7.Hepatocellular carcinoma [Iizuka]

・Overview: "We aimed to assess the extent to which the molecular signature depends on the constitution of the training set, and to study the distribution of misclassification rates across validation sets, by applying a multiple random training-validation strategy. We explored the relation between sample size and misclassification rates by varying the sample size in the training and validation sets."
・Problem: "In principle, there is no biological or mathematical reason why one particular classification method should be better than others for the prediction of the outcome of cancer patients by use of microarray data."
・Result: "Five of the seven largest published studies addressing cancer prognosis did not classify patients better than chance. This result suggests that these publications were overoptimistic."
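
The multiple random training-validation strategy described in the overview can be sketched as follows. This is a rough illustration, assuming a signature made of the genes most correlated with the outcome in each training set and a nearest-centroid prediction rule. The function name `random_split_error_rates` and its parameters are hypothetical, and labels are assumed to be coded 0/1.

```python
import numpy as np

def random_split_error_rates(X, y, n_train, n_genes=50, n_repeats=500, seed=0):
    """Misclassification rate on the validation set for many random splits.
    X: (samples, genes) expression matrix, y: 0/1 outcome labels."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))
        train, valid = order[:n_train], order[n_train:]
        X_tr, y_tr = X[train], y[train]
        if y_tr.min() == y_tr.max():
            continue  # skip splits whose training set contains one class only
        # Signature: genes most correlated with the outcome, training set only.
        Xc = X_tr - X_tr.mean(axis=0)
        yc = y_tr - y_tr.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        corr = (Xc * yc[:, None]).sum(axis=0) / denom
        signature = np.argsort(-np.abs(corr))[:n_genes]
        # Nearest-centroid rule on the signature genes.
        centroid0 = X_tr[y_tr == 0][:, signature].mean(axis=0)
        centroid1 = X_tr[y_tr == 1][:, signature].mean(axis=0)
        X_va = X[valid][:, signature]
        d0 = np.linalg.norm(X_va - centroid0, axis=1)
        d1 = np.linalg.norm(X_va - centroid1, axis=1)
        predicted = (d1 < d0).astype(int)
        rates.append(np.mean(predicted != y[valid]))
    return np.array(rates)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 100))
    X[y == 1, :10] += 5.0  # ten informative genes in synthetic data
    rates = random_split_error_rates(X, y, n_train=20, n_genes=10, n_repeats=50)
    print("mean rate:", rates.mean(), "range:", rates.min(), rates.max())
```

Looking at the whole distribution of rates, and repeating the call with different `n_train` values, is what separates this from a single fixed training/validation split.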

[Paper] Li,2006,Efficient and Robust Feature Extractio~

2008-05-02 08:01:09 | Paper notes
Haifeng Li, Tao Jiang, and Keshu Zhang
Efficient and robust feature extraction by maximum margin criterion
IEEE Trans. Neural Netw., vol.17, pp.157, Jan.2006.
[PDF][Web Site]

・Proposes MMC (maximum margin criterion) as a feature extraction method.
・Data
1.Fisher's iris dataset
2.Vehicle dataset
3.ORL face dataset
4.Brain cancer gene expression dataset [Pomeroy]
・Comparison methods
1.MMC
2.kernel MMC
3.LDA
4.LDA + PCA
5.kernel PCA

・Problem: "However, PCA is not very effective for the extraction of the most discriminant features, and LDA is not stable due to the small sample size problem."
・Overview: "In this paper, a simple, efficient, and stable method is proposed to calculate the most discriminant vectors and to avoid the small sample size problem based on a new feature extraction criterion, the maximum margin criterion (MMC). Geometrically, MMC maximizes the (average) margin between classes."
・Problem: "In practice, the proposed nonlinear feature extractor could be slow when the dataset is large. It is an interesting topic to develop a fast algorithm for the proposed nonlinear feature extractor."
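
As a rough illustration of why MMC sidesteps the small sample size problem, here is a minimal sketch of the linear case. It assumes the criterion is the trace of the projected difference S_b - S_w under orthonormal projection vectors, so the solution is the set of leading eigenvectors of S_b - S_w and no matrix inverse is needed. The function name `mmc_projection` and the prior-weighted scatter definitions are my assumptions.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """Leading eigenvectors of S_b - S_w (assumed form of the linear MMC)."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label, p in zip(classes, priors):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        diff = (class_mean - mean_all)[:, None]
        S_b += p * (diff @ diff.T)                # between-class scatter
        centered = Xc - class_mean
        S_w += p * (centered.T @ centered) / len(Xc)  # within-class scatter
    # S_b - S_w is symmetric, so eigh applies even when S_w is singular.
    evals, evecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order], evals[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 30))   # fewer samples than dimensions
    y = np.repeat([0, 1], 5)
    W, vals = mmc_projection(X, y, n_components=2)
    print(W.shape, vals)
```

Unlike LDA, nothing here divides by or inverts the within-class scatter, so the undersampled case runs without extra regularisation.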

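・Being an engineering paper, its main application is image processing; the microarray data reads as if it was tried as an afterthought. The equations were beyond me.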

[Paper] Huang,2006,Incorporating biological knowledge ~

2008-04-25 08:04:01 | Paper notes
Desheng Huang and Wei Pan
Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data
Bioinformatics 2006 22(10):1259-1268
[PDF][Web Site]

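・Proposes a method for inferring gene function from DNA microarrays that incorporates biological knowledge (databases) into conventional statistical methods.
・Data
1.Simulated data, 3 sets
2.Yeast (Saccharomyces cerevisiae), 300 samples [Hughes]
・Experiments
1.Two-class experiment
2.Three-class experiment
3.Model selection: comparison of average silhouette and five-fold CV.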

・Method: "A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them to either one of the clusters obtained in the first step or some new clusters."
・Overview: "This article concerns incorporating biological knowledge into clustering genes for gene function discovery using microarray expression data, though the methodology can have more broad applications."
・Current state: "In general, with relatively high noise levels of genomic data, it is recognized that incorporating biological knowledge into statistical analysis is a reliable way to maximize statistical efficiency and enhance the interpretability of analysis results."

・I could not work out how the biological knowledge is incorporated or how to read the figures.

[Paper] Cuperlovic-Culf,2005,Determination of tumor ~

2008-04-17 08:04:32 | Paper notes
Miroslava Cuperlovic-Culf, Nabil Belacel and Rodney J. Ouellette
Determination of tumour marker genes from gene expression data.
Drug Discovery Today Target. Published March 2005. Vol 10, pg 429-437
[PDF]

・An introduction to DNA microarray research aimed at non-specialists.
<Contents>
Background
Methods used for the selection of diagnostic genes in cancer
Statistical methods for gene analysis
  Fold change method
  t-statistics and variations
  Signal to noise ratio test
  Significance analysis of microarray
  Wilcoxon rank sum test
  Multi-feature methods
Multi-condition methods
  ANOVA
  Correlation coefficient analysis
Conclusions
Acknowledgements
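
Three of the single-gene statistics named in the contents above (fold change, t-statistic, signal to noise ratio) can be sketched as follows. This is a minimal illustration, assuming log2-scale expression values, a Welch-type t-statistic, and the signal to noise ratio defined as the difference of class means divided by the sum of class standard deviations. The function names `gene_scores` and `rank_genes` are hypothetical.

```python
import numpy as np

def gene_scores(expr_a, expr_b):
    """Per-gene scores for two classes.
    expr_a, expr_b: arrays of shape (samples, genes), assumed log2 expression."""
    mean_a, mean_b = expr_a.mean(axis=0), expr_b.mean(axis=0)
    sd_a = expr_a.std(axis=0, ddof=1)
    sd_b = expr_b.std(axis=0, ddof=1)
    n_a, n_b = expr_a.shape[0], expr_b.shape[0]
    diff = mean_b - mean_a
    return {
        # On the log2 scale the fold change is a difference of means.
        "log_fold_change": diff,
        # Welch-type t-statistic (unequal variances).
        "t": diff / np.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b),
        # Signal to noise ratio.
        "snr": diff / (sd_a + sd_b),
    }

def rank_genes(expr_a, expr_b, key="t"):
    """Gene indices ordered from the largest to the smallest absolute score."""
    scores = gene_scores(expr_a, expr_b)[key]
    return np.argsort(-np.abs(scores))

if __name__ == "__main__":
    a = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
    b = np.array([[4.0, 2.0], [5.0, 4.0], [6.0, 6.0]])
    print(gene_scores(a, b))
    print(rank_genes(a, b, key="snr"))
```

Genes with zero variance in both classes would produce a division by zero here; real data would need a filter or a small constant in the denominators.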

・Research outcome: "Recent efforts in gene selection resulted in the determination of 67 genes that appear to be either more or less active in various cancer cells [43]."
・Conclusion of another paper: "The conclusions of this paper [Li] were that no method is optimal for all datasets. In addition, the accuracy of the classification was more dependent upon the classification method than the feature analysis and selection methodology used to determine a subset of diagnostic genes [Guyon]."

[Paper] Li,2005,Analysis of recursive gene selection ~

2008-04-11 08:02:52 | Paper notes
Fan Li and Yiming Yang
Analysis of recursive gene selection approaches from microarray data
Bioinformatics 2005 21(19):3741-3747
[PDF][Web Site]

・Clarifies the properties of gene selection methods, focusing mainly on the difference between using and not using recursive processing.
・Data
1.ALL-AML Leukemia data [Fodor,1997]
2.Breast Cancer data [Van't Veer,2002]
3.GCM data [Ramaswamy,2001]
・Gene selection methods
1.Rocchio
2.Ridge regression (RR) (recursive, non-recursive)
3.SVM (recursive, non-recursive)
・Classification evaluation: leave-one-out cross-validation

・Problem: "However, it is not well understood how much of the success depends on the choice of the specific classifier and how much on the recursive procedure."
・"filtering approaches, meaning that feature selection is carried out in a preprocessing step of classification, independent from the choice of the classification method, and wrapper approaches, meaning that a classifier is used to generate scores for features in the selection process and feature selection depends on the choice of the classifier."
・Problem: "However, they have very different strategy to penalize redundant features, which lead to their very different gene selection performance."
・Aim: "In this paper, we addressed a key question for wrapper-style feature selection: what property of a classifier would lead to the success of recursive feature elimination?"
・Result: "By analyzing three different classifiers, we find that the ability of a classifier for penalizing redundant features in the recursive process has a strong influence on its success."
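
The difference between non-recursive and recursive selection can be sketched with ridge regression as the scoring classifier. This is a minimal illustration under my own assumptions: genes are scored by the absolute ridge weight, the non-recursive variant ranks once, and the recursive variant refits and drops a fixed fraction of the lowest-scoring genes each round. The function names, the `drop_fraction` parameter, and the value of `lam` are hypothetical, not the paper's settings.

```python
import numpy as np

def ridge_weights(X, y, lam=1.0):
    """Ridge coefficients via the dual form, which only solves an
    (samples x samples) system and so stays cheap when genes >> samples."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = Xc.shape[0]
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    return Xc.T @ alpha

def select_non_recursive(X, y, k, lam=1.0):
    """Fit once on all genes and keep the k largest absolute weights."""
    w = ridge_weights(X, y, lam)
    return np.sort(np.argsort(-np.abs(w))[:k])

def select_recursive(X, y, k, lam=1.0, drop_fraction=0.5):
    """Refit on the surviving genes and eliminate the weakest ones each round."""
    active = np.arange(X.shape[1])
    while len(active) > k:
        w = ridge_weights(X[:, active], y, lam)
        n_keep = max(k, int(len(active) * (1 - drop_fraction)))
        n_keep = min(n_keep, len(active) - 1)  # always remove at least one gene
        keep = np.argsort(-np.abs(w))[:n_keep]
        active = np.sort(active[keep])
    return active

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 60))
    y = np.sign(X[:, 0] + X[:, 1] + X[:, 2])  # labels driven by three genes
    print(select_non_recursive(X, y, 3))
    print(select_recursive(X, y, 3))
```

On easy synthetic data like this both variants agree; the paper's point concerns cases with redundant genes, where refitting after each elimination changes the scores of the genes that remain.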

[Paper] Tai,2007,Incorporating prior knowledge of gene~

2008-04-04 08:04:40 | Paper notes
Feng Tai and Wei Pan
Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data
Bioinformatics 2007 23(23):3170-3177
[PDF][Web Site]

・Proposes a sample classification method that incorporates knowledge from gene databases into conventional linear discriminant analysis (LDA).
・Simulated data: 4 sets
・Real data
1.Breast cancer data [Huang,2003]
2.Lung cancer data [Gordon,2002]
3.Prostate cancer data [Singh,2002]
4.Leukemia data [Armstrong,2001]
・Proposed methods
1.ISCGRDA-1: GRDA-1 with individual shrinkage
2.ISCGRDA-2: GRDA-2 with individual shrinkage
3.GSCGRDA: GRDA-2 with group shrinkage
・Comparison methods
1.PAM; Predictive analysis of microarray (The nearest shrunken centroid methods)
2.SCRDA; Shrunken centroids regularized discriminant analysis
3.SVM

・Method: "Instead of simply treating all the genes independently or imposing no restriction on the correlations among the genes, we group the genes according to their biological functions extracted from existing biological knowledge or data, and propose regularized covariance estimators that encourages between-group gene independence and within-group gene correlations while maintaining the flexibility of any general covariance structure."
・Overview: "In this article, we propose several versions of a modified LDA, group regularized discriminant analysis (GRDA) that aims to take advantage of existing gene functional groups."
・Feature: "A main difference from other modifications of LDA is that we regularize the covariance matrix by considering group relationships among variables."
・Feature: "Another main difference is our consideration of a group shrinkage scheme that tends to retain or remove a whole group of the genes altogether."