Kerby A. Shedden, Jeremy M. G. Taylor, Thomas J. Giordano, Rork Kuick, David E. Misek, Gad Rennert, Donald R. Schwartz, Stephen B. Gruber, Craig Logsdon, Diane Simeone, Sharon L. R. Kardia, Joel K. Greenson, Kathleen R. Cho, David G. Beer, Eric R. Fearon and Samir Hanash
Accurate Molecular Classification of Human Cancers Based on Gene Expression Using a Simple Classifier with a Pathological Tree-Based Framework
American Journal of Pathology. 2003;163:1985-1995.
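・Proposes a gene-expression-based classification method that incorporates knowledge from pathology. Marker genes are selected by extracting the characteristics of each cancer type from the training data, and test samples are then assigned to classes based on the expression levels of those genes. The assignment step uses classical KNN (k-nearest neighbors).
・Data: human cancer tissue, 14 types, about 700 samples in total, pooled from three research groups [Whitehead, Giordano, Su].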
・Problem: "A common feature of the methods, at least insofar as they are applied in the cited works, is that they base their predictions entirely on the microarray measurements, without incorporating knowledge about the relationships between tumor types derived from decades of histopathological analysis,"
・Method: "A key feature of our approach is to incorporate a simple tree-based framework based on tumor ontogeny into the classifier."
・Method: "The key step in training our classifier is the selection of a set of genes that are informative for distinguishing among the child nodes at each split in the tree."
・Problem: "It is typical of most statistical learning algorithms that initially the error rate improves as the number of marker genes increases from small to moderate, but as the number of marker genes becomes large the algorithm overfits the data and the generalization performance actually becomes worse."
・Feature: "A unique feature of our method is its ability to use different sets of marker genes and different numbers of marker genes for classifying different specimens."
・Problem: "One important issue will be to study how the difficulty of the problem increases as the set of tumor classes is expanded to more realistically reflect the myriad types of human tumors."
・Summary: "By mimicking the strategies used by pathologists, we demonstrate that pathological knowledge based on the accumulated work from the last 100 years on tumor morphology and global gene expression data can be effectively combined, resulting in accurate molecular classification with fewer genes and without the need for black box-type sophisticated methods of statistical learning."
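・The crux is that samples are not assigned among all classes at once: the classes are partitioned into a tree structure, and each assignment is made among a restricted set of options. Accuracy ultimately depends on how well the tree (Fig. 1) is constructed.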
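To make the tree-restricted assignment concrete, here is a minimal Python sketch of the general idea: at each split, pick marker genes that separate the child nodes, then let KNN choose a child, and repeat until a leaf is reached. Everything specific in it is an assumption for illustration and not taken from the paper: the four-class toy tree (the real tree is the paper's Fig. 1), the made-up expression values, the marker score (difference of group means over average spread), and the names `TREE`, `select_markers`, `train`, and `classify`.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy tree: internal node -> child nodes; leaves are tumor classes.
TREE = {
    "root": ["epithelial", "hematologic"],
    "epithelial": ["lung", "colon"],
    "hematologic": ["leukemia", "lymphoma"],
}

def leaves_under(tree, node):
    """Return the set of leaf classes below (or equal to) a node."""
    children = tree.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(tree, c) for c in children))

def select_markers(X, groups, n_markers):
    """Rank genes by a simple separation score (an assumed stand-in statistic)."""
    scores = []
    for g in range(len(X[0])):
        by_group = {}
        for row, grp in zip(X, groups):
            by_group.setdefault(grp, []).append(row[g])
        means = [mean(v) for v in by_group.values()]
        spread = mean([pstdev(v) for v in by_group.values()]) + 1e-9
        scores.append((max(means) - min(means)) / spread)
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:n_markers]

def knn_predict(X, labels, x, genes, k):
    """Majority vote among the k nearest training samples, using only `genes`."""
    query = [x[g] for g in genes]
    dists = sorted(
        (math.dist([row[g] for g in genes], query), lab)
        for row, lab in zip(X, labels)
    )
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]

def train(tree, X, y, n_markers=2):
    """For each split, store its samples, child labels, and its own marker genes."""
    model = {}
    for node, children in tree.items():
        child_of = {leaf: c for c in children for leaf in leaves_under(tree, c)}
        idx = [i for i, lab in enumerate(y) if lab in child_of]
        Xn = [X[i] for i in idx]
        groups = [child_of[y[i]] for i in idx]
        model[node] = (Xn, groups, select_markers(Xn, groups, n_markers))
    return model

def classify(tree, model, x, node="root", k=3):
    """Walk down the tree, choosing among the current node's children only."""
    while node in tree:
        Xn, groups, genes = model[node]
        node = knn_predict(Xn, groups, x, genes, k)
    return node

# Made-up toy data: 4 genes per sample, 3 samples per class.
X = [
    [0.1, 0.2, 1.0, 0.5], [0.3, 0.1, 1.2, 0.4], [0.2, 0.3, 0.9, 0.6],  # lung
    [0.2, 5.1, 1.1, 0.5], [0.1, 4.9, 1.0, 0.6], [0.3, 5.2, 0.8, 0.4],  # colon
    [5.0, 2.5, 0.1, 0.5], [5.2, 2.4, 0.3, 0.4], [4.9, 2.6, 0.2, 0.6],  # leukemia
    [5.1, 2.5, 5.0, 0.5], [4.8, 2.6, 5.2, 0.6], [5.0, 2.4, 4.9, 0.4],  # lymphoma
]
y = ["lung"] * 3 + ["colon"] * 3 + ["leukemia"] * 3 + ["lymphoma"] * 3

model = train(TREE, X, y)
print(classify(TREE, model, [0.2, 5.0, 1.0, 0.5]))
```

In this toy setup each split ends up with its own top marker gene, so a sample is only ever compared against the classes under the node it has already been routed to. A wrong decision near the root cannot be recovered further down, which is one way to see why the quality of the tree matters so much.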