Yuhang Wang, Fillia S. Makedon, James C. Ford and Justin Pearlman
HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data
Bioinformatics 2005 21(8):1530-1537
[PDF][Web Site]
・Proposes HykGene (a hybrid approach for selecting marker genes), a gene selection method based on microarray data.
・Datasets
1.ALL/AML leukemia [Golub]
2.MLL leukemia [Armstrong]
3.Colon tumor [Alon]
・Data processing steps
1.Gene ranking
2.Hierarchical clustering
3.Reduce gene redundancy by collapsing clusters
4.Classification
・Ranking methods (criteria)
1.Relief-F
2.Information Gain
3.χ2-statistic
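
As a rough illustration of what a filter-style ranking criterion does, the sketch below ranks genes by information gain. It is not the paper's implementation: the mean-based threshold used to binarize each gene, and the names `entropy`, `information_gain` and `rank_genes`, are assumptions made only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def rank_genes(expression, labels):
    """Return gene names sorted by information gain, best first.

    `expression` maps gene name -> list of values (one per sample).
    Assumption: each gene is split at its own mean expression value.
    """
    scores = {}
    for gene, values in expression.items():
        threshold = sum(values) / len(values)
        scores[gene] = information_gain(values, labels, threshold)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up for illustration): g1 separates the classes, g2 does not.
toy = {"g1": [1.0, 1.2, 5.0, 5.5], "g2": [3.0, 1.0, 3.1, 0.9]}
print(rank_genes(toy, ["ALL", "ALL", "AML", "AML"]))  # ['g1', 'g2']
```

Relief-F and the χ2-statistic would slot into the same place as `information_gain`; only the per-gene score changes.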
・Classification methods
1.k-nearest neighbor (k-NN)
2.Support vector machine (SVM)
3.C4.5 decision tree
4.Naive Bayes (NB)
・Evaluation of classification results: LOOCV (leave-one-out cross-validation)
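
For readers unfamiliar with LOOCV, here is a minimal sketch that pairs it with a k-NN classifier. The Euclidean distance, the default `k=3`, and the function names are assumptions for illustration; the paper's exact classifier settings are not given in these notes.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loocv_errors(samples, labels, k=3):
    """Number of misclassified samples under leave-one-out cross-validation.

    Each sample is held out once, the classifier is built on the remaining
    samples, and the held-out sample is then predicted.
    """
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) != labels[i]:
            errors += 1
    return errors


# Toy samples (made up): two well-separated groups of three samples each.
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
toy_y = ["A", "A", "A", "B", "B", "B"]
print(loocv_errors(toy_x, toy_y, k=3))
```

With sample counts in the tens to low hundreds, holding out one sample at a time keeps almost all of the data available for training in every fold, which is presumably why LOOCV suits this setting.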
・Methods compared against
1.(Raw data as-is, without preprocessing)
2.SOM
・Method: "In this approach, we first applied feature filtering algorithms to select a set of top-ranked genes, and then applied hierarchical clustering on these genes to generate a dendrogram. Finally, the dendrogram was analyzed by a sweep-line algorithm and marker genes are selected by collapsing dense clusters."
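
The clustering and collapsing steps quoted above can be sketched roughly as follows. This is a simplified stand-in, not the authors' algorithm: it uses average-linkage clustering on a 1 - Pearson correlation distance, stops merging at a fixed number of clusters `k` instead of running the sweep-line analysis of the dendrogram, and keeps the medoid of each cluster as its representative. All of those choices, and all function names, are assumptions.

```python
import math


def pearson_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0  # constant profile: treat as uncorrelated
    return 1.0 - cov / (sa * sb)


def cluster_genes(expression, k):
    """Average-linkage agglomerative clustering of genes, stopped at k clusters."""
    genes = list(expression)
    dist = {(g, h): pearson_distance(expression[g], expression[h])
            for g in genes for h in genes}
    clusters = [[g] for g in genes]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[g, h] for g in c1 for h in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def collapse(clusters, expression):
    """Keep one gene per cluster: the medoid (smallest total distance to the rest)."""
    reps = []
    for members in clusters:
        reps.append(min(members, key=lambda g: sum(
            pearson_distance(expression[g], expression[h]) for h in members)))
    return reps


# Toy profiles (made up): g1/g2 rise together, g3/g4 fall together.
toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5],
       "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2.5]}
groups = cluster_genes(toy, 2)
print(groups, collapse(groups, toy))
```

Collapsing each cluster to one gene is what removes redundancy: highly correlated top-ranked genes end up in the same cluster and contribute a single marker.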
・Data characteristics: "Classification using gene expression data poses a major challenge because of the following characteristics:
・M >> N [M: number of genes, N: number of samples]. For typical datasets, M is in the range of 2000–30 000, while N is in the range of 40–200.
・Most features (genes) are not related to the given phenotype classification problem."
・History of ranking methods: "These gene ranking methods have been based on t-statistic (Golub et al., 1999), information gain (Su et al., 2003; Liu et al., 2002; Li et al., 2004), χ2-statistic (Liu et al., 2002), the threshold number of misclassification (TNoM) score (Ben-Dor et al., 2000) and concatenation of several feature filtering algorithms (Xing et al., 2001)."
・Distinguishing features: "Our approach is different from the previous pre-filtering approaches in that:
・We apply gene ranking methods first.
・We determine the best number of clusters systematically."
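
One plausible reading of "systematically" is to score every candidate cluster count with LOOCV and keep the best one. The sketch below shows only that selection loop. The callback `loocv_error_for` is hypothetical (it stands for "cluster, collapse, classify, and return the LOOCV error count for k clusters"), and breaking ties in favor of fewer clusters is an assumption, not something stated in these notes.

```python
def choose_cluster_count(candidate_counts, loocv_error_for):
    """Return (best_k, best_error) over the candidate cluster counts.

    `loocv_error_for(k)` is a caller-supplied function giving the LOOCV
    error obtained when k marker genes (one per cluster) are used.
    Candidates are tried in ascending order and only a strictly lower
    error replaces the current best, so ties go to the smaller k.
    """
    best_k, best_err = None, None
    for k in sorted(candidate_counts):
        err = loocv_error_for(k)
        if best_err is None or err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


# Made-up error counts, purely to exercise the function.
toy_errors = {2: 5, 3: 2, 4: 2, 5: 3}
print(choose_cluster_count(toy_errors, toy_errors.get))  # (3, 2)
```

Preferring the smaller count on ties keeps the marker set compact, which fits the stated goal of reducing gene redundancy.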
・Future work: "We are currently investigating alternative approaches that use Gene Ontology to guide this selection process."
・The line of argument is clear and easy to follow.