Acta Scientiarum Naturalium Universitatis Sunyatseni ›› 2010, Vol. 49 ›› Issue (2): 37-42.

• Research Article •

Automatic Feature Selection and Classification of Microarray Data Based on ITAFSVM

DAI Hongliang (戴宏亮)

  1. (Department of Mathematics and Computational Science, Guangdong University of Business Studies, Guangzhou 510320, China)
  • Received: 2009-03-16 Revised: 1900-01-01 Online: 2010-03-25 Published: 2010-03-25

Abstract: Support vector machines (SVMs) have been applied successfully to the analysis of gene expression data. However, two open issues remain: ① an SVM cannot select relevant features automatically; ② there is no simple and effective way to choose appropriate SVM parameters. This study proposes the total margin-based adaptive fuzzy support vector machine (TAFSVM), a new type of SVM with favorable properties. It also proposes a new genetic algorithm, the intelligent genetic algorithm (IGA), to design a TAFSVM-based classifier, named ITAFSVM, that optimizes the TAFSVM parameter set and the feature subset simultaneously, with 10-fold cross-validation used to estimate generalization ability. ITAFSVM is then applied to four gene expression datasets and compared with the evolutionary support vector machine (ESVM) and with a combination of rough-set-based feature selection and an RBF neural network. The experimental results indicate that ITAFSVM not only performs feature selection automatically but also achieves higher classification accuracy, better stability, and shorter running time.

Key words: total margin-based adaptive fuzzy support vector machine, intelligent genetic algorithm, gene expression, classification, microarray
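
The abstract describes a genetic algorithm that tunes classifier parameters and selects features at the same time, scored by 10-fold cross-validation. The sketch below illustrates only that general idea and is not the author's method: a plain genetic algorithm stands in for IGA, a k-nearest-neighbour classifier stands in for TAFSVM (with k playing the role of the tuned parameter set), and the data are synthetic. All names (`make_toy_data`, `cv_accuracy`, `decode`, `evolve`, `K_CHOICES`) and the per-feature penalty of 0.01 are assumptions made for illustration.

```python
import random

K_CHOICES = (1, 3, 5, 7)  # hypothetical parameter grid, encoded by 2 bits


def make_toy_data(n=60, n_features=8, seed=1):
    """Synthetic 'expression' data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, feats, k):
    """Stand-in classifier (k-NN on the selected features), NOT TAFSVM."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in feats), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, feats, k, folds=10):
    """Accuracy pooled over `folds` cross-validation folds."""
    if not feats:
        return 0.0
    n, correct = len(X), 0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in range(f, n, folds):  # held-out samples of fold f
            correct += knn_predict(tX, ty, X[i], feats, k) == y[i]
    return correct / n


def decode(chrom, n_features):
    """Chromosome = one bit per feature, then 2 bits indexing K_CHOICES."""
    feats = [j for j in range(n_features) if chrom[j]]
    k = K_CHOICES[2 * chrom[n_features] + chrom[n_features + 1]]
    return feats, k


def fitness(chrom, X, y):
    feats, k = decode(chrom, len(X[0]))
    # Arbitrary small penalty per feature, to favour compact subsets.
    return cv_accuracy(X, y, feats, k) - 0.01 * len(feats)


def evolve(X, y, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Plain GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    length = len(X[0]) + 2
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y), reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=lambda c: fitness(c, X, y))
    return decode(best, len(X[0])), fitness(best, X, y)


if __name__ == "__main__":
    X, y = make_toy_data()
    (feats, k), score = evolve(X, y)
    print("selected features:", feats, "k:", k, "fitness:", round(score, 3))
```

Putting the feature mask and the parameter bits in one chromosome is what lets a single search handle both tasks at once; replacing `knn_predict` with an SVM variant would leave the rest of the loop unchanged.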

CLC Number: