2014, Vol. 53, Issue (5): 59-65.

An Imbalanced Feature Selection Algorithm Based on Random Forest

YIN Hua,HU Yuping   

  1. School of Information, Guangdong University of Finance & Economics, Guangzhou 510320, China
  • Received: 2014-05-05  Online: 2014-09-25  Published: 2014-09-25

Abstract: High-dimensional, imbalanced data poses a challenge for data mining. Traditional feature selection algorithms assume a balanced class distribution and therefore give unsatisfactory results on imbalanced data. To address this problem, a new feature selection algorithm for imbalanced data, IBRFVS, is proposed; it builds on the variable selection mechanism embedded in random forests. IBRFVS constructs diverse decision trees on balanced samples of the data and obtains the feature importance measures of each tree by cross-validation. The final feature importance ranking is the weighted average of the per-tree feature importance measures, where each tree's weight is determined by the degree of agreement between that tree's predictions and the ensemble's predictions. Experiments on UCI datasets, covering random forest hyperparameter selection and a comparison of preprocessing methods, show that among four empirical settings of the hyperparameter K, setting K to the square root of the number of features makes IBRFVS more stable than, and superior to, traditional feature selection algorithms.
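
The weighting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so several choices here are assumptions. "Balanced sampling" is taken to mean undersampling every class to the minority-class size, a tree's weight is taken to be the fraction of samples on which it agrees with the majority vote, and K is rounded down to an integer. All function names (`default_k`, `balanced_sample`, `tree_weights`, `aggregate_importance`) are hypothetical, and the per-tree importance scores, which the paper obtains by cross-validation, are simply passed in as inputs.

```python
import math
import random
from collections import Counter


def default_k(n_features):
    """K = square root of the feature count (floor rounding is an assumption)."""
    return max(1, math.isqrt(n_features))


def balanced_sample(labels, rng):
    """Return indices of a class-balanced subsample.

    Assumption: every class is undersampled to the minority-class size.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(ix) for ix in by_class.values())
    sample = []
    for ix in by_class.values():
        sample.extend(rng.sample(ix, n_min))
    return sample


def majority_vote(tree_preds):
    """Ensemble prediction; tree_preds[t][i] is tree t's label for sample i."""
    n_samples = len(tree_preds[0])
    return [
        Counter(preds[i] for preds in tree_preds).most_common(1)[0][0]
        for i in range(n_samples)
    ]


def tree_weights(tree_preds):
    """Weight of each tree = fraction of samples where it matches the ensemble.

    Assumption: this is one plausible reading of "degree of agreement".
    """
    ensemble = majority_vote(tree_preds)
    return [
        sum(a == b for a, b in zip(preds, ensemble)) / len(ensemble)
        for preds in tree_preds
    ]


def aggregate_importance(tree_importances, weights):
    """Weighted average of per-tree importance vectors, one score per feature."""
    total = sum(weights)
    n_features = len(tree_importances[0])
    return [
        sum(w * imp[j] for w, imp in zip(weights, tree_importances)) / total
        for j in range(n_features)
    ]
```

In this sketch, trees that often disagree with the ensemble contribute less to the final ranking, so a single poorly fitted tree cannot dominate the importance list. Features would then be ranked by sorting the aggregated scores in descending order.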
