2014, Vol. 53, Issue (5): 59-65.
YIN Hua,HU Yuping
High-dimensional, imbalanced data is a challenge for data mining. Traditional feature selection algorithms assume a balanced class distribution, which leads to unsatisfactory results on imbalanced data. To solve this problem, a new feature selection algorithm for imbalanced data, IBRFVS, is constructed; it uses the variable selection mechanism embedded in random forest. IBRFVS builds various decision trees on balanced samples of the data and obtains the feature importance measurements of each individual decision tree by cross-validation. The final feature importance list is the weighted average of the per-tree feature importance measurements, where each decision tree's weight is determined by the degree of consistency between that tree's prediction and the ensemble prediction. Random forest hyperparameter selection experiments and preprocessing comparison experiments on UCI datasets show that, among four empirical parameter settings, IBRFVS is more stable than and superior to traditional feature selection algorithms when the hyperparameter K is the square root of the number of features.
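The aggregation described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' implementation: balanced sampling is taken to mean undersampling every class to the size of the rarest class, a tree's weight is taken to be the fraction of samples on which it agrees with the majority vote, and the per-tree importance scores (obtained by cross-validation in the paper) are simply passed in as inputs. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def balanced_sample(labels, rng):
    """Return row indices with every class undersampled to the rarest class size.

    Assumption: the abstract only says "balanced sampling"; undersampling is
    one possible reading.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    picked = []
    for rows in by_class.values():
        picked.extend(rng.sample(rows, n_min))
    return picked


def tree_weights(tree_predictions):
    """Weight each tree by its agreement rate with the ensemble majority vote.

    Assumption: "consistent degree" is read as the fraction of samples on
    which the tree's prediction equals the majority vote of all trees.
    """
    n_samples = len(tree_predictions[0])
    ensemble = [
        Counter(preds[i] for preds in tree_predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]
    return [
        sum(p == e for p, e in zip(preds, ensemble)) / n_samples
        for preds in tree_predictions
    ]


def aggregate_importance(tree_importances, weights):
    """Weighted average of per-tree feature importance vectors."""
    total = sum(weights)
    n_features = len(tree_importances[0])
    return [
        sum(w * imp[j] for w, imp in zip(weights, tree_importances)) / total
        for j in range(n_features)
    ]


def features_per_split(n_features):
    """K = square root of the number of features, rounded, at least 1."""
    return max(1, round(math.sqrt(n_features)))
```

In this sketch, trees that disagree with the ensemble more often contribute less to the final ranking, which is one way to read the weighting the abstract describes. Sorting the features by the aggregated score in descending order would then give the importance list.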
YIN Hua,HU Yuping. An Imbalanced Feature Selection Algorithm Based on Random Forest[J]. , 2014, 53(5): 59-65.