Journal of Information Science and Engineering, Vol. 31 No. 3, pp. 965-992 (May 2015)

Feature Selection and Extraction for Malware Classification

1Department of Electrical Engineering
National Taiwan University of Science and Technology
Taipei, 106 Taiwan
E-mail: {d9507932; njwang}
2Chair for IT Security, Institute of Informatics
Technische Universität München
Garching, 85748 Germany
E-mail: {xiaoh; claudia.eckert}

The rapidly growing volume of malware continues to threaten networks and operating systems. Signature-based methods are widely used for detecting malware, but they cannot identify malware variants on the fly. Behavior-based methods, on the other hand, can effectively characterize malware behavior, but training and prediction for each specific malware family are time-consuming. We propose a generic and efficient algorithm for classifying malware. Our method combines feature selection and feature extraction, which significantly reduces the dimensionality of the features used for training and classification. Based on malware behaviors collected from a sandbox environment, our method proceeds in five steps: (a) extracting n-gram feature space data from behavior logs; (b) building a support vector machine (SVM) classifier for malware classification; (c) selecting a subset of features; (d) transforming high-dimensional feature vectors into low-dimensional feature vectors; and (e) selecting models. Experiments on a real-world data set of 4,288 samples from 9 families demonstrate the effectiveness and efficiency of our approach.

Keywords: dynamic malware analysis, data classification, dimensionality reduction, term frequency inverse document frequency, principal component analysis, kernel principal component analysis, support vector machine
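
As a rough illustration of steps (a) and (c) only, the sketch below extracts n-grams from behavior logs, weights them by term frequency inverse document frequency, and keeps the highest-scoring n-grams. It is a minimal sketch under stated assumptions, not the authors' implementation: the log format (a list of operation names per sample), the sample operation names, the summed-weight ranking rule, and the function names `ngrams`, `tfidf_vectors`, and `select_top_features` are all hypothetical. The SVM, PCA/KPCA, and model selection steps are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(logs, n=2):
    """Map each behavior log to a dict {n-gram: tf-idf weight}."""
    counts = [Counter(ngrams(log, n)) for log in logs]
    # Document frequency: in how many logs each n-gram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    num_docs = len(logs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, k in c.items():
            tf = k / total
            idf = math.log(num_docs / doc_freq[gram])
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


def select_top_features(vectors, k):
    """Keep the k n-grams with the highest summed tf-idf weight.

    Summing weights over all samples is an assumed ranking rule,
    chosen here only for illustration.
    """
    score = Counter()
    for vec in vectors:
        for gram, weight in vec.items():
            score[gram] += weight
    # Sort by descending score, breaking ties by n-gram for determinism.
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]


# Hypothetical behavior logs: one list of operation names per sample.
logs = [
    ["open_file", "write_file", "close_file"],
    ["open_file", "write_file", "create_process"],
    ["load_dll", "create_process", "connect"],
]
vectors = tfidf_vectors(logs, n=2)
selected = select_top_features(vectors, k=4)
```

In this toy input the 2-gram `("open_file", "write_file")` occurs in two of the three logs, so its inverse document frequency is lower than that of the n-grams seen in only one log, and it is the one dropped when four features are kept. The selected n-grams would then define the reduced feature space passed on to the later steps.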

Received November 12, 2013; revised June 11 & September 1, 2014; accepted October 11, 2014.
Communicated by Shou-De Lin.