Although many anti-virus software packages have been in use for years, previously unseen malware remains a serious threat to computer and information systems. In this study, we built a malware detection model based on the portable executable (PE) header entries of executables. The model consists of four stages: attribute extraction, attribute binarization, attribute elimination, and combined feature selection and classifier training. First, we collected header entries from all executables in our dataset and treated each entry as a potential attribute. Second, we used information gain and gain ratio to binarize numerical and nominal attributes. Third, we eliminated useless and redundant attributes. Finally, using a support vector machine, a classification algorithm with strong generalization ability, we performed feature selection simultaneously with classifier training to reduce the number of attributes cost-effectively while retaining classifier performance. We evaluated the model on 1,908 benign programs and 7,863 malicious files (viruses, email worms, trojans, and backdoors) and estimated its generalization ability by cross-validation. The experimental results showed that the model performed promisingly in detecting viruses and email worms.
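As an illustration of the binarization stage, the sketch below picks the cut point for one numerical attribute that maximizes information gain, then maps the attribute to 0/1. It is a minimal sketch, not the study's implementation: the abstract does not say how thresholds were chosen, so the midpoint search is an assumption, and the function names and sample values (loosely modeled on the `NumberOfSections` header field) are hypothetical. Gain ratio, which the study also uses, is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Information gain of splitting the samples on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Return (threshold, gain) for the best midpoint between distinct values.

    Assumption: candidate cut points are midpoints of adjacent distinct values.
    Returns (None, 0.0) when the attribute takes a single value.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda t: information_gain(values, labels, t))
    return best, information_gain(values, labels, best)


def binarize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, else 0."""
    return [1 if x > threshold else 0 for x in values]


# Hypothetical sample: made-up section counts for 4 benign and 4 malicious files.
sample_values = [3, 4, 4, 5, 8, 9, 10, 12]
sample_labels = ["benign"] * 4 + ["malware"] * 4

threshold, gain = best_threshold(sample_values, sample_labels)
binary_attribute = binarize(sample_values, threshold)
```

An attribute whose best gain is near zero carries little class information, which is one plausible basis for discarding it as useless in the elimination stage.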
B.A. Rozenberg, Ehud Gudes, Yuval Elovici, Yuval Fledel