Anti-virus vendors receive hundreds of thousands of malware samples for analysis each day. Some are new malware, while others are variations or evolutions of existing malware. Because analyzing each sample by hand is impossible, automated techniques to analyze and categorize incoming samples are needed. In this work, we explore various machine learning features, extracted from malware samples through static analysis, for classifying malware binaries into known malware families. We present a new feature based on control statement shingling that achieves accuracy comparable to ordinary opcode n-gram based features while requiring fewer dimensions. This, in turn, results in a shorter training time.
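To make the contrast between the two feature types concrete, the following is a minimal sketch and not the method used in this work. It assumes that "control statement shingling" means keeping only control-flow opcodes from a disassembled sequence and collecting short contiguous runs (shingles) of them; the exact definition is not given here. The opcode list `CONTROL_OPCODES`, the function names, and the sample sequence are all hypothetical.

```python
from collections import Counter

# Hypothetical set of x86-style control-flow mnemonics (an assumption for illustration).
CONTROL_OPCODES = {"jmp", "je", "jne", "jz", "jnz", "call", "ret", "loop"}


def opcode_ngrams(opcodes, n=2):
    """Count every contiguous run of n opcodes in the sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def control_shingles(opcodes, k=2):
    """Keep only control-flow opcodes, then collect the set of k-length shingles.

    This filtering step is an assumed reading of 'control statement shingling'.
    """
    control = [op for op in opcodes if op in CONTROL_OPCODES]
    return {tuple(control[i:i + k]) for i in range(len(control) - k + 1)}


# Hypothetical opcode sequence from a disassembled sample.
sample = ["push", "mov", "call", "cmp", "jne", "mov", "call", "ret"]

ngrams = opcode_ngrams(sample, n=2)
shingles = control_shingles(sample, k=2)

print(len(ngrams), "distinct opcode bigrams")
print(len(shingles), "distinct control shingles")
```

Under these assumptions, the shingle vocabulary is drawn from a much smaller alphabet than the full opcode set, which is one plausible reason such a feature would need fewer dimensions than opcode n-grams.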
Dikshyant Dhungana, A. Sapkota, S. Pokharel, Sudarshan Devkota, Bishnu Hari Paudel
S Ashwini, Manisha Pai, J. Sangeetha
Rafiqul Islam, Ronghua Tian, Lynn Batten, Steve Versteeg