Journal of Shanghai Jiaotong University (Science) ›› 2014, Vol. 19 ›› Issue (4): 425-430. doi: 10.1007/s12204-014-1519-1

Research on Classification of Malware Source Code

CHEN Chia-mei1 (陈嘉玫), LAI Gu-hsin2* (赖谷鑫)   

  1. Department of Information Management, National Sun Yat-Sen University, Kaohsiung 804, Taiwan, China
  2. Department of Information Management, Chinese Culture University, Taipei 111, China
  • Online:2014-08-30 Published:2014-10-13
  • Contact: LAI Gu-hsin(赖谷鑫) E-mail: guhsinlai@gmail.com

Abstract: In the face of the threat of Internet attacks, malware classification is one of the promising solutions in the fields of intrusion detection and digital forensics. In previous work, researchers performed dynamic analysis, or static analysis after reverse engineering. However, malware developers use anti-virtual machine (VM) and obfuscation techniques to evade malware classifiers. By deploying honeypots, malware source code can be collected and analyzed. Source code analysis provides a better classification for understanding the purpose of attackers and for forensics. In this paper, a novel classification approach is proposed, based on content similarity and directory structure similarity. Such a classification avoids re-analyzing known malware and allocates resources to new malware. Malware classification also lets network administrators know the purpose of attackers. The experimental results demonstrate that the proposed system can classify malware efficiently with a small misclassification ratio and that its performance is better than that of VirusTotal.

Key words: malware| source code classification| static analysis| honeypot
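
The abstract names two signals, content similarity and directory structure similarity, but this page does not give their formulas. As a rough illustration only, the Python sketch below assumes Jaccard similarity over token shingles (content) and over sets of relative paths (directory structure), combined with equal weights and compared against a threshold of 0.6. All function names, the weighting, and the threshold are hypothetical and are not taken from the paper.

```python
from pathlib import PurePosixPath


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def shingles(text, k=3):
    """Set of k-token shingles of a source file (whitespace tokenization)."""
    tokens = text.split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def path_set(files):
    """All relative file paths plus their parent directories."""
    paths = set()
    for rel in files:
        p = PurePosixPath(rel)
        paths.add(str(p))
        paths.update(str(d) for d in p.parents if str(d) != ".")
    return paths


def combined_similarity(files_a, files_b, w_content=0.5):
    """Weighted mix of content and directory similarity (weight is assumed).

    Each sample is a dict mapping a relative path to the file's source text.
    """
    content_a = set().union(*(shingles(t) for t in files_a.values()))
    content_b = set().union(*(shingles(t) for t in files_b.values()))
    content = jaccard(content_a, content_b)
    structure = jaccard(path_set(files_a), path_set(files_b))
    return w_content * content + (1.0 - w_content) * structure


def classify(sample, families, threshold=0.6):
    """Return the best-matching family name, or None for likely new malware.

    `families` maps a family name to a list of known samples.
    """
    best_name, best_score = None, 0.0
    for name, members in families.items():
        score = max((combined_similarity(sample, m) for m in members), default=0.0)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    known = {"bot/main.c": "int main ( ) { connect_irc ( ) ; return 0 ; }"}
    unseen = {"scan/run.pl": "print qq hello world now"}
    print(classify(known, {"ircbot": [known]}))
    print(classify(unseen, {"ircbot": [known]}))
```

Returning `None` for a sample below the threshold mirrors the stated goal of skipping re-analysis of known malware and reserving effort for new samples; a real system would need tuned weights and a more robust tokenizer.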
