Journal of Shanghai Jiao Tong University (English Edition) ›› 2014, Vol. 19 ›› Issue (4): 425-430. doi: 10.1007/s12204-014-1519-1


Research on Classification of Malware Source Code

CHEN Chia-mei1 (陈嘉玫), LAI Gu-hsin2* (赖谷鑫)   

  1. Department of Information Management, National Sun Yat-Sen University, Kaohsiung 804, Taiwan, China; 2. Department of Information Management, Chinese Culture University, Taipei 111, China
  • Online: 2014-08-30 Published: 2014-10-13
  • Corresponding author: LAI Gu-hsin (赖谷鑫), E-mail: guhsinlai@gmail.com


Abstract: In the face of the threat of Internet attacks, malware classification is a promising approach in intrusion detection and digital forensics. In previous work, researchers performed dynamic analysis, or static analysis after reverse engineering. However, malware developers use anti-virtual machine (VM) and obfuscation techniques to evade malware classifiers. By deploying honeypots, malware source code can be collected and analyzed. Source code analysis provides a better classification for understanding the purpose of attackers and for forensics. In this paper, a novel classification approach is proposed, based on content similarity and directory structure similarity. Such a classification avoids re-analyzing known malware and allows resources to be allocated to new malware. Malware classification also lets network administrators know the purpose of attackers. The experimental results demonstrate that the proposed system can classify malware efficiently with a small misclassification ratio and that its performance is better than that of VirusTotal.

Key words: malware, source code classification, static analysis, honeypot
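The abstract names two signals, content similarity and directory structure similarity, without giving the measures used. The following is a minimal illustrative sketch, not the authors' method: it assumes Jaccard similarity over identifier-like tokens and over path sets, combined with a hypothetical weight `w_content`. All function names here are invented for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_tokens(files: Dict[str, str]) -> Set[str]:
    """Collect identifier-like tokens (2+ characters) from every file's text."""
    tokens: Set[str] = set()
    for text in files.values():
        tokens.update(re.findall(r"[A-Za-z_]\w+", text))
    return tokens


def directory_entries(files: Dict[str, str]) -> Set[str]:
    """Collect each file path plus all of its parent directories."""
    entries: Set[str] = set()
    for path in files:
        p = PurePosixPath(path)
        entries.add(str(p))
        entries.update(str(parent) for parent in p.parents if str(parent) != ".")
    return entries


def combined_similarity(a: Dict[str, str], b: Dict[str, str],
                        w_content: float = 0.5) -> float:
    """Weighted mix of content and directory similarity (weight is an assumption)."""
    content = jaccard(content_tokens(a), content_tokens(b))
    structure = jaccard(directory_entries(a), directory_entries(b))
    return w_content * content + (1.0 - w_content) * structure
```

Each sample is represented as a mapping from relative file path to file text. A new sample whose combined score against a known family exceeds some chosen threshold could be labelled as that family and skipped for re-analysis; the threshold, like the weight, would need tuning on real data.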

