Research on Classification of Malware Source Code

CHEN Chia-mei1 (陈嘉玫); LAI Gu-hsin2* (赖谷鑫)

doi:10.1007/s12204-014-1519-1

Journal of Shanghai Jiaotong University (Science), 2014, Vol. 19, Issue 4: 425-430

(1. Department of Information Management, National Sun Yat-Sen University, Kaohsiung 804, Taiwan, China; 2. Department of Information Management, Chinese Culture University, Taipei 111, China)

Online published: 2014-10-13

Abstract

In the face of Internet attacks, malware classification is one of the promising solutions in the fields of intrusion detection and digital forensics. In previous work, researchers performed dynamic analysis, or static analysis after reverse engineering. However, malware developers use anti-virtual machine (VM) and obfuscation techniques to evade malware classifiers. By deploying honeypots, malware source code can be collected and analyzed. Source code analysis provides a better classification for understanding the purpose of attackers and for forensics. In this paper, a novel classification approach is proposed, based on content similarity and directory structure similarity. Such classification avoids re-analyzing known malware, so that resources can be allocated to new malware. Malware classification also lets network administrators know the purpose of attackers. The experimental results demonstrate that the proposed system classifies malware efficiently with a small misclassification ratio and performs better than VirusTotal.

Key words: malware; source code classification; static analysis; honeypot
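
The abstract names two signals, content similarity and directory structure similarity, without giving their definitions. The following is a minimal illustrative sketch of how two such signals could be combined, not the authors' method. Everything in it is an assumption made for illustration: a sample is modelled as a mapping from relative file path to file text, both similarities are Jaccard indices (over token shingles and over path sets), and the weight `w_content`, the `threshold`, and all function names are hypothetical.

```python
import re


def token_shingles(text, k=3):
    """Return the set of k-token shingles of a source text (assumed feature)."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sample_shingles(sample, k=3):
    """Union of the shingles of every file in a sample."""
    shingles = set()
    for text in sample.values():
        shingles |= token_shingles(text, k)
    return shingles


def content_similarity(a, b, k=3):
    """Similarity of file contents, ignoring where the files live."""
    return jaccard(sample_shingles(a, k), sample_shingles(b, k))


def directory_similarity(a, b):
    """Similarity of directory layouts, ignoring file contents."""
    return jaccard(set(a), set(b))


def combined_similarity(a, b, w_content=0.5):
    """Weighted mix of the two signals; the weight is a hypothetical choice."""
    return (w_content * content_similarity(a, b)
            + (1 - w_content) * directory_similarity(a, b))


def classify(sample, known, threshold=0.6):
    """Return (label, score) of the closest known family, or 'unknown'."""
    best_label, best_score = "unknown", 0.0
    for label, reference in known.items():
        score = combined_similarity(sample, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score
```

Under these assumptions, a sample that scores above the threshold against a known family would be labelled as that family and skipped for re-analysis, while an `"unknown"` result would flag it as new malware worth analyst time.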

Cite this article

CHEN Chia-mei (陈嘉玫), LAI Gu-hsin (赖谷鑫). Research on Classification of Malware Source Code[J]. Journal of Shanghai Jiaotong University (Science), 2014, 19(4): 425-430. DOI: 10.1007/s12204-014-1519-1
