Journal of Shanghai Jiaotong University (Natural Science Edition) ›› 2012, Vol. 46 ›› Issue (06): 989-993.

• Automation Technology, Computer Technology •

Research of Ch-En Cross-Lingual Plagiarism Detection Based on Translation Features

YUAN Song-Xiang (袁松翔), LIU Gong-Shen (刘功申)

  1. (Public Opinion Research Center, School of Information Security Engineering, Shanghai Jiaotong University, Shanghai 200240, China)
  • Received: 2011-09-04 Online: 2012-06-28 Published: 2012-06-28
  • Supported by:

    Ministry of Education Special Research Project on Rapid Sharing of Scientific Papers (2010121); National High Technology Research and Development Program of China (863 Program) (2010AA012505)

Abstract: Plagiarism detection for scientific papers within a single language is well studied, and a number of practical systems have been developed. Cross-lingual plagiarism detection, by contrast, has received comparatively little attention. Focusing on scientific papers, this paper investigates methods for Chinese-English cross-lingual plagiarism detection. By mining the regularities inherent in Chinese translated text, it identifies a set of translation features that characterize translation style, and it uses these features together with a decision tree algorithm to identify scientific papers suspected of plagiarism. In an open test, the experimental system achieved a precision of 88.68% and a recall of 79.17%.

Key words: paper plagiarism, translation feature, cross-language, decision tree
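
The abstract reports the system's results as a precision rate and a recall rate. As a reminder of how those two measures are usually defined for a binary detector, here is a minimal sketch. The function name `precision_recall` and the toy labels are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation procedure may differ.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for binary labels.

    True marks a paper that was flagged as plagiarized (predicted)
    or that is known to be plagiarized (actual).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but original
    fn = sum(1 for p, a in pairs if not p and a)    # plagiarized but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels invented for illustration only (not data from the paper).
predicted = [True, True, True, False, False, True, False]
actual = [True, True, False, True, False, True, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of flagged papers that are truly plagiarized, and recall is the share of plagiarized papers that were flagged, so a detector can score well on one while scoring poorly on the other.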

CLC Number: