Journal of Shanghai Jiao Tong University (Natural Science Edition), 2018, Vol. 52, Issue (8): 954-960. doi: 10.16183/j.cnki.jsjtu.2018.08.011

Case Acquisition in Biological Domain Based on Text Mining

SHEN Jian (沈健), HU Jie (胡洁), MA Jin (马进), QI Jin (戚进), ZHU Guoniu (朱国牛), PENG Yinghong (彭颖红)

  1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Corresponding author: HU Jie, male, professor, doctoral supervisor. Tel.: 021-34206552; E-mail: hujie@sjtu.edu.cn
  • About the author: SHEN Jian (1995-), male, born in Jiujiang, Jiangxi Province, master's student; his research focuses on biologically inspired design.
  • Funding:
    National Natural Science Foundation of China (51475288, 51305260, 51605302); Innovation Method Special Project of the Ministry of Science and Technology (2015IM010100)

Abstract: This paper addresses the acquisition of design-relevant cases from the biological domain, where such cases are described in natural language. Building on vector representation models of natural language, it proposes a text-mining-based method for acquiring biological cases. By constructing a vector space over the corpus texts and mining knowledge from it, the work studies feature selection, similarity measurement, and case retrieval for biological-domain texts, providing technical support for biological case acquisition driven by design requirements. A case study shows that, on the one hand, the text mining method based on the vector space model substantially outperforms the baseline method in both precision and recall; on the other hand, the text retrieval mechanism based on the vector space is adaptable and extensible, and can meet the needs of semantic retrieval in different settings.

Key words: text mining, vector space model, feature selection, biologically inspired design, knowledge acquisition
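
The abstract does not specify the weighting scheme, feature selection criterion, or similarity measure used in the paper. As a rough illustration of the general idea of vector-space retrieval only, the sketch below assumes TF-IDF weighting and cosine similarity, which are common choices but are not confirmed by the source. The corpus `DOCS`, the query, and all function names are hypothetical, and no separate feature selection step is shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus of biological case descriptions (not from the paper).
DOCS = [
    "Gecko feet adhere to smooth walls using microscopic hairs",
    "Lotus leaves repel water and stay clean through surface microstructure",
    "Shark skin reduces drag in water with ribbed scales",
]

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]

def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant; other formulas are equally valid).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf

def vectorize(text, idf):
    """Map a query into the corpus vector space, ignoring unseen terms."""
    return {t: tf * idf[t] for t, tf in Counter(tokenize(text)).items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve(query, docs, top_k=3):
    """Rank documents by similarity to the query; return (index, score) pairs."""
    vectors, idf = build_tfidf(docs)
    q = vectorize(query, idf)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, s) for s, i in scored[:top_k]]

if __name__ == "__main__":
    for index, score in retrieve("reduce drag in water", DOCS):
        print(f"{score:.3f}  {DOCS[index]}")
```

In this toy setup a design requirement phrased as a query is ranked against the stored case descriptions. Because matching is purely lexical, "reduce" does not match "reduces"; handling such variation (stemming, synonyms, or learned word vectors) is where a more elaborate method would differ from this sketch.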

CLC Number: