Journal (Chinese edition)

Case Acquisition in Biological Domain Based on Text Mining

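  • School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
SHEN Jian (born 1995), male, from Jiujiang, Jiangxi Province; master's student whose research focuses on biologically inspired design.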

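Funding

National Natural Science Foundation of China (51475288, 51305260, 51605302); Special Project on Innovation Methods of the Ministry of Science and Technology of China (2015IM010100)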

Case Acquisition in Biological Domain Based on Text Mining

  • School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

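Summary

This work addresses biological domain cases described in natural language. Building on a vector representation model of natural language, it studies how to acquire biological cases that are relevant to design, and proposes a text-mining-based method for biological case acquisition. By constructing a vector space over the corpus text and mining knowledge from it, the work studies feature selection, similarity measurement, and case retrieval for biological domain text, providing technical support for biological case acquisition driven by design requirements. A case study shows two things. First, the biological text mining method based on the vector space model has a clear advantage over the baseline method in both precision and recall. Second, the vector-space-based text retrieval mechanism adapts and extends well, and can meet the needs of semantic retrieval in different environments.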
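The precision and recall comparison mentioned above can be made concrete with a small calculation. The following is a minimal sketch rather than the authors' evaluation code: the function name `precision_recall` and the document identifiers are hypothetical, and the paper's actual test data is not reproduced here.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: ids returned by the retrieval method (hypothetical ids).
    relevant:  ids judged relevant to the query (hypothetical ids).
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant items that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative ids only; these are not results from the paper.
    print(precision_recall([1, 2, 3, 4], [2, 4, 5]))
```

Running the same function on the output of a baseline method and of the vector space method, for the same queries and relevance judgments, gives the kind of side-by-side comparison the text describes.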

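Cite this article

SHEN Jian, HU Jie, MA Jin, QI Jin, ZHU Guoniu, PENG Yinghong. Case acquisition in biological domain based on text mining[J]. Journal of Shanghai Jiao Tong University, 2018, 52(8): 954-960. DOI: 10.16183/j.cnki.jsjtu.2018.08.011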

Abstract

To address the acquisition of design-related biological cases that are described in natural language, a text-mining-based method for biological case acquisition is proposed, built on a vector representation model of natural language. By constructing a vector space over the corpus text and mining knowledge from it, feature selection, similarity measurement, and case retrieval methods for biological domain text are studied, providing technical support for biological case acquisition driven by design requirements. A case study shows that, on the one hand, the text mining method based on the vector space model has a clear advantage over the baseline method in both precision and recall. On the other hand, the text retrieval mechanism based on the vector space has good adaptability and extensibility, and can meet the needs of semantic retrieval in different environments.
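The vector-space retrieval idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain TF-IDF weighting with smoothed IDF and cosine similarity, and the corpus `DOCS`, the query `QUERY`, and all function names are hypothetical. The paper's own feature selection and preprocessing steps are not reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and design query, for illustration only.
DOCS = [
    "Gecko feet adhere to smooth surfaces using millions of tiny hairs.",
    "Lotus leaves repel water and stay clean because of a rough waxy surface.",
    "Termite mounds regulate temperature through a network of ventilation tunnels.",
]
QUERY = "keep a surface clean and repel water"


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def build_index(docs):
    """Return smoothed IDF weights and one sparse TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(toks).items()}
        for toks in tokenized
    ]
    return idf, vectors


def vectorize(text, idf):
    """Map a query into the corpus vector space, ignoring unseen terms."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    return {t: count * idf[t] for t, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, docs, top_k=2):
    """Rank documents by cosine similarity to the query; return (index, score) pairs."""
    idf, vectors = build_index(docs)
    q = vectorize(query, idf)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, score) for score, i in scored[:top_k]]


if __name__ == "__main__":
    for index, score in retrieve(QUERY, DOCS, top_k=3):
        print(f"{score:.3f}  {DOCS[index]}")
```

Because the sketch does no stemming, "surface" in the query does not match "surfaces" in the gecko sentence, which hints at why preprocessing and feature selection matter for this kind of retrieval.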
