To address knowledge acquisition in the biological domain for design problems expressed in natural language, a vector representation model of natural language is adopted and a text-mining-based method for acquiring biological domain instances is proposed. Building on the construction of a corpus text vector space and on knowledge mining, the feature selection, similarity measurement, and instance retrieval methods for biological domain texts are studied, providing technical support for design-demand-driven acquisition of biological domain instances. The results show that text mining based on the vector space model performs well in both precision and recall, and that the vector-space-based text retrieval mechanism has good adaptability and extensibility, which allows it to meet the needs of semantic retrieval in different environments.
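As a rough illustration of retrieval in a vector space model, the sketch below ranks a few short texts against a query. It is not the authors' implementation: the abstract does not state the term weighting or the similarity measure, so TF-IDF weighting and cosine similarity are assumed here, and the corpus, the function names, and the query are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's actual corpus is not given in the abstract.
CORPUS = [
    "gecko feet adhere to smooth surfaces using microscopic hairs",
    "lotus leaves repel water and stay clean",
    "shark skin reduces drag in water",
]


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, docs, top_k=2):
    """Rank documents by cosine similarity to the query; drop zero-score hits."""
    vectors, idf = build_tfidf(docs)
    # Query terms unseen in the corpus are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(
        ((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True
    )
    return [(i, round(s, 3)) for s, i in scored[:top_k] if s > 0.0]


if __name__ == "__main__":
    for index, score in retrieve("repel water", CORPUS):
        print(score, CORPUS[index])
```

In this toy setup the query "repel water" ranks the lotus text ahead of the shark text, because the former shares both query terms and the latter only one. A real pipeline would likely add stemming, part-of-speech filtering, and domain-specific feature selection before vectorization.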
SHEN Jian, HU Jie, MA Jin, QI Jin, ZHU Guoniu, PENG Yinghong. Case Acquisition in Biological Domain Based on Text Mining[J]. Journal of Shanghai Jiaotong University, 2018, 52(8): 954-960.
DOI: 10.16183/j.cnki.jsjtu.2018.08.011