Question answering systems offer a friendly interface through which people can interact with massive amounts of
online information. Retrieving useful medical information from the many websites indexed by search engines is
time-consuming for users. This work builds a Chinese Question Answering System in Medical Domain (CQASMD) to
provide users with useful medical information. A large medical knowledge base containing more than 300,000
medical terms and their descriptions is first constructed to store structured medical knowledge, and is
classified with the FastText model. A Word2Vec model is then adopted to capture the semantic meanings of words,
and the questions and answers are processed with sentence embedding to capture semantic context information. A
user's question is first classified and converted into a sentence vector, and a matching algorithm finds the
most similar stored question. After the constructed medical knowledge base is queried, the answer associated
with that stored question is returned to the user. The architecture and flowchart of CQASMD are proposed; the
system is expected to play an important role in the self-diagnosis and treatment of disease.
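
The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a sentence vector is the plain average of its word vectors and that similarity is measured by cosine similarity, neither of which the abstract specifies. The tiny `WORD_VECTORS` table is a hypothetical stand-in for a trained Word2Vec model, the tokens are assumed to be already segmented (shown in English for readability), and the names `sentence_vector`, `cosine`, and `best_match` are illustrative only.

```python
import math

# Hypothetical toy vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "migraine": [0.8, 0.2, 0.1],
    "treat": [0.1, 0.9, 0.0],
    "cure": [0.2, 0.8, 0.1],
    "fever": [0.0, 0.1, 0.9],
    "reduce": [0.1, 0.7, 0.3],
}


def sentence_vector(tokens):
    """Average the vectors of the known tokens (assumed embedding scheme).

    Returns None when no token has a vector.
    """
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_tokens, stored_questions):
    """Return (index, score) of the most similar stored question.

    Returns (None, 0.0) when the query or every stored question lacks vectors.
    """
    query_vec = sentence_vector(query_tokens)
    if query_vec is None:
        return None, 0.0
    best_index, best_score = None, -1.0
    for index, tokens in enumerate(stored_questions):
        vec = sentence_vector(tokens)
        if vec is None:
            continue
        score = cosine(query_vec, vec)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is None:
        return None, 0.0
    return best_index, best_score


# Previously answered questions and placeholder answers (hypothetical data).
STORED_QUESTIONS = [["treat", "headache"], ["reduce", "fever"]]
STORED_ANSWERS = ["<answer to stored question 0>", "<answer to stored question 1>"]

if __name__ == "__main__":
    index, score = best_match(["cure", "migraine"], STORED_QUESTIONS)
    if index is not None:
        print(STORED_ANSWERS[index], round(score, 3))
```

In a full system the question would first be routed to a category by the classifier, so that only stored questions in that category need to be compared; the sketch omits that step.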
FENG Guofei (冯郭飞), DU Zhikang (杜智康), WU Xing (武星). A Chinese Question Answering System in Medical Domain[J]. Journal of Shanghai Jiaotong University (Science), 2018, 23(5): 678-683.
DOI: 10.1007/s12204-018-1982-1