Web-Based Biomedical Literature Mining

  • (1. Department of Biomedical Engineering, Basic Medical College, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China; 2. Division of Gastroenterology and Hepatology, Renji Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200001, China; 3. Department of Nuclear Medicine, Renji Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200001, China; 4. Information and Resource Center, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China)

Online published: 2012-11-16

Abstract

With the rapid growth of the biomedical literature, the use of data-mining methods to discover new knowledge in that literature has drawn increasing attention from scholars. In this study, taking the mining of non-coding gene literature from the online PubMed database as an example, we first preprocessed the abstract data, then applied the term frequency-inverse document frequency (TF-IDF) method to select features, and finally established a biomedical literature data-mining model based on a Bayesian algorithm. We assessed the model by the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, precision rate and recall rate. When 1000 features are selected, the AUC, specificity, sensitivity, accuracy rate, precision rate and recall rate are 0.8683, 84.63%, 89.02%, 86.83%, 89.02% and 98.14%, respectively. These results indicate that our method can effectively identify the targeted literature related to a particular topic.
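The pipeline summarized in the abstract (preprocessing, TF-IDF feature selection, a Bayesian classifier, and confusion-matrix metrics) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the preprocessing steps, how TF-IDF scores are aggregated to rank features, or which Bayesian variant is used. The sketch assumes a simple alphabetic tokenizer, ranking by summed TF-IDF over the corpus, and multinomial naive Bayes with Laplace smoothing. The toy abstracts, labels, and all function names are hypothetical, and AUC is omitted for brevity.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed preprocessing: lowercase and split on any non-letter character.
    return "".join(c if c.isalpha() else " " for c in text.lower()).split()

def select_features(docs, k):
    # Rank terms by TF-IDF summed over all documents and keep the top k
    # (one plausible selection rule; the paper's exact rule is not given).
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    score = Counter()
    for d in docs:
        if not d:
            continue
        for t, c in Counter(d).items():
            score[t] += (c / len(d)) * math.log(n / df[t])
    return sorted(score, key=lambda t: (-score[t], t))[:k]

def train_nb(docs, labels, vocab):
    # Multinomial naive Bayes with Laplace (add-one) smoothing over vocab.
    vocab_set = set(vocab)
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    loglik = {}
    for c in classes:
        counts = Counter(t for d, y in zip(docs, labels) if y == c
                         for t in d if t in vocab_set)
        total = sum(counts.values()) + len(vocab_set)
        loglik[c] = {t: math.log((counts[t] + 1) / total) for t in vocab_set}
    return prior, loglik

def predict(model, doc):
    # Pick the class with the highest log posterior; unseen terms are ignored.
    prior, loglik = model
    scores = {c: prior[c] + sum(loglik[c][t] for t in doc if t in loglik[c])
              for c in prior}
    return max(scores, key=scores.get)

def metrics(y_true, y_pred, positive=1):
    # Confusion-matrix metrics named in the abstract (recall equals sensitivity).
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    ratio = lambda a, b: a / b if b else 0.0
    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "accuracy": ratio(tp + tn, len(pairs)), "precision": ratio(tp, tp + fp)}

# Hypothetical toy corpus: 1 = non-coding gene literature, 0 = other.
train = [
    ("long non-coding RNA regulates gene expression", 1),
    ("microRNA non-coding transcripts silence target genes", 1),
    ("non-coding RNA expression in tumor cells", 1),
    ("face recognition with wavelets and support vector machines", 0),
    ("mortality prediction model for hospital patients", 0),
    ("ranking strategies for document retrieval", 0),
]
docs = [tokenize(text) for text, _ in train]
labels = [y for _, y in train]
vocab = select_features(docs, k=12)   # the paper reports results for 1000 features
model = train_nb(docs, labels, vocab)
```

A new abstract would then be classified with `predict(model, tokenize(abstract_text))`, and `metrics` applied to the predictions on a held-out set gives the sensitivity, specificity, accuracy and precision figures of the kind reported above.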

Cite this article

AN Jian-fu1,4 (安建福), XUE Hui-ping2 (薛惠平), CHEN Ying1 (陈瑛), WU Jian-guo3 (吴建国), ZHANG Lu1 (章鲁). Web-Based Biomedical Literature Mining [J]. Journal of Shanghai Jiaotong University (Science), 2012, 17(4): 494-499. DOI: 10.1007/s12204-012-1311-z
