Web-Based Biomedical Literature Mining
AN Jian-fu1,4 (安建福), XUE Hui-ping2 (薛惠平), CHEN Ying1 (陈瑛), WU Jian-guo3 (吴建国), ZHANG Lu1 (章鲁)
2012, 17 (4): 494-499.
doi: 10.1007/s12204-012-1311-z
With the rapid growth of the biomedical literature, using data-mining methods to extract new knowledge from
the literature has drawn increasing attention from scholars. In this study, taking the mining of non-coding gene
literature from the PubMed online database as an example, we first preprocessed the abstract data, next applied the
term frequency-inverse document frequency (TF-IDF) method to select features, and
then established a biomedical literature data-mining model based on a Bayesian algorithm. Finally, we assessed
the model using the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity,
precision and recall. With 1000 features selected, the AUC, specificity, sensitivity, accuracy,
precision and recall were 0.8683, 84.63%, 89.02%, 86.83%, 89.02% and 98.14%, respectively. These results
indicate that our method can effectively identify the target literature on a particular topic.
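
The pipeline summarized above (preprocessing, TF-IDF feature selection, a Bayesian classifier, and confusion-matrix metrics) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract says only "Bayesian algorithm", so the multinomial naive Bayes model with Laplace smoothing is an assumption, as are the tiny stop-word list, the rule of ranking terms by summed TF-IDF, all function names, and the toy sentences, which are invented for illustration and are not PubMed data. AUC is omitted for brevity.

```python
import math
from collections import Counter

STOPWORDS = {"and", "in"}  # hypothetical, deliberately tiny stop-word list

def tokenize(text):
    # Crude preprocessing: lowercase, keep alphabetic tokens, drop stop words.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [t for t in cleaned.split() if t not in STOPWORDS]

def select_features(docs, k):
    # Rank terms by TF-IDF summed over the corpus and keep the top k.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += (count / len(doc)) * math.log(n / df[term])
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])

def train_nb(docs, labels, vocab):
    # Multinomial naive Bayes with Laplace (add-one) smoothing over vocab.
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d if t in vocab)
        total = sum(counts.values()) + len(vocab)
        log_like[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return log_prior, log_like

def predict(model, doc):
    # Pick the class with the highest log posterior; unseen terms are ignored.
    log_prior, log_like = model
    def score(c):
        return log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
    return max(sorted(log_prior), key=score)

def evaluate(y_true, y_pred, positive=1):
    # Confusion-matrix metrics; a ratio with a zero denominator is reported as 0.0.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    ratio = lambda a, b: a / b if b else 0.0
    return {
        "sensitivity": ratio(tp, tp + fn),  # identical to recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }

# Toy corpus (invented): 1 = non-coding gene topic, 0 = other topic.
train_texts = [
    "microRNA regulates gene expression",
    "long noncoding RNA regulates chromatin",
    "noncoding RNA and microRNA in cancer",
    "protein kinase signaling in cancer",
    "protein structure and enzyme kinetics",
    "enzyme kinase inhibitors in therapy",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = [
    "microRNA and noncoding RNA expression",
    "noncoding RNA regulates cancer genes",
    "microRNA targets protein kinase enzyme",
    "enzyme and protein kinase",
    "kinase inhibitors in cancer therapy",
]
test_labels = [1, 1, 1, 0, 0]

train_docs = [tokenize(t) for t in train_texts]
vocab = select_features(train_docs, k=8)
model = train_nb(train_docs, train_labels, vocab)
preds = [predict(model, tokenize(t)) for t in test_texts]
metrics = evaluate(test_labels, preds)
print(preds, metrics)
```

In this sketch, sensitivity and recall are the same quantity, TP / (TP + FN). Features are selected from the training documents only, so no information from the test set leaks into the vocabulary.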