Journal of Shanghai Jiaotong University, 2018, Vol. 52, Issue (10): 1363-1369. doi: 10.16183/j.cnki.jsjtu.2018.10.027
LI Tongyu, REN Rui, CAI Hongming, JIANG Lihong
LI Tongyu, REN Rui, CAI Hongming, JIANG Lihong. Automated Web Page Content Extraction Method Based on Document Object Model[J]. Journal of Shanghai Jiaotong University, 2018, 52(10): 1363-1369.
URL: https://xuebao.sjtu.edu.cn/EN/10.16183/j.cnki.jsjtu.2018.10.027
[1] QIAN Peng, WANG Guoliang, ZHU Wenfeng. Modeling and Optimization of 3D Assembly Tolerance for Window Lifting Under Flexible Deformation [J]. Journal of Shanghai Jiaotong University, 2020, 54(11): 1134-1141.
[2] BAO Qinglin, CHAI Huaqi, ZHAO Songzheng, WANG Jilin. Model of Technology Opportunity Mining Using Machine Learning Algorithm and Its Application [J]. Journal of Shanghai Jiaotong University, 2020, 54(7): 705-717.
[3] LI Baihe, JIANG Zuhua, TAO Ningrong, MENG Lingtong, ZHENG Hong. Ship Block Transportation Scheduling Considering Cooperative Transportation of Flatcars [J]. Journal of Shanghai Jiaotong University, 2020, 54(7): 718-727.
[4] MA Zhonghang, ZHANG Zhinan. Design and Realization of a Versatile Simulation Platform for Telecontrol Multi-Rotor Unmanned Aerial Vehicle with a Robotic Arm [J]. Journal of Shanghai Jiaotong University, 2020, 54(6): 636-642.
[5] MENG Lingtong, JIANG Zuhua, TAO Ningrong, LIU Jianfeng, ZHENG Hong. Multi-Stockyard Scheduling Considering Technological Process and Combined Assembly Block [J]. Journal of Shanghai Jiao Tong University, 2020, 54(4): 331-343.
[6] ZHANG Jie, ZHAO Xinming, ZHANG Peng, SHENG Xia, CHAO Xiaona, TIAN Fengxiang. Early Warning Method for Tardiness Precaution Oriented to Rocket Final Assembly Process [J]. Journal of Shanghai Jiaotong University, 2020, 54(3): 322-330.
[7] SUN Mingyang, YAN Guozheng, LIU Dasheng, WANG Zhiwu, HAN Ding, ZHAO Kai, YANG Lei. High Accuracy Ultra Wideband Real Time Location System for Drug Rehabilitation Center [J]. Journal of Shanghai Jiaotong University, 2020, 54(1): 76-84.
[8] ZHANG Yungang, YANG Jianfeng, YI Benshun. Improved Residual Encoder-Decoder Network for Low-Dose CT Image Denoising [J]. Journal of Shanghai Jiaotong University, 2019, 53(8): 983-989.
[9] WANG Hongyu, YIN Wurong, WANG Liang, HU Jianghao, QIAO Wenchao. Fast Edge Extraction Algorithm Based on HSV Color Space [J]. Journal of Shanghai Jiaotong University, 2019, 53(7): 765-772.
[10] ZHOU Binghai, LIU Wenlong. Multi-Objective Hybrid Flow-Shop Scheduling Problem Considering Energy Consumption and On-Time Delivery [J]. Journal of Shanghai Jiaotong University, 2019, 53(7): 773-779.
[11] MENG Lingtong, JIANG Zuhua, TAO Ningrong, LIU Jianfeng, LI Baihe. Combined Assembly Block Scheduling in Storage Yard of Shipbuilding [J]. Journal of Shanghai Jiaotong University, 2019, 53(7): 780-788.
[12] JIANG Xudong, LI Pengfei, LIU Zheng, TENG Xiaoyan. Arterial Injury Assessment by Computational Interaction Model of Shear Thinning Blood with Expanded Stenotic Vascular [J]. Journal of Shanghai Jiaotong University, 2019, 53(6): 757-764.
[13] TANG Ran, ZHAO Yingxin, WU Hong. Automatic Identification System Signal Detection Algorithm Based on Improved Feedback Decision [J]. Journal of Shanghai Jiaotong University, 2019, 53(5): 610-615.
[14] YE Xian, HU Jie, TIAN Pan, QI Jin, CHE Datian, DING Ying. Automatic Sleep Scoring Based on Refined Composite Multi-Scale Entropy and Support Vector Machine [J]. Journal of Shanghai Jiaotong University, 2019, 53(3): 321-326.
[15] SHEN Ting, SUN Tanfeng, JIANG Xinghao. Detection of Double Compression with the Same Quantization Parameter Based on Dual Encoding Parameter Model [J]. Journal of Shanghai Jiaotong University, 2019, 53(3): 334-340.