Journal of Shanghai Jiaotong University, 2020, Vol. 54, Issue (2): 117-125. doi: 10.16183/j.cnki.jsjtu.2020.02.002
ZHANG Hui, MA Shaoping
Published: 2020-03-06
Corresponding author: MA Shaoping, male, professor and doctoral supervisor. Tel.: 010-62783191; E-mail: msp@tsinghua.edu.cn.
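About the author: ZHANG Hui (1981-), female, from Yuncheng County, Shandong Province, is a Ph.D. candidate. Her research interests include web user behavior analysis and search engine user interface design.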
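Abstract: The efficiency of information retrieval depends largely on the content presented on the search engine result pages (SERPs) that users see. Commercial search engines currently present result text mainly by highlighting the query terms in red. However, because a query may be vaguely worded or contain noise, its terms often do not fully match the user's search intent. To reflect that intent fully while highlighting the words most important for satisfying it, a new keyword highlighting strategy is proposed based on manually annotated results. Building on four basic machine-learning models for sequence labeling, namely the structured support vector machine (SVM), the hidden Markov model (HMM), the max-margin Markov network (M3N), and the conditional random field (CRF), a new joint sequence learning model is further proposed and evaluated in user search experiments. The experimental results show that the joint model outperforms all four base models, reaching an accuracy of 93.30% against the manual annotation, and that the proposed keyword highlighting strategy clearly outperforms the traditional query-term highlighting strategy, improving both user satisfaction and search benefit.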
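The abstract does not say how the joint model combines the four base labelers or how accuracy is counted. As a minimal sketch, assuming each base model emits one keyword/non-keyword tag per token, the Python below combines the four tag sequences by token-level majority vote, scores the result against a manual annotation as token-level accuracy, and wraps keyword tokens in highlight markup. The tag names `K`/`O`, the tie-breaking rule, the `<mark>` markup, the sample query, and all function names are hypothetical; this is not the authors' joint sequence learning model.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical tag set (assumption): "K" = keyword to highlight, "O" = other token.
KEY, OTHER = "K", "O"


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several base labelers by majority vote.

    Ties are broken in favour of the first model's tag; this rule is an
    illustrative assumption, not taken from the paper.
    """
    if not predictions:
        raise ValueError("need at least one label sequence")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all label sequences must have the same length")
    combined: List[str] = []
    for i in range(length):
        votes = Counter(p[i] for p in predictions)
        best = max(votes.values())
        tied = sorted(tag for tag, n in votes.items() if n == best)
        first_choice = predictions[0][i]
        # Keep the first model's tag if it is among the top-voted tags.
        combined.append(first_choice if first_choice in tied else tied[0])
    return combined


def token_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag matches the manual annotation."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def highlight(tokens: Sequence[str], tags: Sequence[str],
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap every token tagged as a keyword in highlight markup."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{open_tag}{t}{close_tag}" if tag == KEY else t
                    for t, tag in zip(tokens, tags))


if __name__ == "__main__":
    tokens = ["cheap", "flights", "to", "shanghai", "next", "week"]
    # Made-up outputs standing in for the four base models.
    svm = ["O", "K", "O", "K", "O", "O"]
    hmm = ["K", "K", "O", "K", "O", "O"]
    m3n = ["O", "K", "O", "K", "K", "O"]
    crf = ["O", "K", "O", "K", "O", "O"]
    gold = ["K", "K", "O", "K", "O", "O"]  # made-up manual annotation

    joint = majority_vote([svm, hmm, m3n, crf])
    print(joint)                                 # ['O', 'K', 'O', 'K', 'O', 'O']
    print(f"{token_accuracy(joint, gold):.4f}")  # 0.8333
    print(highlight(tokens, joint))              # cheap <mark>flights</mark> to <mark>shanghai</mark> next week
```

Majority voting is only the simplest way to combine labelers; a learned second-level combiner that takes the four base tags as features is another option, and the abstract does not indicate which approach, if either, the authors used.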
Cite this article: ZHANG Hui, MA Shaoping. Intent-Aware Search Snippet Text Highlighting Method[J]. Journal of Shanghai Jiaotong University, 2020, 54(2): 117-125.