Intent-Aware Search Snippet Text Highlighting Method

ZHANG Hui, MA Shaoping

doi:10.16183/j.cnki.jsjtu.2020.02.002

Journal of Shanghai Jiaotong University, 2020, Vol. 54, Issue 2: 117-125

Department of Computer Science and Technology; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China

Online published: 2020-03-06

Abstract

The efficiency of web information retrieval depends largely on the search engine results page (SERP) presented to searchers, and especially on the text highlighted in it. At present, the SERPs of commercial search engines usually highlight the query terms. However, query words can be ambiguous or even noisy, so they may not fully match the user's search intent. To highlight the terms that most clearly describe the information being sought, this paper proposes a new key term highlighting strategy based on the results of manual annotation. Highlighted terms are then generated with four machine learning algorithms: the structured support vector machine, the hidden Markov model, max-margin Markov networks, and the conditional random field. In addition, this paper proposes a new method, the joint sequence labeling (JSL) algorithm, which combines these four structured learning algorithms, and conducts search experiments using it. Experimental results show that the JSL algorithm is more accurate than the baselines, reaching an accuracy of 93.30%. The search experiments also show that the key term highlighting strategy achieves better performance and higher user satisfaction than the traditional query term highlighting strategy.

Key words: search engine results page (SERP); intent of user; query terms highlighting; joint sequence labeling (JSL) algorithm
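
The contrast between query term highlighting and key term highlighting can be sketched as a per-token labeling problem. The abstract does not say how JSL combines the four structured learners, so the sketch below substitutes plain majority voting purely as an illustrative stand-in; it is not the JSL algorithm. All function names, the example snippet, and the per-model tag sequences are hypothetical.

```python
from typing import List, Sequence


def query_term_highlight(tokens: Sequence[str], query_terms: Sequence[str]) -> List[int]:
    """Baseline: tag a token 1 only if it literally matches a query term."""
    terms = {t.lower() for t in query_terms}
    return [1 if tok.lower() in terms else 0 for tok in tokens]


def majority_vote(label_sequences: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary tag sequences from several labelers, token by token.

    Assumption: simple majority voting, with ties resolved to 0 (no highlight).
    This is a stand-in, not the combination rule used by JSL.
    """
    if not label_sequences:
        raise ValueError("need at least one label sequence")
    length = len(label_sequences[0])
    if any(len(seq) != length for seq in label_sequences):
        raise ValueError("all label sequences must have the same length")
    n_models = len(label_sequences)
    return [1 if 2 * sum(tags) > n_models else 0 for tags in zip(*label_sequences)]


def render(tokens: Sequence[str], tags: Sequence[int]) -> str:
    """Mark highlighted tokens with square brackets."""
    return " ".join(f"[{tok}]" if tag else tok for tok, tag in zip(tokens, tags))


# Hypothetical snippet and query.
tokens = ["Apple", "releases", "new", "iPhone", "pricing", "details"]
query = ["apple", "price"]

# Hypothetical per-token outputs of four sequence labelers
# (standing in for structured SVM, HMM, max-margin Markov networks, CRF).
model_tags = [
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 1],
]

baseline_tags = query_term_highlight(tokens, query)
combined_tags = majority_vote(model_tags)

print(render(tokens, baseline_tags))  # [Apple] releases new iPhone pricing details
print(render(tokens, combined_tags))  # [Apple] releases new [iPhone] [pricing] details
```

In this toy case the literal query match misses "iPhone" and "pricing", while the combined labels mark them, which is the kind of gap the key term highlighting strategy is meant to close.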

Cite this article

ZHANG Hui, MA Shaoping. Intent-Aware Search Snippet Text Highlighting Method[J]. Journal of Shanghai Jiaotong University, 2020, 54(2): 117-125. DOI: 10.16183/j.cnki.jsjtu.2020.02.002
