Journal of Shanghai Jiao Tong University, 2020, Vol. 54, Issue (2): 117-125. doi: 10.16183/j.cnki.jsjtu.2020.02.002


基于用户意图的搜索结果文本突显方法

张辉,马少平   

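  1. Department of Computer Science and Technology; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China
  • Published: 2020-03-06
  • Corresponding author: MA Shaoping, male, professor, doctoral supervisor. Tel.: 010-62783191; E-mail: msp@tsinghua.edu.cn
  • About the author: ZHANG Hui (1981-), female, born in Yuncheng County, Shandong Province, doctoral candidate; main research interests: web user behavior analysis and search engine user interface design.
  • Funding:
    Supported by the National Natural Science Foundation of China (61622208, 61732008, 61532011)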

Intent-Aware Search Snippet Text Highlighting Method

ZHANG Hui, MA Shaoping

  1. Department of Computer Science and Technology; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China
  • Published: 2020-03-06

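摘要 (Abstract): The efficiency of information retrieval depends to a large extent on the content that the search engine results page presents to the user. At present, commercial search engine results pages mainly display text by highlighting query terms in red, but because query terms may be ambiguous or contain noise, they often do not fully match the user's search intent. To reflect the user's search intent fully and to highlight the terms most important for satisfying it, a new key term highlighting strategy is proposed based on manual annotation results. Combining four basic sequence labeling machine learning models (structured support vector machine, hidden Markov model, max-margin Markov network, and conditional random field), a new joint sequence learning model is further proposed, and user search experiments are conducted. The experimental results show that this model outperforms the four basic models and reaches an accuracy of 93.30% against the manual annotations; the proposed key term highlighting strategy is clearly better than the traditional query term highlighting strategy and improves user satisfaction and search effectiveness.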

关键词 (Key words): search engine results page; user intent; query term highlighting; sequence labeling algorithm

Abstract: The efficiency of web information retrieval depends largely on the search engine results page (SERP) that searchers see, and especially on its highlighted text. At present, the SERPs of commercial search engines usually highlight query terms. However, query terms can be ambiguous or even noisy, so they may not fully match the user's search intent. To highlight the terms that matter most for satisfying that intent, this paper proposes a new key term highlighting strategy based on the results of manual annotation. It generates the highlighted terms with four machine learning algorithms: structured support vector machine, hidden Markov model, max-margin Markov network, and conditional random field. It then proposes a new method, the joint sequence labeling (JSL) algorithm, which combines these four structured learning algorithms, and conducts user search experiments with it. The experimental results show that the JSL algorithm is more accurate than the four baselines, reaching an accuracy of 93.30% against the manual annotations. The search experiments also show that the key term highlighting strategy performs better and yields higher user satisfaction than the traditional query term highlighting strategy.

Key words: search engine results page (SERP); user intent; query term highlighting; joint sequence labeling (JSL) algorithm
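
The abstract does not say how the JSL algorithm combines the four sequence labelers, nor exactly how accuracy against the manual annotations is computed. Purely as an illustration, the sketch below assumes one simple possibility: each model emits a binary label per snippet token (1 = highlight, 0 = do not), the labels are merged by per-token majority vote, and accuracy is the fraction of tokens whose merged label agrees with the manual annotation. The function names, the tie-breaking rule, and all label values are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Merge per-token labels from several sequence labelers by majority vote.

    predictions[m][t] is the label model m assigns to token t.
    Assumption (not from the paper): ties go to the earliest model in the list.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same tokens")

    combined = []
    for t in range(length):
        votes = Counter(p[t] for p in predictions)
        top = max(votes.values())
        # Pick the first model (in list order) whose label has the top count.
        combined.append(next(p[t] for p in predictions if votes[p[t]] == top))
    return combined


def token_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of tokens whose predicted label matches the manual annotation."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical labels for a five-token snippet from the four base models.
svm_labels = [1, 0, 1, 0, 0]
hmm_labels = [1, 0, 0, 0, 1]
m3n_labels = [1, 1, 1, 0, 0]
crf_labels = [0, 0, 1, 0, 0]
gold_labels = [1, 0, 1, 0, 1]  # hypothetical manual annotation

joint_labels = majority_vote([svm_labels, hmm_labels, m3n_labels, crf_labels])
accuracy = token_accuracy(joint_labels, gold_labels)
```

With these made-up labels the merged sequence is `[1, 0, 1, 0, 0]`, which matches four of the five annotated tokens. A learned combination, for example weighting each model by its validation accuracy, would be an equally plausible reading of "joint"; the abstract alone does not settle the question.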
