上海交通大学学报 ›› 2019, Vol. 53 ›› Issue (8): 978-982.doi: 10.16183/j.cnki.jsjtu.2019.08.013

• 学报(中文) • 上一篇    下一篇

基于窗口函数和分布式集群的可视化学术搜索系统数据查询优化

罗希意,霍晓阳,傅洛伊   

  1. 上海交通大学 电子信息与电气工程学院, 上海 200240
  • 出版日期:2019-08-28 发布日期:2019-09-10
  • 通讯作者: 傅洛伊,女,特别副研究员,E-mail:fu-ly@cs.sjtu.edu.cn.
  • 作者简介:罗希意(1987-),男,重庆市人,硕士生,主要从事数据库和大数据处理研究.
  • 基金资助:
    科技部重点研发计划(2018YFB1004703),上海市科学技术委员会创新行动计划(17511105103,18510761200)

Query Optimization of Data Based on Window Function and Distributed Cluster in Visual Academic Search System

LUO Xiyi,HUO Xiaoyang,FU Luoyi   

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Online:2019-08-28 Published:2019-09-10

摘要: 针对在密集分析型查询请求和海量数据的应用场景下传统关系型数据库MySQL性能不佳问题,提出了基于窗口函数(Window Function)的分析型查询优化方法,以分区(Partitioning)方法代替传统的分组(Group by)操作,并提出了基于分布式集群(SQL-on-Hadoop: SparkSQL)计算引擎的海量数据查询优化方法,采用内存列存储优化技术和Spark分布式集群计算以提高查询性能.同时,以典型的分析型SQL查询实例验证了其有效性.结果表明,所提出的查询优化方法能够显著提高查询性能.与传统的关系型数据库MySQL相比,基于SparkSQL的查询优化方法的查询速度大幅提高,从而验证了其用于可视化学术搜索系统AceMap数据查询的正确性.

关键词: 结构化查询语言; 窗口函数; 分布式计算; 查询优化

Abstract: In order to address the issue of the poor performance of traditional MySQL database in application scenarios of densely analytical query requests and massive data processing, we proposed an approach based on window functions for analytical SQL query optimization. The approach replaces the fundamental grouping operation by the partitioning operation. In addition, we also designed distributed clusters based method for massive data query optimization, the method utilizes the in-memory columnar storage technology and Spark cluster’s distributed computation to lift the query performance. Meanwhile, the validity of the proposed approaches has been verified by typical analytical SQL queries. Experimental results show that the proposed methods have improved the query performance significantly, as the query optimization based on SparkSQL has reduced the execution time by a wide margin compared to traditional relational database MySQL. These proved the effectiveness when the methods are applied in AceMap, a visual academic search system.

Key words: structured query language (SQL); window functions; distributed computation; query optimization

中图分类号: