Query Optimization of Data Based on Window Function and Distributed Cluster in Visual Academic Search System

Expand
  • School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Online published: 2019-09-10

Abstract

In order to address the issue of the poor performance of traditional MySQL database in application scenarios of densely analytical query requests and massive data processing, we proposed an approach based on window functions for analytical SQL query optimization. The approach replaces the fundamental grouping operation by the partitioning operation. In addition, we also designed distributed clusters based method for massive data query optimization, the method utilizes the in-memory columnar storage technology and Spark cluster’s distributed computation to lift the query performance. Meanwhile, the validity of the proposed approaches has been verified by typical analytical SQL queries. Experimental results show that the proposed methods have improved the query performance significantly, as the query optimization based on SparkSQL has reduced the execution time by a wide margin compared to traditional relational database MySQL. These proved the effectiveness when the methods are applied in AceMap, a visual academic search system.

Cite this article

LUO Xiyi,HUO Xiaoyang,FU Luoyi . Query Optimization of Data Based on Window Function and Distributed Cluster in Visual Academic Search System[J]. Journal of Shanghai Jiaotong University, 2019 , 53(8) : 978 -982 . DOI: 10.16183/j.cnki.jsjtu.2019.08.013

References

[1]WU Z H, WU J, KHABSA M, et al. Towards building a scholarly big data platform: Challenges, lessons and opportunities[C]//Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries. Piscataway, USA: IEEE Press, 2014: 117-126. [2]SINHA A, SHEN Z H, SONG Y, et al. An overview of Microsoft academic service (MAS) and applications[C]//Proceedings of the 24th International Conference on World Wide Web. New York, USA: ACM Press, 2015: 243-246. [3]SCHWARTZ B, ZAITSEV P, TKACHENKO V. High performance MySQL [M]. 3rd edition. Sebastopol, USA: O’Reilly Media, 2012: 1-31. [4]LEIS V, KUNDHIKANJANA K, KEMPER A, et al. Efficient processing of window functions in analytical SQL queries[C]//Proceedings of the VLDB Endowment. Hawaii, USA: Morgan Press, 2015, 8(10): 1058-1069. [5]ARMBRUST M, XIN R, LIAN C, et al. Spark SQL: Relational data processing in spark[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. New York, USA: ACM Press, 2015: 1383-1394. [6]GUO C H, WU Z G, HE Z Y, et al. An adaptive data partitioning scheme for accelerating exploratory spark SQL queries[C]//International Conference on Database Systems for Advanced Applications. Suzhou, China: DASFAA Press, 2017: 114-128. [7]CUZZOCREA A, BELLATRECHE L, SONG I. Data warehousing and OLAP over big data: Current challenges and future research directions[C]//Proceedings of the Sixteenth International Workshop on Data Warehousing and OLAP. New York, USA: ACM Press, 2013: 67-70. [8]EISENBERG A, MELTON J, KULKARNI K, et al. SQL: 2003 has been published[EB/OL].[2017-10-10]. https://dl.acm.org/citation.cfm?id=974142. [9]RIGGS S, CIOLLI G, KROSING H, et al. PostgreSQL 9 Administration cookbook [M]. 2nd edition. Birmingham, UK: Packt Press, 2015: 1-8. [10]AGARWAL S, ARMBRUST M, BRADLEY J, et al. Spark documentations-spark overview[EB/OL]. [2017-10-10]. http://spark.apache.org/committers.html. [11]WHITE T. Hadoop: The definitive guide[M]. 4th edition. Sebastopol, USA: O’Reilly Media, 2015: 3-17. [12]ABADI D, BABU S, ZCAN F, et al. Tutorial: SQL-on-hadoop systems[C]//Proceedings of the VLDB Endowment. Hawaii, USA: VLBD Press, 2015, 8(12): 2050-2051. [13]PLATTNER H, ZEIER A, TINNEFELD C, et al. Available-to-promise on an in-memory column store: EP 2575093 [P]. 2011-10-10[2013-04-03].
Outlines

/