Journal of Shanghai Jiaotong University (Science) ›› 2016, Vol. 21 ›› Issue (3): 280-288. doi: 10.1007/s12204-016-1723-2


Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program

ZHAO Xia* (赵 夏), MA Sheng (马 胜), CHEN Wei (陈 微), WANG Zhiying (王志英)   

  1. (State Key Laboratory of High Performance Computing; College of Computer, National University of Defense Technology, Changsha 410072, China)
  • Online: 2016-06-30 Published: 2016-06-30
  • Contact: ZHAO Xia (赵 夏) E-mail: xiazhao@nudt.edu.cn

Abstract: Simulation is an important means of evaluating computer architecture performance. At present, the serial simulation of general purpose graphics processing unit (GPGPU) architectures is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform, and apply both methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units within one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between different compute units. The inter-kernel parallelization method divides the multiple kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel parallelization method accelerates the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. Because the two methods are orthogonal, they can be combined on multiple multicore hosts for further performance improvements.

Key words: general purpose graphics processing unit (GPGPU), multicore, intra-kernel, inter-kernel, parallel
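
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use GPGPU-sim code: `ComputeUnit`, `simulate_kernel`, and `group_kernels` are hypothetical names, the per-cycle work is a placeholder counter, and the contiguous grouping of kernels is an assumption, since the abstract does not say how the groups are formed.

```python
from concurrent.futures import ThreadPoolExecutor


class ComputeUnit:
    """Hypothetical stand-in for one simulated compute unit."""

    def __init__(self, uid):
        self.uid = uid
        self.cycles = 0

    def step(self):
        # Placeholder for the real per-cycle work: advance by one simulated cycle.
        self.cycles += 1


def simulate_kernel(units, n_cycles, workers=4):
    """Intra-kernel idea: step all compute units concurrently within each cycle."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_cycles):
            # list() consumes every result, so no unit starts cycle t+1
            # before all units have finished cycle t (a per-cycle barrier).
            list(pool.map(ComputeUnit.step, units))
    return [u.cycles for u in units]


def group_kernels(kernels, n_hosts):
    """Inter-kernel idea: split the kernel sequence into one group per host.

    Assumption: groups are contiguous and as evenly sized as possible.
    """
    base, extra = divmod(len(kernels), n_hosts)
    groups, start = [], 0
    for host in range(n_hosts):
        size = base + (1 if host < extra else 0)
        groups.append(kernels[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    units = [ComputeUnit(i) for i in range(8)]
    print(simulate_kernel(units, n_cycles=5))
    print(group_kernels(["k0", "k1", "k2", "k3", "k4", "k5", "k6"], n_hosts=3))
```

The per-cycle barrier in `simulate_kernel` corresponds to the synchronization cost that the abstract says the second intra-kernel step is meant to reduce. In CPython the threads here show only the structure of the approach, not a real speed-up.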
