J Shanghai Jiaotong Univ Sci ›› 2021, Vol. 26 ›› Issue (2): 245-256. doi: 10.1007/s12204-020-2240-x
• Welding Automation & Computer Technology •
LI Bingchao (李炳超), WEI Jizeng (魏继增), GUO Wei (郭炜), SUN Jizhou (孙济州)
Online: 2021-04-28
Published: 2021-03-24
Contact: WEI Jizeng (魏继增)
E-mail: weijizeng@tju.edu.cn
LI Bingchao (李炳超), WEI Jizeng (魏继增), GUO Wei (郭炜), SUN Jizhou (孙济州). Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units[J]. J Shanghai Jiaotong Univ Sci, 2021, 26(2): 245-256.