This paper presents a design for software distributed shared memory (DSM). Inspired by snooping-based cache coherence, the design adopts a new memory coherence protocol that differs substantially from earlier ones. The protocol aims to improve DSM performance by reducing the number of messages on the critical path of execution. The paper also describes a practical implementation and its optimizations. A preliminary evaluation shows a performance improvement of up to 45%.
ZHENG Yang, CHEN Haibo, ZANG Binyu. Snooping-Based Distributed Shared Memory[J]. Journal of Shanghai Jiaotong University, 2018, 52(10): 1333-1338.
DOI: 10.16183/j.cnki.jsjtu.2018.10.023