This paper presents a software-implemented distributed shared memory (DSM). Inspired by snooping-based (broadcast-based) cache coherence, it adopts a memory coherence protocol that differs from earlier designs: the protocol aims to improve overall DSM performance by reducing the number of network messages that must be awaited to maintain memory coherence, that is, messages on the critical path of execution. Building on this protocol, the paper describes a practical implementation and targeted optimizations. A preliminary evaluation shows a performance improvement of up to 45% over previous distributed shared memory protocols.