Electronic Information and Electrical Engineering

XU Liangye (许亮业) (1984-), professor-level senior engineer, engaged in research on medical informatization.


A Consistency Checking Method for Erasure-Coded Striped Data

  • 1. Shanghai Children’s Medical Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China
    2. Shanghai Xiaoyun Info Tech Co., Ltd., Shanghai 200240, China
    3. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Received date: 2024-01-24

Revised date: 2024-02-20

Accepted date: 2024-02-22

Online published: 2024-04-30


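Cite this article

XU Liangye, SHI Lianxing, SHAN Rongsheng. A consistency checking method for erasure-coded striped data[J]. Journal of Shanghai Jiao Tong University, 2024, 58(4): 579-584. DOI: 10.16183/j.cnki.jsjtu.2024.035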

Abstract

Erasure coding is a redundancy strategy commonly used in distributed storage systems. In erasure-coded data, the stripe is the basic unit of consistency checking, and each stripe contains multiple original data units and parity units. To reduce the read overhead of consistency checks on erasure-coded striped data and to improve the efficiency of both consistency checking and write-after-read operations, self-correction data tags (SCDTs) are added to each stripe unit when erasure-coded striped data is written, and subsequent consistency checks of each stripe are performed on the basis of the SCDTs. The proposed method completes the consistency check of a stripe without reading all data units in that stripe, improving consistency-check efficiency by a factor of 1.7 to 2.6. Moreover, when the number of stripe units updated by a write is below a critical value, it effectively reduces the number of input/output (IO) interactions required for the write. The proposed method thus handles partial updates of striped data groups better while improving the efficiency of consistency checks.
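
The abstract does not specify the layout of an SCDT, so the following is only a minimal sketch of the general idea of tag-based checking, not the authors' method. It assumes a hypothetical tag that holds a single stripe write version, uses plain XOR parity as a stand-in for a real erasure code, and models full-stripe writes only (partial updates are out of scope). The names `StripeUnit`, `write_stripe`, `full_check`, and `tag_check` are illustrative.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass
class StripeUnit:
    data: bytes       # payload of the stripe unit
    version: int = 0  # hypothetical tag field: write generation of the stripe


def xor_parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (stand-in for a real erasure code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: List[bytes], version: int) -> List[StripeUnit]:
    """Build k data units plus one parity unit, tagging each unit at write time."""
    blocks = list(data_blocks) + [xor_parity(data_blocks)]
    return [StripeUnit(block, version) for block in blocks]


def full_check(stripe: List[StripeUnit]) -> bool:
    """Baseline check: read every unit's payload and recompute the parity."""
    *data_units, parity = stripe
    return xor_parity([u.data for u in data_units]) == parity.data


def tag_check(stripe: List[StripeUnit]) -> bool:
    """Tag-based check: compare only the small per-unit tags, not the payloads."""
    return len({u.version for u in stripe}) == 1


# Usage: a consistent stripe passes both checks.
stripe = write_stripe([b"abcd", b"efgh", b"ijkl"], version=1)
assert full_check(stripe) and tag_check(stripe)

# Simulate a torn write: one data unit is rewritten, the parity unit is not.
stripe[0] = StripeUnit(b"zzzz", 2)
assert not full_check(stripe)
assert not tag_check(stripe)
```

In this toy model the tag check inspects a few bytes per unit instead of the full payloads, which illustrates why a tag-based check can avoid reading every data unit; the actual savings reported in the abstract depend on the SCDT design described in the paper.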
