Journal of Shanghai Jiao Tong University, 2024, Vol. 58, Issue 4: 579-584. doi: 10.16183/j.cnki.jsjtu.2024.035

• Electronic Information and Electrical Engineering •

A Consistency Checking Method for Erasure-Coded Striped Data

XU Liangye1, SHI Lianxing2, SHAN Rongsheng3

  1. Shanghai Children’s Medical Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China
  2. Shanghai Xiaoyun Info Tech Co., Ltd., Shanghai 200240, China
  3. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2024-01-24 Revised: 2024-02-20 Accepted: 2024-02-22 Online: 2024-04-28 Published: 2024-04-30
  • Corresponding author: SHI Lianxing, M.S.; E-mail: shilianxing@shxiaoyun.com.cn
  • Biography: XU Liangye (1984-), professor-level senior engineer, engaged in research on medical informatics.

Abstract:

Erasure coding is a redundancy strategy commonly used in distributed storage systems. In erasure-coded data, the stripe is the basic unit of consistency checking; each stripe consists of multiple original data units and parity units. To reduce the read overhead of consistency checks on erasure-coded striped data, and to improve the efficiency of both consistency checking and write-after-read operations, a self-correction data tag (SCDT) is attached to each stripe unit when erasure-coded data are written in striping mode, and all subsequent consistency checks of each stripe are performed on the basis of these SCDTs. The proposed method can complete the consistency check of a stripe without reading every data unit in it, improving consistency-check efficiency by a factor of 1.7 to 2.6. Moreover, when the number of stripe units updated by a write is below a critical value, the method effectively reduces the number of input/output (IO) interactions required for the write. The proposed method therefore handles partial updates of striped data groups better while also making consistency checks more efficient.

Key words: distributed storage system, striped data, consistency check, data tag, erasure code
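For orientation, here is a minimal Python sketch of the conventional baseline that the abstract contrasts against: a full-stripe consistency check that must read every unit and re-encode the parity. It also includes a generic IO-count model showing why partial stripe updates naturally have a critical value. These are assumptions, not the authors' method. A single XOR parity unit (a k+1 code) stands in for a general k+m erasure code. The read-modify-write versus reconstruct-write counts follow the common textbook RAID model rather than the SCDT write path. All function names are hypothetical. The SCDT format itself is not described in the abstract and is not modeled here.

```python
from functools import reduce
from typing import List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length stripe units."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_units: List[bytes]) -> bytes:
    """Single XOR parity unit: a k+1 code standing in for a general k+m erasure code."""
    return reduce(xor_bytes, data_units)


def full_stripe_check(data_units: List[bytes], parity: bytes) -> Tuple[bool, int]:
    """Conventional check: read all k data units plus the parity unit, re-encode, compare.

    Returns (is_consistent, number_of_units_read).
    """
    units_read = len(data_units) + 1
    return encode_parity(data_units) == parity, units_read


def partial_write_ios(k: int, m: int, u: int) -> Tuple[int, int]:
    """Generic IO counts for updating u of the k data units in a k+m stripe.

    Read-modify-write: read u old data + m old parity units, then write u data + m parity.
    Reconstruct-write: read the k-u untouched data units, then write u data + m parity.
    Returns (read_modify_write_ios, reconstruct_write_ios).
    """
    rmw = (u + m) + (u + m)
    rcw = (k - u) + (u + m)
    return rmw, rcw


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x0f\x00", b"\xf0\x10"]  # k = 3 data units
    parity = encode_parity(stripe)
    print(full_stripe_check(stripe, parity))  # (True, 4)

    stale = [stripe[0], b"\x0e\x00", stripe[2]]  # one data unit silently changed
    print(full_stripe_check(stale, parity))  # (False, 4)

    # k = 6, m = 2: read-modify-write is cheaper only while 2(u + m) < k + m, i.e. u < 2
    for u in range(1, 7):
        print(u, partial_write_ios(6, 2, u))
```

Under this generic model, read-modify-write beats reconstruct-write only while u < (k − m)/2, which is one simple way a critical update size arises. The critical value reported for SCDT-based writes in the paper may be defined and derived differently.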

CLC number: