Journal of Shanghai Jiaotong University (Science) ›› 2015, Vol. 20 ›› Issue (1): 76-81. doi: 10.1007/s12204-015-1591-1

AR-Dedupe: An Efficient Deduplication Approach for Cluster Deduplication System

XING Yu-xuan1* (邢玉轩), XIAO Nong1 (肖侬), LIU Fang1 (刘芳), SUN Zhen1 (孙振), HE Wan-hui2 (何晚辉)   

  (1. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China; 2. Command Department, Nanjing Artillery Academy, Nanjing 210000, China)
  • Online: 2015-02-28 Published: 2015-03-10
  • Corresponding author: XING Yu-xuan (邢玉轩) E-mail: xinghuan1990@gmail.com

Abstract: As data grow rapidly in data centers, inline cluster deduplication has been widely used to improve storage efficiency and data reliability. However, cluster deduplication systems face several challenges: the deduplication rate falls as the number of deduplication server nodes increases, data routing incurs high communication overhead, and load must be balanced to improve system throughput. In this paper, we propose AR-Dedupe, a high-performance cluster deduplication system. Experimental results on two real datasets demonstrate that, through a new data routing algorithm, AR-Dedupe achieves a high deduplication rate with low communication overhead while keeping the system load well balanced. In addition, we use an application-aware mechanism to speed up handprint indexing in the routing server, which yields a 30% performance improvement.

Key words: cluster deduplication system, routing algorithm, application-aware
