AR-Dedupe: An Efficient Deduplication Approach for Cluster Deduplication System

doi:10.1007/s12204-015-1591-1

Journal of Shanghai Jiaotong University (Science), 2015, Vol. 20, Issue (1): 76-81. doi: 10.1007/s12204-015-1591-1

XING Yu-xuan¹* (邢玉轩), XIAO Nong¹ (肖侬), LIU Fang¹ (刘芳), SUN Zhen¹ (孙振), HE Wan-hui² (何晚辉)

(1. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China; 2. Command Department, Nanjing Artillery Academy, Nanjing 210000, China)

Online:2015-02-28 Published:2015-03-10
Contact: XING Yu-xuan (邢玉轩) E-mail: xinghuan1990@gmail.com

Abstract

As data grow rapidly in data centers, inline cluster deduplication has been widely used to improve storage efficiency and data reliability. However, cluster deduplication systems face several challenges: the data deduplication rate decreases as the number of deduplication server nodes increases, data routing incurs high communication overhead, and load must be balanced across nodes to sustain system throughput. In this paper, we propose a high-performance cluster deduplication system called AR-Dedupe. Experimental results on two real datasets demonstrate that, through a new data routing algorithm, AR-Dedupe achieves a high data deduplication rate with low communication overhead while keeping the system load well balanced. In addition, we use an application-aware mechanism to speed up handprint indexing in the routing server, which yields a 30% performance improvement.

Key words: cluster deduplication system, routing algorithm, application-aware
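
The abstract names three competing goals for data routing (deduplication rate, communication overhead, load balance) but does not spell out the algorithm itself. The following is only an illustrative sketch of how such a trade-off can be expressed, not the AR-Dedupe algorithm. It assumes that a "handprint" is the k smallest chunk fingerprints of a unit of data, that chunks are fixed-size, and that a node's similarity score is discounted by its relative load. The names `chunk_fingerprints`, `handprint`, `Node`, `route`, and `store`, and all parameter values, are hypothetical.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each with SHA-1.
    (Fixed-size chunking is an assumption made for brevity.)"""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def handprint(fingerprints: list[str], k: int = 4) -> set[str]:
    """Assumed definition: the k smallest distinct fingerprints.
    Only this small sample is sent for routing, not every fingerprint,
    which keeps the communication overhead low."""
    return set(sorted(set(fingerprints))[:k])


class Node:
    """Hypothetical deduplication server node."""

    def __init__(self, name: str):
        self.name = name
        self.index: set[str] = set()   # handprint fingerprints routed here
        self.stored: set[str] = set()  # all unique chunk fingerprints held


def route(nodes: list[Node], hp: set[str]) -> Node:
    """Pick the node with the best handprint overlap, discounted by load.
    Ties are broken in favour of the node holding fewer chunks."""
    avg_load = sum(len(n.stored) for n in nodes) / len(nodes) or 1.0

    def score(n: Node) -> tuple[float, int]:
        overlap = len(hp & n.index)
        relative_load = len(n.stored) / avg_load
        # Nodes above the average load have their overlap discounted.
        return (overlap / max(relative_load, 1.0), -len(n.stored))

    return max(nodes, key=score)


def store(nodes: list[Node], data: bytes) -> tuple[Node, int]:
    """Route data to a node and return (node, number of new chunks stored)."""
    fps = chunk_fingerprints(data)
    hp = handprint(fps)
    node = route(nodes, hp)
    new_chunks = set(fps) - node.stored
    node.stored |= new_chunks
    node.index |= hp
    return node, len(new_chunks)
```

In this sketch, identical data sent twice is routed to the same node and stores no new chunks the second time, while unrelated data drifts toward the less loaded node. A design that ignored load would maximize overlap alone and risk skewing storage toward a few nodes.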

CLC Number:

TP 393.01

XING Yu-xuan (邢玉轩), XIAO Nong (肖侬), LIU Fang (刘芳), SUN Zhen (孙振), HE Wan-hui (何晚辉). AR-Dedupe: An Efficient Deduplication Approach for Cluster Deduplication System[J]. Journal of Shanghai Jiaotong University (Science), 2015, 20(1): 76-81.
