[1]李建中, 王宏志, 高宏. 大数据可用性的研究进展[J]. 软件学报, 2016, 27(7): 1605-1625.
LI Jianzhong, WANG Hongzhi, GAO Hong. State-of-the-art of research on big data usability[J]. Journal of Software, 2016, 27(7): 1605-1625.
[2]曲朝阳, 孙立擎, 许助庆, 等. 基于B+树的电力大数据分布式索引[J]. 东北电力大学学报, 2016, 36(5): 80-85.
QU Zhaoyang, SUN Liqing, XUN Shaoqing, et al. Power big data distributed index based on B+ tree and inverted index[J]. Journal of Northeast Dianli University, 2016, 36(5): 80-85.
[3]LI A, JIWU S, MINGQINNG L. Data deduplication techniques[J]. Journal of Software, 2010, 21(5): 916-929.
[4]陈明. 桥梁预警系统的数据预处理[J]. 上海交通大学学报, 2012, 46(10):1680-1685.
CHEN Ming. Data preprocess for bridge damage alarming system[J]. Journal of Shanghai Jiao Tong University, 2012, 46(10): 1680-1685.
[5]崔霞, 施光林, 沈伟. 基于分组数据处理神经网络气动人工肌肉迟滞特性[J]. 上海交通大学学报, 2012, 46(6): 931-935.
CUI Xia, SHI Ganglin, SHEN Wei. Study on hysteresis of pneumatic artificial muscle based on group method of data handling neural network[J]. Journal of Shanghai Jiao Tong University, 2012, 46(6): 931-935.
[6]李建中, 刘显敏. 大数据的一个重要方面: 数据可用性[J]. 计算机研究与发展, 2013, 50(6): 1147-1162.
LI Jianzhong, LIU Xianmin. An important aspect of big data: Data usability[J]. Journal of Computer Research and Development, 2013, 50(6): 1147-1162.
[7]HERNANDEZ M A, STOLFO S J. Real-world data is dirty: Data cleansing and the merge/purge problem[J]. Data Mining & Knowledge Discovery, 1998, 2(1): 9-37.
[8]CHAUDHURI S, GANTI V, KAUSHIK S, et al. Leveraging constraints for deduplication: US 8204866 [P]. 2012-06-19 [2016-09-01].
[9]FAN W F, LI J Z, LUO J Z, et al. Incremental graph pattern matching[C]∥Proceedings of ACM SIGMOD. New York: ACM, 2011: 925-936.
[10]FAN W F, LI J Z, WANG X, et al. Query preserving graph compression[C]∥Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2012: 157-168.
[11]CHAUDUTI S, GANTI V, MOTWANI R. Robust identification of fuzzy duplicates[C]∥21st International Conference on ICDE 2005. Tokoyo, Japan: IEEE, 2005: 865-876.
[12]GUHA S, KOUDAS N, MARATHE A, et al. Merging the results of approximate match operations[J]. Journal of Very Large Data Bases, 2004, 30(8): 636-647.
[13]CHEN Z, KALASHNIKOV D V, MEHROTRA S. Adaptive graphical approach to entity resolution[C]∥Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries. New York: ACM, 2007: 204-213.
[14]AUGSTEN N, BHLEN M H, DYRESON C, et al. Approximate joins for data-centric XML[C]∥24th International Conference on ICDE 2008. Cancún, Mexico: IEEE, 2008: 814-823.
[15]FERREIRA C L W, BUCHMANN E, HM K. Finding misplaced items in retail by clustering RFID data[C]∥Proceedings of the 13th Int Conf on Extending Database Technology. New York: ACM, 2010: 501-512.
[16]MANKU G S, JAIN A, DAS S A. Detecting near-duplicates for web crawling[C]∥Proceedings of the 16th international conference on World Wide Web. New York: ACM, 2007: 141-150.
[17]BUYRUKBILEN S, BAKIRAS S. Secure similar document detection with SimHash[M]. Berlin: Springer, 2014: 61-75.
[18]COHEN W W. Data integration using similarity joins and a word-based information representation language[J]. Acm Transactions on Information Systems, 2010,18(3): 288-321.
[19]曲朝阳, 陈帅, 杨帆, 等. 基于云计算技术的电力大数据预处理属性约简方法[J]. 电力系统自动化, 2014, 38(8):67-71.
QU Zhaoyang, CHEN Shuai, YANG Fan, et al. An attribute reducing method for electric power big data preprocessing based on cloud computing technology[J]. Automation of Electric Power Systems, 2014, 38(8): 67-71.
[20]池子文, 张丰, 杜震洪, 等. 云环境下基于预分片的遥感数据并行重采样方法[J]. 上海交通大学学报, 2014, 48(11): 1627-1632.
CHI Ziwen, ZHANG Feng, DU Zhenhong, et al. Parallel resampling method of remote sensing data based on pre-partitioning for cloud computing[J]. Journal of Shanghai Jiao Tong University, 2014, 48(11): 1627-1632.
[21]曲朝阳, 朱莉, 张士林. 基于Hadoop 的广域测量系统数据处理[J]. 电力系统自动化, 2013, 37(4): 92-97.
QU Zhaoyang, ZHU Li, ZHANG Shilin. Data processing of hadoop-based wide area measurement system[J]. Automation of Electric Power Systems, 2013, 37(4): 92-97.
[22]KPCKE H, THOR A, RAHM E. Evaluation of entity resolution approaches on real-world match problems[C]∥Proceedings of the VLDB Endowment. Berlin: Springer, 2010, 3(1): 484-493.
[23]ARASU A, RE C, SUCIU D. Large-scale deduplication with constraints using dedupalog[C]∥IEEE 25th International Conference on Data Engineering. New York: IEEE, 2009: 952-963. |