Journal of Shanghai Jiaotong University (Natural Science Edition), 2015, Vol. 49, Issue 08: 1075-1083.
Section: Automation Technology, Computer Technology
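An Attribute Similarity Adjusting Algorithm Based on Functional Dependency
TAN Mingchao (a), DIAO Xingchun (a), CAO Jianjun (a), FENG Jing (b)
Received: 2014-10-27
Published: 2015-08-31
Online: 2015-08-31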
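Funding: Supported by the National Natural Science Foundation of China (61070714) and the Pre-research Foundation of PLA University of Science and Technology (20110604)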
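Abstract: The accuracy of attribute similarity is one of the key factors affecting the accuracy of entity resolution. To improve the accuracy of attribute similarity, the relationship between attribute similarity and functional dependencies is analyzed, and principles for adjusting attribute similarity are given. Methods are proposed for partitioning similarities according to functional dependencies, for adjusting similarities by transitivity, and for computing the cost of a similarity adjustment. Building on these, an attribute similarity transitive adjustment algorithm is proposed that improves the accuracy of attribute similarity by adjusting it. Experimental results show that the algorithm separates matching record pairs from non-matching record pairs more effectively and achieves higher recall, precision, and F1 score.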
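The evaluation measures named in the abstract (recall, precision, and F1) can be illustrated with a minimal sketch. It assumes that the output of entity resolution and the ground truth are both given as collections of matching record-ID pairs. The function name `pair_metrics` and the sample IDs are hypothetical and are not taken from the paper, and this is not the authors' evaluation code.

```python
def pair_metrics(predicted, actual):
    """Return (precision, recall, f1) for two collections of matched record pairs."""
    # Treat (a, b) and (b, a) as the same pair by storing each pair as a frozenset.
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(p) for p in actual}

    # Pairs that were both predicted as matches and are true matches.
    true_positives = len(predicted & actual)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical record-ID pairs, for illustration only.
predicted_pairs = [(1, 2), (3, 4), (5, 6)]
true_pairs = [(2, 1), (3, 4), (7, 8), (9, 10)]
print(pair_metrics(predicted_pairs, true_pairs))
```

Because pairs are normalized before comparison, the order of the two record IDs within a pair does not affect the result.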
TAN Mingchao, DIAO Xingchun, CAO Jianjun, FENG Jing. An Attribute Similarity Adjusting Algorithm Based on Functional Dependency[J]. Journal of Shanghai Jiaotong University, 2015, 49(08): 1075-1083.