A Network Maximum Flow Based Approach for Author Name Disambiguation

School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

QUAN Jinqi (全锦琪, 1994-), male, native of Maoming, Guangdong Province; master's student; research interest: data mining.

Online published: 2020-03-06


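How to cite this article

QUAN Jinqi, FU Luoyi, GAN Xiaoying, WANG Xinbing. A network maximum flow based approach for author name disambiguation[J]. Journal of Shanghai Jiao Tong University, 2020, 54(2): 111-116. DOI: 10.16183/j.cnki.jsjtu.2020.02.001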

Abstract

Features shared among different author entities, such as organizations and conferences, make author name disambiguation harder. To reduce their influence, this paper proposes an algorithm based on network maximum flow. The algorithm fuses paper entities and their features into a single network graph and assigns each feature node a capacity according to how widely that feature is shared. It then computes the maximum flow between each pair of paper nodes and performs hierarchical clustering based on these maximum flows. Experimental results show that the proposed algorithm achieves a more balanced performance in precision and recall and has relatively good overall performance.
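The abstract gives only the outline of the method. As a rough illustration, the Python sketch below walks through that outline on toy data; it is not the authors' implementation. Several choices are assumptions of this sketch: papers are given as sets of feature strings; a feature node's capacity is modeled by splitting the node into an in/out pair joined by one capacitated edge; the rule capacity = 1/(number of papers sharing the feature) is a hypothetical stand-in for the paper's unspecified sharing-degree rule; pairwise maximum flow is computed with the Edmonds-Karp algorithm; and the hierarchical clustering uses average linkage with a hypothetical stopping `threshold`. The function names `build_network`, `max_flow` and `cluster` are likewise hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction
from itertools import combinations


def build_network(papers):
    """Fuse papers and their features into one flow network.

    papers: dict mapping a paper id to a set of feature strings (hypothetical format).
    Each feature f is split into ("in", f) -> ("out", f), so the feature node's
    capacity becomes the capacity of that single edge.
    """
    share = defaultdict(int)  # how many papers carry each feature
    for feats in papers.values():
        for f in feats:
            share[f] += 1
    # Hypothetical capacity rule: the more widely a feature is shared, the less flow it carries.
    node_cap = {f: Fraction(1, n) for f, n in share.items()}
    big = sum(node_cap.values()) + 1  # exceeds any feasible flow, so it acts as infinity
    cap = defaultdict(dict)
    for f, c in node_cap.items():
        cap[("in", f)][("out", f)] = c
    for p, feats in papers.items():
        for f in feats:
            cap[("paper", p)][("in", f)] = big   # paper -> feature
            cap[("out", f)][("paper", p)] = big  # feature -> paper
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    res = defaultdict(dict)
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] = res[u].get(v, 0) + c
            res[v].setdefault(u, 0)  # reverse residual edge
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck


def cluster(papers, threshold):
    """Average-linkage agglomerative clustering on pairwise max-flow similarity."""
    cap = build_network(papers)
    ids = sorted(papers)
    sim = {}
    for a, b in combinations(ids, 2):
        sim[a, b] = sim[b, a] = max_flow(cap, ("paper", a), ("paper", b))
    clusters = [[p] for p in ids]
    while len(clusters) > 1:
        best, pair = None, None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(sim[a, b] for a in clusters[i] for b in clusters[j])
            avg = Fraction(total) / (len(clusters[i]) * len(clusters[j]))
            if best is None or avg > best:
                best, pair = avg, (i, j)
        if best < threshold:
            break  # no two clusters are connected strongly enough to merge
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Toy data (hypothetical): two authors who share only a widely used venue.
papers = {
    "p1": {"org:SJTU", "coauthor:Li", "venue:KDD"},
    "p2": {"org:SJTU", "coauthor:Li", "venue:KDD"},
    "p3": {"org:MIT", "coauthor:Smith", "venue:KDD"},
    "p4": {"org:MIT", "coauthor:Smith", "venue:KDD"},
}
print(cluster(papers, threshold=0.5))  # [['p1', 'p2'], ['p3', 'p4']]
```

In the toy data, `venue:KDD` is carried by all four papers and so receives the smallest capacity (1/4). It is the only bridge between the two groups, so the cross-group flow stays at 1/4 while the within-group flow reaches 5/4, and the threshold separates the groups. Computing all n(n-1)/2 pairwise flows is the simplest approach; for undirected networks, a Gomory-Hu tree (buildable with n-1 maximum-flow computations, for example by Gusfield's method) stores every pairwise minimum-cut value and is a standard way to cut that cost.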
