A Network Maximum Flow Based Approach for Author Name Disambiguation

  • School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Published online: 2020-03-06

Abstract

To reduce the influence of features shared among different author entities (organizations, conferences, etc.) on author name disambiguation, this paper proposes an algorithm based on network maximum flow. The algorithm places the paper entities and their features in a network graph and sets the capacity of each feature node according to its degree of sharing. It then computes the maximum flow between each pair of paper nodes and clusters the papers based on these maximum-flow values. Experimental results show that the proposed algorithm achieves a better balance between accuracy and recall, as well as better overall performance.
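The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the capacity of a feature node is taken as 1 divided by the number of papers sharing it, the maximum flow is computed with a plain Edmonds-Karp routine, and papers are grouped by linking every pair whose maximum flow reaches a hypothetical `threshold`. The names `build_network`, `max_flow`, and `cluster` are likewise hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction
from itertools import combinations


def build_network(paper_features):
    """Build edge capacities for a paper/feature flow network.

    Each feature node is split into an 'in' and an 'out' half so that a node
    capacity can be expressed as an edge capacity. ASSUMPTION: the capacity
    is 1 / (number of papers sharing the feature).
    """
    sharing = defaultdict(int)
    for feats in paper_features.values():
        for f in feats:
            sharing[f] += 1
    cap = defaultdict(lambda: defaultdict(Fraction))
    for f, n in sharing.items():
        cap[("f", f, "in")][("f", f, "out")] = Fraction(1, n)
    for p, feats in paper_features.items():
        for f in feats:
            # Capacity 1 never binds because every feature capacity is <= 1.
            cap[("p", p)][("f", f, "in")] = Fraction(1)
            cap[("f", f, "out")][("p", p)] = Fraction(1)
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    res = defaultdict(lambda: defaultdict(Fraction))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0  # make sure the reverse residual edge exists
    flow = Fraction(0)
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck


def cluster(paper_features, threshold):
    """ASSUMED clustering rule: union papers whose max flow >= threshold."""
    cap = build_network(paper_features)
    root = {p: p for p in paper_features}

    def find(p):
        while root[p] != p:
            root[p] = root[root[p]]
            p = root[p]
        return p

    for a, b in combinations(paper_features, 2):
        if max_flow(cap, ("p", a), ("p", b)) >= threshold:
            root[find(a)] = find(b)
    groups = defaultdict(list)
    for p in paper_features:
        groups[find(p)].append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy data: all four papers share an organization, but coauthors differ.
papers = {
    "p1": {"coauthor:alice", "org:sjtu"},
    "p2": {"coauthor:alice", "org:sjtu"},
    "p3": {"coauthor:bob", "org:sjtu"},
    "p4": {"coauthor:bob", "org:sjtu"},
}
clusters = cluster(papers, Fraction(1, 2))
print(clusters)  # [['p1', 'p2'], ['p3', 'p4']]
```

In this toy example the organization is shared by all four papers, so its node capacity is only 1/4, and it alone cannot push the flow between papers with different coauthors above the threshold. This mirrors the stated goal of limiting the influence of widely shared features.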

Cite this article

QUAN Jinqi, FU Luoyi, GAN Xiaoying, WANG Xinbing. A Network Maximum Flow Based Approach for Author Name Disambiguation[J]. Journal of Shanghai Jiaotong University, 2020, 54(2): 111-116. DOI: 10.16183/j.cnki.jsjtu.2020.02.001
