Journal of Shanghai Jiao Tong University ›› 2020, Vol. 54 ›› Issue (2): 111-116. doi: 10.16183/j.cnki.jsjtu.2020.02.001

• Journal (Chinese) •

A Network Maximum Flow Based Approach for Author Name Disambiguation

QUAN Jinqi, FU Luoyi, GAN Xiaoying, WANG Xinbing

  1. School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Published: 2020-03-06
  • Corresponding author: WANG Xinbing, male, professor, doctoral supervisor, E-mail: xwang8@sjtu.edu.cn
  • About the author: QUAN Jinqi (1994-), male, born in Maoming, Guangdong Province, master's student; main research interest: data mining

Abstract: Features shared among different author entities (such as institutions and conferences) interfere with author name disambiguation. To reduce their influence, this paper proposes a name disambiguation algorithm based on network maximum flow. The algorithm fuses paper entities and their features into a single network graph, assigns each feature node a capacity according to how widely the feature is shared, computes the maximum flow between paper nodes, and then performs hierarchical clustering based on these maximum flows. Experimental results show that the algorithm achieves a relatively balanced precision and recall and good overall performance.

Key words: name disambiguation; maximum flow; clustering; academic network
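
The pipeline summarized in the abstract (build a paper-feature network, set feature-node capacities by sharing degree, compute pairwise maximum flow, cluster hierarchically) can be illustrated with a minimal Python sketch. The abstract does not give the capacity formula, the linkage criterion, or the stopping rule, so the inverse-sharing capacity, the average linkage, and the `threshold` parameter below are assumptions made for illustration, and a plain Edmonds-Karp routine stands in for whatever maximum-flow solver the authors used. All function names and the toy data are hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction
from itertools import combinations


def build_network(papers):
    """Build arc capacities from {paper_id: set of feature strings}.

    Each feature node is split into an "in" half and an "out" half so that
    the arc between the halves can carry the node capacity.
    """
    sharing = defaultdict(int)  # number of papers listing each feature
    for feats in papers.values():
        for f in feats:
            sharing[f] += 1
    big = Fraction(len(sharing) + 1)  # exceeds the sum of all feature capacities
    cap = defaultdict(dict)
    for f, n in sharing.items():
        # ASSUMPTION: capacity is the inverse of the sharing count.
        cap[("in", f)][("out", f)] = Fraction(1, n)
    for p, feats in papers.items():
        for f in feats:
            cap[("paper", p)][("in", f)] = big
            cap[("out", f)][("paper", p)] = big
    return cap


def max_flow(cap, source, sink):
    """Edmonds-Karp maximum flow; works on a copy, so cap is not modified."""
    residual = defaultdict(lambda: defaultdict(Fraction))
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse arc exists
    total = Fraction(0)
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push


def cluster(papers, threshold):
    """Agglomerative clustering with pairwise max flow as the similarity.

    ASSUMPTION: average linkage; merging stops once the best linkage
    value falls below `threshold`.
    """
    cap = build_network(papers)
    sim = {frozenset((a, b)): max_flow(cap, ("paper", a), ("paper", b))
           for a, b in combinations(papers, 2)}

    def linkage(x, y):
        return sum(sim[frozenset((a, b))] for a in x for b in y) / (len(x) * len(y))

    clusters = [{p} for p in papers]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Toy data (hypothetical): four papers published under one ambiguous name.
papers = {
    "p1": {"coauthor:Li", "org:SJTU"},
    "p2": {"coauthor:Li", "org:SJTU"},
    "p3": {"org:SJTU", "venue:KDD"},
    "p4": {"venue:KDD", "coauthor:Zhao"},
}
clusters = cluster(papers, threshold=Fraction(2, 5))
```

In this toy example the institution is shared by three papers, so it contributes only 1/3 of a unit of flow, while the co-author shared by p1 and p2 contributes 1/2. With the assumed threshold of 2/5, p1 and p2 end up in one cluster and p3 and p4 in another, so the widely shared institution alone is not enough to merge the two groups.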
