Journal of Shanghai Jiao Tong University ›› 2021, Vol. 55 ›› Issue (5): 586-597. doi: 10.16183/j.cnki.jsjtu.2020.187


A Heterogeneous Network Representation Method Based on Variational Inference and Meta-Path Decomposition

YUAN Ming, LIU Qun, SUN Haichao, TAN Hongsheng

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2020-06-18 Online: 2021-05-28 Published: 2021-06-01
  • Corresponding author: LIU Qun, female, professor, doctoral supervisor; Tel.: 13908322889; E-mail: liuqun@cqupt.edu.cn
  • About the author: YUAN Ming (1996-), male, born in Chongqing, master's student; main research interest: network representation learning.
  • Funding:
    Key Program of the National Natural Science Foundation of China (61936001), National Natural Science Foundation of China (61772096), National Key Research and Development Program of China (2016QY01W0200)

Abstract: To address the problem that the traditional meta-path random walk cannot accurately describe the structure of a heterogeneous network and fails to capture the true underlying distribution of network nodes, a heterogeneous network representation method based on variational inference and meta-path decomposition, named HetVAE, is proposed. First, drawing on the idea of path similarity, a node selection strategy is designed to improve the meta-path random walk. Next, variational theory is introduced to sample the latent variables of the original distribution effectively. Then, a personalized attention mechanism weights the node vector representations of the different sub-networks obtained by decomposition and fuses them, so that the final node vector representation carries richer semantic information. Finally, experiments on several network tasks are performed on three real-world data sets, DBLP, AMiner, and Yelp, and the results verify the effectiveness of the model. In node classification and node clustering tasks, compared with the baseline algorithms, the Micro-F1 and normalized mutual information (NMI) increase by 1.12% to 4.36% and 1.35% to 18%, respectively. These results indicate that HetVAE can effectively capture the heterogeneous network structure and learn node vector representations that conform more closely to the true distribution.

Key words: heterogeneous network, network representation, variational autoencoder, random walk, attention mechanism
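
For readers unfamiliar with the meta-path random walk that the abstract takes as its starting point, the following is a minimal sketch of the conventional, uniformly sampled walk on a toy author-paper graph. It is not the HetVAE node selection strategy, whose path-similarity rule the abstract does not specify. The function name `metapath_walk`, the toy graph, and the meta-path `["A", "P", "A"]` are hypothetical and chosen only for illustration.

```python
import random

def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Conventional meta-path-constrained random walk (uniform choice).

    Assumption: `metapath` is symmetric, e.g. ["A", "P", "A"], so its
    first and last node types match and the pattern can be repeated.
    """
    assert node_type[start] == metapath[0]
    pattern = metapath[:-1]  # repeating unit, e.g. ["A", "P"]
    walk = [start]
    while len(walk) < walk_length:
        # Node type required at the next position of the walk.
        next_type = pattern[len(walk) % len(pattern)]
        candidates = [v for v in neighbors[walk[-1]]
                      if node_type[v] == next_type]
        if not candidates:
            break  # dead end: no neighbor of the required type
        # Uniform selection; an improved strategy would reweight here.
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy graph: authors (A) connected to papers (P).
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P"}
neighbors = {
    "a1": ["p1"],
    "a2": ["p1", "p2"],
    "a3": ["p2"],
    "p1": ["a1", "a2"],
    "p2": ["a2", "a3"],
}

walk = metapath_walk(neighbors, node_type, "a1", ["A", "P", "A"], 7,
                     random.Random(0))
walk_types = [node_type[v] for v in walk]
```

In this sketch the only place a node selection strategy could differ from the baseline is the `rng.choice(candidates)` line, where the uniform draw could be replaced by a weighted one.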
