A Heterogeneous Network Representation Method Based on Variational Inference and Meta-Path Decomposition

Journal of Shanghai Jiao Tong University, 2021, Vol. 55, Issue 5: 586-597. doi: 10.16183/j.cnki.jsjtu.2020.187

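Collections: Journal of Shanghai Jiao Tong University, 2021 No. 12 special-topic collection; Journal of Shanghai Jiao Tong University, 2021 special topic on "Automation Technology and Computer Technology"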


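YUAN Ming, LIU Qun, SUN Haichao, TAN Hongsheng

College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2020-06-18 Online: 2021-05-28 Published: 2021-06-01
Corresponding author: LIU Qun, E-mail: liuqun@cqupt.edu.cn
About the author: YUAN Ming (born 1996), male, from Chongqing, master's student; main research interest: network representation learning.
Funding:
Key Program of the National Natural Science Foundation of China (61936001); National Natural Science Foundation of China (61772096); National Key Research and Development Program of China (2016QY01W0200)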


Abstract:

Traditional meta-path random walks in heterogeneous network representation cannot accurately describe the heterogeneous network structure and fail to capture the true underlying distribution of network nodes. To address this problem, a heterogeneous network representation method based on variational inference and meta-path decomposition, named HetVAE, is proposed. First, drawing on the idea of path similarity, a node selection strategy is designed to improve the meta-path random walk. Next, variational theory is introduced to effectively sample the latent variables of the original distribution. A personalized attention mechanism then weights the node vector representations of the different sub-networks obtained by decomposition, and the weighted representations are fused so that the final node vector representation carries richer semantic information. Experiments on several network tasks over three real-world data sets, DBLP, AMiner, and Yelp, verify the effectiveness of the model. In node classification and node clustering, compared with the baseline algorithms, the Micro-F₁ and normalized mutual information (NMI) increase by 1.12% to 4.36% and 1.35% to 18%, respectively, indicating that HetVAE can effectively capture the heterogeneous network structure and learn node vector representations that conform more closely to the true distribution.

Key words: heterogeneous network, network representation, variational autoencoder, random walk, attention mechanism
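The path-similarity idea in the abstract can be illustrated with PathSim, the meta-path similarity measure of Sun et al., which scores two nodes x and y as 2·|paths x→y| / (|paths x→x| + |paths y→y|) along a symmetric meta-path. The sketch below is only an illustration under stated assumptions: the abstract does not give HetVAE's actual node selection rule, so the toy author-paper data, the function names, and the choice to normalize PathSim scores into selection probabilities are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each paper maps to its (duplicate-free) author list.
PAPER_AUTHORS = {
    "p1": ["alice", "bob"],
    "p2": ["alice", "bob"],
    "p3": ["alice", "carol"],
    "p4": ["carol"],
}


def meta_path_counts(paper_authors):
    """Count A-Pa-A path instances for every ordered author pair."""
    counts = defaultdict(int)
    for authors in paper_authors.values():
        for a in authors:
            for b in authors:
                counts[(a, b)] += 1
    return dict(counts)


def pathsim(counts, x, y):
    """PathSim score: 2 * |paths x->y| / (|paths x->x| + |paths y->y|)."""
    denom = counts.get((x, x), 0) + counts.get((y, y), 0)
    return 2 * counts.get((x, y), 0) / denom if denom else 0.0


def selection_probs(counts, source):
    """Assumed selection rule: normalize the PathSim scores of the other
    authors so that more similar authors are visited more often."""
    authors = {a for a, _ in counts}
    scores = {t: pathsim(counts, source, t) for t in authors if t != source}
    total = sum(scores.values())
    if total == 0:
        return {}
    return {t: s / total for t, s in scores.items() if s > 0}


COUNTS = meta_path_counts(PAPER_AUTHORS)
PROBS = selection_probs(COUNTS, "alice")
```

In this toy graph, `alice` shares two papers with `bob` and one with `carol`, so a walk leaving `alice` under the assumed rule would favor `bob`. A plain meta-path random walk would instead pick the next node uniformly among neighbors, which is the behavior the abstract describes as insufficient.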

CLC number: TP181

Cite this article:

YUAN Ming, LIU Qun, SUN Haichao, TAN Hongsheng. A Heterogeneous Network Representation Method Based on Variational Inference and Meta-Path Decomposition[J]. Journal of Shanghai Jiao Tong University, 2021, 55(5): 586-597.



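Table 1: Data set statistics

Data set	Relation (A-B)	Number of type-A nodes	Number of type-B nodes	Number of A-B links	Number of labels	Number of label classes	Meta-path	Average degree of type-A nodes	Average degree of type-B nodes	Average network degree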
DBLP	Pa-A	14376	14475	41794	4057	4	APaA	11.88	2.89	4.73
	Pa-C	14376	20	14376			APaCPaA		718.8
	Pa-T	14376	8920	114624			APaTPaA		12.85
AMiner	Pa-A	13978	16543	52957	9726	8	APaA	4.79	3.20	2.05
	Pa-C	13978	2152	13978			APaCPaA		6.49
Yelp	Bu-S	2614	2	2614	2614	3	BuSBu	13.79	1370.0	9.22
	Bu-St	2614	8	2614			BuStBu		326.75
	Bu-U	2614	1286	30838			BuUBu		23.98
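The degree columns in the data set table above can be checked by hand. The sketch below assumes, as the DBLP numbers suggest, that the type-A average degree is the total number of links over all relations divided by the number of type-A nodes, and that each type-B average degree is the number of A-B links divided by the number of type-B nodes. The constant and function names are illustrative only; the counts are copied from the DBLP rows.

```python
# Counts copied from the DBLP rows of the data set table.
DBLP_TYPE_A_NODES = 14376  # papers (Pa)

# relation -> (number of type-B nodes, number of A-B links)
DBLP_RELATIONS = {
    "Pa-A": (14475, 41794),
    "Pa-C": (20, 14376),
    "Pa-T": (8920, 114624),
}


def average_degrees(num_type_a, relations):
    """Return (type-A average degree, {relation: type-B average degree}).

    Assumption: type-A nodes take part in every relation, so their degree
    sums the links of all relations; type-B nodes only appear in their own.
    """
    total_links = sum(links for _, links in relations.values())
    type_a_degree = total_links / num_type_a
    type_b_degrees = {
        name: links / num_b for name, (num_b, links) in relations.items()
    }
    return type_a_degree, type_b_degrees


A_DEGREE, B_DEGREES = average_degrees(DBLP_TYPE_A_NODES, DBLP_RELATIONS)
```

Rounded to two decimals, this gives 11.88 for papers and 2.89, 718.8, and 12.85 for the Pa-A, Pa-C, and Pa-T relations, matching the DBLP rows. The very large Pa-C value reflects that only 20 conference nodes absorb all 14376 paper-conference links.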

Table 2: Node clustering results (NMI and ARI)

Algorithm	DBLP		AMiner		Yelp	
	NMI	ARI	NMI	ARI	NMI	ARI
Deepwalk	0.5841	0.4960	0.3160	0.2227	0.2940	0.3179
Node2vec	0.5401	0.4776	0.3081	0.2219	0.0105	0.0111
HIN2vec	0.0124	0.0106	0.1670	0.0758	0.1353	0.1708
Metapath2vec	0.6395	0.6369	0.2645	0.2083	0.3540	0.4047
HERec	0.6844	0.7104	0.3230	0.2322	0.3511	0.4018
HAN	0.5987	0.5929	0.0375	0.0165	0.3635	0.4255
HetVAE_rw	0.7742	0.8329	0.3324	0.2234	0.3603	0.4016
HetVAE_sk	0.8173	0.8664	0.3446	0.2321	0.3593	0.4097
HetVAE_con	0.7826	0.8351	0.3239	0.2660	0.3416	0.3917
HetVAE	0.8540	0.9016	0.4025	0.3798	0.3761	0.4399
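For readers unfamiliar with the two clustering metrics reported above, the following is a minimal pure-Python sketch of normalized mutual information (NMI) and the adjusted Rand index (ARI). It is not the authors' evaluation code. The NMI normalization by the arithmetic mean of the two entropies is an assumption (it is the default in scikit-learn's `normalized_mutual_info_score`); the source does not state which variant was used, nor how the cluster assignments were obtained from the embeddings.

```python
from collections import Counter
from math import comb, log


def nmi(labels_true, labels_pred):
    """Mutual information divided by the arithmetic mean of the entropies."""
    n = len(labels_true)
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual_info = sum(
        (n_ij / n) * log(n_ij * n / (true_sizes[t] * pred_sizes[p]))
        for (t, p), n_ij in joint.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in true_sizes.values())
    h_pred = -sum((c / n) * log(c / n) for c in pred_sizes.values())
    mean_entropy = (h_true + h_pred) / 2
    # Both partitions trivial (a single cluster each): treat as identical.
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


def ari(labels_true, labels_pred):
    """Adjusted Rand index computed from pair counts."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    joint = Counter(zip(labels_true, labels_pred))
    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


# Toy check: cluster ids are arbitrary, so a relabeled perfect clustering
# still scores 1 on both metrics.
TRUE = [0, 0, 1, 1]
PERFECT = [1, 1, 0, 0]
CROSSED = [0, 1, 0, 1]
```

Both metrics are invariant to how clusters are numbered. NMI is bounded below by 0, while ARI is corrected for chance and can be negative when the agreement is worse than random, as the `CROSSED` assignment shows.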
