J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue 3: 510-520. doi: 10.1007/s12204-023-2632-9

Medicine-Engineering Interdisciplinary

Deep Learning Framework for Predicting Essential Proteins with Temporal Convolutional Networks


LU Pengli¹, YANG Peishi¹, LIAO Yonggang²

1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
2. China Mobile Communications Group Gansu Co., Ltd., Lanzhou 730070, China
Received: 2022-05-06; Accepted: 2022-09-06; Online: 2025-06-06; Published: 2025-06-06

Abstract: Essential proteins are indispensable components of cells and play a vital role in genetic disease diagnosis and drug development. Their prediction has therefore received extensive attention, and many centrality methods and machine learning algorithms have been proposed for the task. However, the topological features captured by centrality methods are not comprehensive enough, which limits their accuracy. Machine learning algorithms, in turn, require sufficient prior knowledge for feature selection, and their ability to handle imbalanced classification still needs improvement. These two factors greatly affect the performance of essential protein prediction. In this paper, we propose a deep learning framework based on temporal convolutional networks that predicts essential proteins by integrating gene expression data with the protein-protein interaction (PPI) network. We use network embedding to automatically learn richer features of proteins from the PPI network. We treat gene expression data as sequence data and use temporal convolutional networks to extract sequence features. Finally, the two types of features are integrated and fed into a multilayer neural network for the final classification. We evaluate our method against seven centrality methods, six machine learning algorithms, and two deep learning models. The experimental results show that our method predicts essential proteins more effectively than the compared methods.
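The pipeline summarized in the abstract (sequence features extracted from a gene expression profile by a temporal convolutional network, concatenated with a network-embedding vector, then passed to a classifier) can be illustrated with a minimal, untrained sketch in plain Python. It assumes a single channel, one causal convolution per residual block, hand-picked kernel weights and dilations, and a single sigmoid unit standing in for the multilayer classifier. The node embedding is taken as a precomputed input (for example, a node2vec vector) rather than learned here. All function names, weights, and example values are hypothetical and are not the authors' implementation.

```python
import math


def causal_dilated_conv(x, weights, dilation):
    """Causal 1-D convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Indices before the start of the sequence are treated as zero (implicit
    left padding), so y[t] depends only on x[0..t]: the output is causal.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, weights, dilation):
    """Simplified single-channel TCN residual block: x + ReLU(conv(x))."""
    h = relu(causal_dilated_conv(x, weights, dilation))
    return [a + b for a, b in zip(x, h)]


def receptive_field(kernel_size, dilations):
    """Input steps visible to the last output of a stack with one conv per block."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tcn_features(expr, kernel, dilations):
    """Run the block stack over an expression profile and summarise it."""
    h = list(expr)
    for d in dilations:
        h = residual_block(h, kernel, d)
    # Two summary features: the last time step and the mean over time.
    return [h[-1], sum(h) / len(h)]


def predict_essential(expr, embedding, kernel, dilations, w, b):
    """Fuse sequence features with a node-embedding vector and score them.

    A single sigmoid unit stands in for the multilayer neural network.
    """
    z = tcn_features(expr, kernel, dilations) + list(embedding)
    if len(w) != len(z):
        raise ValueError(f"expected {len(z)} weights, got {len(w)}")
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    profile = [0.2, 0.5, 0.9, 0.4, 0.1, 0.3, 0.8, 0.6]  # hypothetical expression values
    emb = [0.05, -0.12, 0.30]  # hypothetical embedding of one protein
    score = predict_essential(profile, emb, kernel=[0.5, 0.3], dilations=[1, 2, 4],
                              w=[0.4, 0.2, 0.1, -0.3, 0.6], b=-0.5)
    print(f"essentiality score: {score:.3f}")
```

Doubling the dilation at each block makes the receptive field grow exponentially with depth, so a few layers can cover a whole expression time course: `receptive_field(2, [1, 2, 4])` gives 8 time steps for this stack. A practical TCN would add multiple channels, two convolutions per block, weight normalization, dropout, and trained weights.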

Key words: temporal convolutional networks, node2vec, protein-protein interaction (PPI) network, essential proteins, gene expression data
