Medicine-Engineering Interdisciplinary

Deep Learning Framework for Predicting Essential Proteins with Temporal Convolutional Networks

  • 1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China; 2. China Mobile Communications Group Gansu Co., Ltd., Lanzhou 730070, China

Received date: 2022-05-06

Accepted date: 2022-09-06

Online published: 2025-06-06

Abstract

Essential proteins are indispensable to cells and play a significant role in genetic disease diagnosis and drug development, so their prediction has received extensive attention from researchers. Many centrality methods and machine learning algorithms have been proposed to predict essential proteins. Nevertheless, the topological characteristics captured by centrality methods are not comprehensive enough, which results in low accuracy. In addition, machine learning algorithms require sufficient prior knowledge to select features, and their ability to handle imbalanced classification problems needs further strengthening. These two factors greatly affect the performance of essential protein prediction. In this paper, we propose a deep learning framework based on temporal convolutional networks that predicts essential proteins by integrating gene expression data with the protein-protein interaction (PPI) network. We use network embedding to automatically learn richer features of proteins in the PPI network. We treat gene expression data as sequence data and use temporal convolutional networks to extract sequence features. Finally, the two types of features are integrated and fed into a multi-layer neural network to complete the classification task. We evaluate our method by comparing it with seven centrality methods, six machine learning algorithms, and two deep learning models. The experimental results show that our method is more effective than the comparison methods for predicting essential proteins.
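The sequence branch described in the abstract can be illustrated with a minimal sketch of the core operation in a temporal convolutional network: a causal, dilated 1-D convolution over a protein's expression time series, followed by concatenation with a network-embedding vector. This is not the authors' implementation. The function names, the fixed kernel weights, and the toy inputs are all hypothetical, and a full temporal convolutional network would also use learned weights and residual connections.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Causal dilated 1-D convolution.

    output[t] depends only on x[t], x[t - d], x[t - 2d], ... so no future
    time point leaks into the present. Positions before the start of the
    sequence count as zeros, so the output has the same length as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps, never forward
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def tcn_features(
    expression: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[float]:
    """Stack causal convolutions with dilations 1, 2, 4, ... (one per kernel)."""
    hidden = list(expression)
    for layer, kernel in enumerate(kernels):
        hidden = relu(causal_dilated_conv1d(hidden, kernel, dilation=2 ** layer))
    return hidden


def fuse(embedding: Sequence[float], sequence_features: Sequence[float]) -> List[float]:
    """Concatenate the two feature types into one classifier input vector."""
    return list(embedding) + list(sequence_features)


# Toy inputs (hypothetical): a 4-point expression profile and a 2-D embedding.
expression = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 1.0], [1.0, -1.0]]  # fixed weights for illustration only
embedding = [0.5, -0.5]

features = fuse(embedding, tcn_features(expression, kernels))
print(features)  # [0.5, -0.5, 1.0, 3.0, 4.0, 4.0]
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth, which is why a shallow stack can cover a whole expression profile. In the framework described above, the fused vector would be passed to a multi-layer neural network for the final classification.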

Cite this article

Lu Pengli, Yang Peishi, Liao Yonggang. Deep Learning Framework for Predicting Essential Proteins with Temporal Convolutional Networks[J]. Journal of Shanghai Jiaotong University (Science), 2025, 30(3): 510-520. DOI: 10.1007/s12204-023-2632-9
