J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (4): 709-719. doi: 10.1007/s12204-024-2575-9



Predicting CircRNA-Disease Associations via Non-Negative Matrix Factorization Fused with Multiple Similarity Networks

Lu Pengli (卢鹏丽), Li Shiying (李十莹)

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2023-09-21 Accepted: 2023-11-26 Published: 2025-07-31


Abstract: CircRNAs are widely distributed throughout the human body, play a crucial role in regulating various biological processes, and are closely linked to complex human diseases. Investigating potential associations between circRNAs and diseases can deepen our understanding of diseases and provide new strategies and tools for early diagnosis, treatment, and prevention. However, existing models are limited in how accurately they capture similarities, how well they handle the sparsity and noise of association networks, and how fully they exploit biological information from multiple perspectives. To address these issues, this study introduces a new non-negative matrix factorization-based framework called NMFMSN. First, we use circRNA sequence data and disease semantic information to compute circRNA similarity and disease similarity, respectively. Second, because known circRNA-disease associations are sparse, we reconstruct the association network by imputing missing links from the interactions of neighboring circRNAs and diseases. Finally, we integrate the two similarity networks into a non-negative matrix factorization framework to identify potential circRNA-disease associations. Under 5-fold cross-validation and leave-one-out cross-validation, NMFMSN achieves AUC values of 0.9712 and 0.9768, respectively, outperforming current state-of-the-art models. Case studies on lung cancer and hepatocellular carcinoma show that NMFMSN is an effective method for predicting new circRNA-disease associations.

Key words: circRNA-disease associations, circRNA sequence data, disease semantic information, non-negative matrix factorization
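
The abstract names three steps (similarity computation, neighbor-based imputation of missing links, and similarity-fused non-negative matrix factorization) but does not give the objective function, update rules, or hyperparameters. The following is therefore only a minimal sketch of the general idea, not the authors' NMFMSN implementation. It assumes a generic graph-regularized NMF with multiplicative updates and a simple similarity-weighted neighbor average for imputation. All function names, parameters (`k`, `rank`, `lam`, `beta`), and the random toy data are hypothetical.

```python
import numpy as np

def impute_with_neighbors(A, S_c, S_d, k=2):
    """Fill weak entries of A with similarity-weighted averages taken from the
    k most similar circRNAs (rows) and diseases (columns). Assumed scheme."""
    def neighbor_profile(M, S):
        out = np.zeros_like(M, dtype=float)
        for i in range(M.shape[0]):
            sims = S[i].astype(float)          # astype returns a copy
            sims[i] = -np.inf                  # exclude the entity itself
            nbrs = np.argsort(sims)[::-1][:k]  # indices of the k nearest neighbors
            w = S[i, nbrs]
            if w.sum() > 0:
                out[i] = w @ M[nbrs] / w.sum()
        return out

    A = A.astype(float)
    from_rows = neighbor_profile(A, S_c)
    from_cols = neighbor_profile(A.T, S_d).T
    # Known associations (1s) are kept; zeros receive the averaged neighbor score.
    return np.maximum(A, (from_rows + from_cols) / 2.0)

def graph_nmf(A, S_c, S_d, rank=3, lam=0.1, beta=0.01, n_iter=200, seed=0, eps=1e-9):
    """Graph-regularized NMF, A ~ U V^T, with similarity graphs S_c and S_d.
    Generic multiplicative updates; the paper's exact objective may differ."""
    rng = np.random.default_rng(seed)
    U = rng.random((A.shape[0], rank))
    V = rng.random((A.shape[1], rank))
    D_c = np.diag(S_c.sum(axis=1))  # degree matrices of the similarity graphs
    D_d = np.diag(S_d.sum(axis=1))
    for _ in range(n_iter):
        U *= (A @ V + lam * S_c @ U) / (U @ (V.T @ V) + lam * D_c @ U + beta * U + eps)
        V *= (A.T @ U + lam * S_d @ V) / (V @ (U.T @ U) + lam * D_d @ V + beta * V + eps)
    return U @ V.T  # predicted association scores

def random_similarity(n, rng):
    """Toy symmetric, non-negative similarity matrix (stand-in for real data)."""
    X = rng.random((n, n))
    S = (X + X.T) / 2.0
    np.fill_diagonal(S, 1.0)
    return S

# Toy data: 6 circRNAs x 4 diseases, all values synthetic.
rng = np.random.default_rng(0)
A = (rng.random((6, 4)) < 0.25).astype(float)
S_c = random_similarity(6, rng)
S_d = random_similarity(4, rng)

A_filled = impute_with_neighbors(A, S_c, S_d, k=2)
scores = graph_nmf(A_filled, S_c, S_d, rank=3)
```

In this sketch, higher entries of `scores` at positions where `A` is zero would be read as candidate circRNA-disease associations. Reproducing the reported AUC values would require the real datasets and the authors' actual formulation.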
