Predicting CircRNA-Disease Associations via Non-Negative Matrix Factorization Fused with Multiple Similarity Networks

doi:10.1007/s12204-024-2575-9

Abstract

CircRNAs are widely found throughout the human body, play a crucial role in regulating various biological processes, and are closely linked to complex human diseases. Investigating potential associations between circRNAs and diseases can deepen our understanding of diseases and provide new strategies and tools for their early diagnosis, treatment, and prevention. However, existing models have limitations in accurately capturing similarities, in handling the sparse and noisy nature of association networks, and in fully exploiting biological information from multiple perspectives. To address these issues, this study introduces a new non-negative matrix factorization-based framework called NMFMSN. First, we combine circRNA sequence data and disease semantic information to compute circRNA similarity and disease similarity, respectively. Because the known associations between circRNAs and diseases are sparse, we then reconstruct the association network by imputing missing links based on the interactions of neighboring circRNAs and diseases. Finally, we integrate the two similarity networks into a non-negative matrix factorization framework to identify potential circRNA-disease associations. Under 5-fold cross-validation and leave-one-out cross-validation, NMFMSN reaches AUC values of 0.9712 and 0.9768, respectively, outperforming current state-of-the-art models. Case studies on lung cancer and hepatocellular carcinoma show that NMFMSN is an effective method for predicting new associations between circRNAs and diseases.

Keywords: circRNA-disease association, circRNA sequence data, disease semantic information, non-negative matrix factorization
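The abstract describes the pipeline only at a high level and does not state the objective function or update rules. The following is a minimal sketch of the general idea, assuming a generic neighbor-weighted imputation step followed by a graph-regularized non-negative matrix factorization with multiplicative updates. The function names `knn_impute` and `graph_nmf`, and the parameters `k`, `decay`, `rank`, `lam`, and `beta`, are illustrative assumptions and should not be read as the NMFMSN implementation. The similarity matrices `S_c` (circRNA) and `S_d` (disease) are assumed to be precomputed, symmetric, and non-negative.

```python
import numpy as np

def knn_impute(Y, S_c, S_d, k=3, decay=0.7):
    """Fill unknown (zero) entries of Y from the k most similar neighbors.
    Illustrative assumption, not the paper's exact reconstruction step."""
    def one_side(A, S):
        out = np.zeros_like(A, dtype=float)
        n = A.shape[0]
        for i in range(n):
            sim = S[i].astype(float)            # astype returns a copy
            sim[i] = -np.inf                    # exclude the entity itself
            nbrs = np.argsort(sim)[::-1][:min(k, n - 1)]
            w = S[i, nbrs] * decay ** np.arange(len(nbrs))
            if w.sum() > 0:
                out[i] = w @ A[nbrs] / w.sum()  # weighted mean of neighbor rows
        return out

    Y = Y.astype(float)
    Y_c = one_side(Y, S_c)        # estimate from similar circRNAs
    Y_d = one_side(Y.T, S_d).T    # estimate from similar diseases
    # Known associations are kept; unknown entries get the averaged estimate.
    return np.maximum(Y, (Y_c + Y_d) / 2.0)

def graph_nmf(Y, S_c, S_d, rank=4, lam=0.1, beta=0.01, iters=200, seed=0):
    """Graph-regularized NMF: Y ~ U V^T with U, V >= 0 (assumed objective).
    Minimizes ||Y - U V^T||^2 + lam * (tr(U^T L_c U) + tr(V^T L_d V))
    + beta * (||U||^2 + ||V||^2), where L = D - S is a graph Laplacian."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    D_c = np.diag(S_c.sum(axis=1))   # degree matrix of the circRNA network
    D_d = np.diag(S_d.sum(axis=1))   # degree matrix of the disease network
    eps = 1e-12                      # avoids division by zero
    for _ in range(iters):
        U *= (Y @ V + lam * S_c @ U) / (U @ (V.T @ V) + lam * D_c @ U + beta * U + eps)
        V *= (Y.T @ U + lam * S_d @ V) / (V @ (U.T @ U) + lam * D_d @ V + beta * V + eps)
    return U @ V.T                   # predicted association scores
```

In this sketch, calling `graph_nmf(knn_impute(Y, S_c, S_d), S_c, S_d)` on a binary association matrix `Y` returns a score matrix of the same shape; ranking the unknown entries of a disease column by score would give candidate circRNAs for that disease.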

CLC Number: TP393; Q51

Lu Pengli, Li Shiying. Predicting CircRNA-Disease Associations via Non-Negative Matrix Factorization Fused with Multiple Similarity Networks[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(4): 709-719.
