Multimodal Fusion Classification Network Based on Distance Confidence Score
Received date: 2020-06-18
Online published: 2022-01-21
Funding
Supported by the Technical Specification for Clinical Trials of Artificial Intelligence Medical Information System Software Project (2019YFC0118805)
ZHENG Dezhong, YANG Yuanyuan, HUANG Haozhe, XIE Zhe, LI Wentao. Multimodal fusion classification network based on distance confidence score[J]. Journal of Shanghai Jiao Tong University, 2022, 56(1): 89-100. DOI: 10.16183/j.cnki.jsjtu.2020.186
Multimodal data modeling can effectively overcome the insufficiency of information in a single modality and can greatly improve model performance. However, little progress has been made in quantifying the confidence of neural network models, especially multimodal fusion models. This paper proposes an embedding-based method that estimates the local density around a sample in the embedding space from its distances to other samples, and derives a confidence score for the model from that estimate. The proposed method is scalable: it can measure the confidence not only of a single-modality model but also of a multimodal fusion model. In addition, it can evaluate and quantify the influence of each modality's data on the multimodal fusion model.
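The core idea of the abstract — turning distances in the embedding space into a local density estimate and then into a confidence score — can be sketched as follows. This is an illustrative sketch only: the function name, the Gaussian-kernel density, and the `k` and `temperature` parameters are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def distance_confidence(query, ref_embeddings, ref_labels, k=5, temperature=1.0):
    """Predict a class and a confidence score for one embedded sample.

    For each class, estimate the local density around `query` with a
    Gaussian kernel over its k nearest same-class reference embeddings;
    normalized densities play the role of class confidences.
    """
    classes = np.unique(ref_labels)
    densities = []
    for c in classes:
        emb_c = ref_embeddings[ref_labels == c]
        d = np.linalg.norm(emb_c - query, axis=1)      # distances to class c
        knn = np.sort(d)[:k]                           # k nearest neighbours
        densities.append(np.exp(-knn ** 2 / temperature).mean())
    densities = np.array(densities)
    probs = densities / densities.sum()                # normalize to scores
    return classes[np.argmax(probs)], probs.max()
```

A query lying inside a dense same-class cluster receives a confidence near 1, while a query far from all reference embeddings yields a flat, low-confidence score distribution.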
Key words: confidence; multimodal fusion; neural network
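The abstract also claims the score extends to multimodal fusion and to quantifying each modality's influence. A minimal way to realize that claim is confidence-weighted late fusion, sketched below under assumed names; the normalized weights double as a per-modality influence measure. This is a generic sketch, not the paper's specific fusion network.

```python
import numpy as np

def fuse_by_confidence(probs_per_modality, conf_per_modality):
    """Late-fuse per-modality class probabilities, weighted by confidence.

    Returns the fused class distribution and the normalized weights,
    which quantify each modality's influence on the fused prediction.
    """
    w = np.asarray(conf_per_modality, dtype=float)
    w = w / w.sum()                                    # influence of each modality
    fused = sum(wi * np.asarray(p) for wi, p in zip(w, probs_per_modality))
    return fused / fused.sum(), w
```

For example, a high-confidence imaging branch outweighs a low-confidence clinical-text branch, and the returned weights make that trade-off explicit.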