Journal of Shanghai Jiao Tong University ›› 2023, Vol. 57 ›› Issue (9): 1203-1213. doi: 10.16183/j.cnki.jsjtu.2022.077
Special topic: 2023 "Electronic Information and Electrical Engineering" special issue of the Journal of Shanghai Jiao Tong University
Received: 2022-03-21
Revised: 2022-05-30
Accepted: 2022-06-06
Online: 2023-09-28
Published: 2023-09-27
Corresponding author: LEI Xuemei, E-mail: ndlxm@imu.edu.cn
About the author: LIU Yu (1998-), master's student; research interest: model compression.
Abstract:
Traditional deep neural networks are hard to deploy on embedded platforms because of their heavy computation and memory footprint, which has driven the rapid development of lightweight deep neural networks. Among them, Google's lightweight MobileNet architecture is widely used. To improve performance, MobileNet has evolved from MobileNetV1 to MobileNetV3, but the model has grown more complex and its size has kept increasing, eroding the advantages of a lightweight model. To make MobileNetV3 easier to deploy on embedded platforms while preserving its performance, a structured pruning method integrating the characteristics of MobileNetV3 is proposed, which prunes the MobileNetV3-Large model into a more compact one. First, the model is trained with sparsity regularization to obtain a relatively sparse network. Then, the product of each convolutional layer's sparsity value and the scale factor of the corresponding batch normalization layer is used to identify redundant filters, which are removed by structured pruning. Experiments on the CIFAR-10 and CIFAR-100 datasets show that the proposed compression method effectively reduces the model's parameters while the pruned model maintains good performance: with no loss of accuracy, the number of parameters on CIFAR-10 is reduced by 44.5% and the computation by 40%.
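The pruning criterion above, scoring each filter by the product of a convolution-layer sparsity value and the corresponding batch normalization scale factor γ, can be sketched in NumPy as follows. The mean absolute filter weight standing in for the sparsity value and the quantile-based threshold m are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def filter_importance(conv_weights, bn_gamma):
    """Score each output filter by (sparsity value) x |gamma|.

    conv_weights: shape (out_channels, in_channels, k, k)
    bn_gamma:     per-channel BN scale factors, shape (out_channels,)
    The mean absolute weight is used as the filter's sparsity value here;
    the paper's exact measure may differ.
    """
    sparsity = np.abs(conv_weights).mean(axis=(1, 2, 3))
    return sparsity * np.abs(bn_gamma)

def keep_mask(importance, prune_rate):
    """Keep filters whose score exceeds the prune_rate-quantile threshold m."""
    m = np.quantile(importance, prune_rate)
    return importance > m

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 3, 3, 3))   # one conv layer with 16 filters
gamma = rng.uniform(0.0, 1.0, size=16)     # BN scale factors after sparse training

scores = filter_importance(weights, gamma)
mask = keep_mask(scores, prune_rate=0.4)   # prune roughly 40% of the filters
pruned = weights[mask]                     # structured removal of whole filters
```

Because whole filters are dropped rather than individual weights zeroed, the pruned layer (and the matching BN channels) can be rebuilt at the smaller width with no sparse-kernel support needed at inference time.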
CLC number:
LIU Yu, LEI Xuemei. A Structured Pruning Method Integrating Characteristics of MobileNetV3[J]. Journal of Shanghai Jiao Tong University, 2023, 57(9): 1203-1213.
Table 2
m values and the ranges of γ and S_l(n) at different pruning rates

| Pruning rate/% | m | γ | S_l(n) |
|---|---|---|---|
| 10 | 2.3064×10⁻¹² | 4.0039×10⁻¹² ~ 4.9225×10⁻¹⁰ | 0 ~ 0.6518 |
| 20 | 1.0721×10⁻¹¹ | 8.3696×10⁻¹⁷ ~ 5.2892×10⁻¹⁰ | 0.0063 ~ 0.7054 |
| 30 | 2.9047×10⁻¹¹ | 1.8149×10⁻¹¹ ~ 4.9828×10⁻¹⁰ | 0.0357 ~ 0.6812 |
| 40 | 7.8888×10⁻¹¹ | 4.6540×10⁻¹¹ ~ 5.2805×10⁻¹⁰ | 0.0625 ~ 0.8304 |
| 50 | 0.0700 | 1.1880×10⁻¹⁰ ~ 2.8041×10⁻¹ | 0.0714 ~ 0.7946 |
| 60 | 0.1466 | 0.0875 ~ 0.5838 | 0.1875 ~ 0.8304 |
Table 5
Comparison of model channel counts before and after pruning

| Input size | Module | Channels in module | Channels in module (after pruning) | Output channels | SE module | Activation | Stride |
|---|---|---|---|---|---|---|---|
| 224×224×3 | conv2d | — | — | 16 | — | HS | 2 |
| 112×112×16 | bneck, 3×3 | 16 | 9 | 16 | — | RE | 1 |
| 112×112×16 | bneck, 3×3 | 64 | 49 | 24 | — | RE | 2 |
| 56×56×24 | bneck, 3×3 | 72 | 42 | 24 | — | RE | 1 |
| 56×56×24 | bneck, 5×5 | 72 | 72 | 40 | √ | RE | 2 |
| 28×28×40 | bneck, 5×5 | 120 | 102 | 40 | √ | RE | 1 |
| 28×28×40 | bneck, 5×5 | 120 | 89 | 40 | √ | RE | 1 |
| 28×28×40 | bneck, 3×3 | 240 | 223 | 80 | — | HS | 2 |
| 14×14×80 | bneck, 3×3 | 200 | 144 | 80 | — | HS | 1 |
| 14×14×80 | bneck, 3×3 | 184 | 139 | 80 | — | HS | 1 |
| 14×14×80 | bneck, 3×3 | 184 | 112 | 80 | — | HS | 1 |
| 14×14×80 | bneck, 3×3 | 480 | 209 | 112 | √ | HS | 1 |
| 14×14×112 | bneck, 3×3 | 672 | 38 | 112 | √ | HS | 1 |
| 14×14×112 | bneck, 5×5 | 672 | 540 | 160 | √ | HS | 2 |
| 7×7×160 | bneck, 5×5 | 960 | 484 | 160 | √ | HS | 1 |
| 7×7×160 | bneck, 5×5 | 960 | 255 | 160 | √ | HS | 1 |
| 7×7×160 | conv2d, 1×1 | — | — | 960 | — | HS | 1 |
| 7×7×960 | pool, 7×7 | — | — | — | — | — | 1 |
| 1×1×960 | conv2d, 1×1, NBN | — | — | 1280 | — | HS | 1 |
| 1×1×1280 | conv2d, 1×1, NBN | — | — | q | — | — | 1 |
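As a rough illustration of what the channel reductions in Table 5 save, the conv-weight count of a single inverted-residual (bneck) block can be computed from its expansion width. The sketch below uses the 3×3 bneck row whose expansion channels drop from 672 to 38 (input and output both 112 channels); bias, BN, and SE parameters are ignored for simplicity:

```python
def bneck_params(c_in, c_exp, c_out, k):
    """Conv-weight count of an inverted-residual block:
    1x1 expand + kxk depthwise + 1x1 project (bias/BN/SE ignored)."""
    return c_in * c_exp + c_exp * k * k + c_exp * c_out

# Row "14x14x112, bneck 3x3" of Table 5: expansion channels 672 -> 38
before = bneck_params(112, 672, 112, 3)   # 156 576 weights
after = bneck_params(112, 38, 112, 3)     # 8 854 weights
reduction = 1 - after / before            # ~94% fewer weights in this block
```

Since every term of the block scales with the expansion width, pruning that width shrinks all three convolutions at once, which is why a few heavily pruned bneck rows account for much of the 44.5% overall parameter reduction.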