Journal of Shanghai Jiao Tong University ›› 2018, Vol. 52 ›› Issue (10): 1298-1306. doi: 10.16183/j.cnki.jsjtu.2018.10.019
YI Ping, WANG Kedi, HUANG Cheng, GU Shuangchi, ZOU Futai, LI Jianhua
Published online: 2025-07-02
Corresponding author: LI Jianhua, male, professor, doctoral supervisor. Tel.: 021-62932899; E-mail: lijh888@sjtu.edu.cn
About the first author: YI Ping (1969-), male, from Luoyang, Henan Province, associate professor; his current research focuses on artificial intelligence security.
Abstract: With the wide application of artificial intelligence (AI), AI security has begun to attract attention, and adversarial attacks in particular have become a hot topic in AI security research. This paper introduces the concept of adversarial attacks and explains why adversarial examples exist: chiefly, the mismatch between a model's decision boundary and the true boundary of the underlying system leaves an adversarial space. It then reviews several classical methods for generating adversarial examples, including the fast gradient method and the Jacobian-based saliency map attack; the main idea of these attacks is to find the direction in which the model's loss gradient changes fastest and to add a perturbation along that direction so that the model misclassifies the input. Finally, it surveys methods for detecting adversarial attacks and defending against them, and suggests some directions for future research.
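The fast-gradient idea summarized in the abstract — perturb the input along the sign of the loss gradient so the loss rises fastest — can be sketched in a few lines. This is a minimal NumPy illustration, not code from the paper; the linear classifier, its weights, and the example input are hypothetical choices made only to keep the sketch self-contained.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    """One fast-gradient-sign step: move the input by eps in the
    direction that increases the loss fastest, then clip to the
    valid input range [0, 1]."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

def logistic_loss_grad(x, w, b, y):
    """Gradient of the binary cross-entropy loss with respect to the
    input x, for a linear classifier p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

# Toy example: a clean input confidently classified as class 1.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.8, 0.2])
y = 1.0

g = logistic_loss_grad(x, w, b, y)
x_adv = fgsm_perturb(x, g, eps=0.25)

p_clean = 1.0 / (1.0 + np.exp(-(x @ w + b)))
p_adv = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
```

Even this two-feature example shows the mechanism the survey describes: the perturbed input `x_adv` stays close to `x` (each coordinate moves by at most `eps`), yet the model's confidence in the true class drops. On real image classifiers the same step, applied per pixel with a small `eps`, can flip the predicted label while the change remains imperceptible to a human.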
Cite this article: YI Ping, WANG Kedi, HUANG Cheng, GU Shuangchi, ZOU Futai, LI Jianhua. Adversarial Attacks in Artificial Intelligence: A Survey[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1298-1306.