
Adversarial Attacks in Artificial Intelligence: A Survey

  • Shanghai Key Laboratory of Integrated Administration Technologies for Information Security; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
YI Ping (b. 1969), male, a native of Luoyang, Henan Province, is an associate professor whose current research focuses on artificial intelligence security.

Funding

Supported by the National Natural Science Foundation of China (61571290, 61431008) and the Shanghai Municipal Three-Year Action Plan for Clinical Skills and Clinical Innovation (16CR2042B).


Abstract

With the widespread application of artificial intelligence (AI), AI security has begun to attract attention, and adversarial attacks on AI have become a hot topic in AI security research. This survey introduces the concept of adversarial attacks and the cause of adversarial examples: the mismatch between a model's decision boundary and the boundary of the real system gives rise to an adversarial space. It then discusses several classic methods for generating adversarial examples, including the fast gradient sign method (FGSM) and the Jacobian-based saliency map attack (JSMA); the core idea of these attacks is to find the direction in which the model's gradient changes fastest and to add a perturbation along that direction so that the model misclassifies the input. It further reviews methods for detecting adversarial attacks and defending against them, and proposes some directions for future research.
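
To make the gradient-direction idea concrete, the sketch below illustrates the fast gradient sign method mentioned in the abstract. It is a minimal illustration only, assuming a differentiable PyTorch image classifier that outputs logits; the names fgsm_attack, model and epsilon are illustrative and do not come from the paper.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=0.03):
        # Craft x' = x + epsilon * sign(grad_x J(theta, x, y)), i.e. step along the
        # direction in which the classification loss increases fastest.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)      # loss J(theta, x, y) on the true label y
        loss.backward()                              # gradient of the loss w.r.t. the input pixels
        x_adv = x_adv + epsilon * x_adv.grad.sign()  # one fixed-size step per pixel
        return x_adv.clamp(0.0, 1.0).detach()        # keep the image in its valid value range

Here epsilon bounds the size of the per-pixel perturbation, so the adversarial image stays visually close to the original while pushing the model toward a misclassification.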

Cite this article as

YI Ping, WANG Kedi, HUANG Cheng, GU Shuangchi, ZOU Futai, LI Jianhua. Adversarial attacks in artificial intelligence: A survey[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1298-1306. DOI: 10.16183/j.cnki.jsjtu.2018.10.019

