Video Abnormal Event Detection Combining FCN with LSTM

WU Guangli, GUO Zhenzhou, LI Leiting, WANG Chengxiang

doi:10.16183/j.cnki.jsjtu.2020.120

Journal of Shanghai Jiao Tong University, 2021, Vol. 55, Issue 5: 607-614

1. School of Cyber Security, Gansu University of Political Science and Law, Lanzhou 730070, China
2. Key Laboratory of China’s Ethnic Languages and Information Technology of the Ministry of Education, Northwest Minzu University, Lanzhou 730030, China

WU Guangli (1981-), male, born in Weifang, Shandong Province, professor. His main research interests include information content security and artificial intelligence. Tel.: 0931-7601406; E-mail: 272956638@qq.com.

Received date: 2020-04-26

Online published: 2021-06-01

Funding

Natural Science Foundation of Gansu Province (17JR5RA161); Youth Science and Technology Fund Program of Gansu Province (18JR3RA193); Scientific Research Project of Higher Education Institutions of Gansu Province (2017A-068)

Cite this article

WU Guangli, GUO Zhenzhou, LI Leiting, WANG Chengxiang. Video abnormal event detection combining FCN with LSTM[J]. Journal of Shanghai Jiao Tong University, 2021, 55(5): 607-614. DOI: 10.16183/j.cnki.jsjtu.2020.120

Abstract

To address the shortcomings of traditional video anomaly detection models, a network structure that combines a fully convolutional neural network (FCN) with a long short-term memory (LSTM) network is proposed. The network performs pixel-level prediction and can accurately locate abnormal regions. First, a convolutional neural network extracts image features at different depths from the video frames. Then, the features at each depth are separately fed into the memory network to analyze the semantic information of the time series, and a residual structure fuses the image features with this semantic information. Meanwhile, a skip structure integrates the multimodal fused features and upsamples them to obtain a prediction map of the same size as the original video frame. The proposed model is tested on the Ped2 subset of the University of California, San Diego (UCSD) anomaly detection dataset and on the University of Minnesota (UMN) crowd activity dataset, and achieves good results on both. On the UCSD dataset, the equal error rate is as low as 6.6%, the area under the curve reaches 98.2%, and the F₁ score reaches 94.96%. On the UMN dataset, the equal error rate is as low as 7.1%, the area under the curve reaches 93.7%, and the F₁ score reaches 94.46%.
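The equal error rate (EER), area under the curve (AUC), and F₁ score quoted in the abstract can all be derived from anomaly scores and ground-truth labels. The following is a minimal sketch under assumptions that the abstract does not state: the function names, the toy scores, and the 0.5 threshold used for F₁ are illustrative only, and the authors' actual evaluation protocol (frame level or pixel level, threshold choice) may differ.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when score >= threshold is flagged as abnormal."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, starting at (0, 0) and ending at (1, 1)."""
    positives = sum(labels)            # labels are assumed to be 0 or 1
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both normal and abnormal samples are required")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, _, _ = confusion_counts(scores, labels, threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def equal_error_rate(points):
    """FPR at the ROC point where FPR equals FNR (= 1 - TPR), by linear interpolation."""
    prev = points[0]
    for cur in points[1:]:
        gap_prev = prev[0] - (1.0 - prev[1])   # negative before the crossing
        gap_cur = cur[0] - (1.0 - cur[1])
        if gap_cur >= 0.0:
            weight = -gap_prev / (gap_cur - gap_prev)
            return prev[0] + weight * (cur[0] - prev[0])
        prev = cur
    return points[-1][0]


def f1_score(scores, labels, threshold):
    """F1 = 2TP / (2TP + FP + FN) at a fixed decision threshold."""
    tp, fp, _, fn = confusion_counts(scores, labels, threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: higher score means "more abnormal".
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    curve = roc_points(toy_scores, toy_labels)
    print(auc(curve), equal_error_rate(curve), f1_score(toy_scores, toy_labels, 0.5))
```

For this toy data the sketch gives an AUC of 0.8125, an EER of 0.25, and an F₁ score of 0.75. A lower EER and a higher AUC or F₁ indicate better separation of abnormal from normal samples, which is the direction of the results reported above.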

Keywords: computer vision; video anomaly detection; pixel-level prediction; fully convolutional neural network (FCN); long short-term memory (LSTM) network
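The data flow summarized in the abstract (features at several depths, a temporal summary per depth, residual fusion, then skip-style merging and upsampling back to the frame size) can be illustrated with a dependency-free toy. This is only a shape-level sketch and not the authors' network: average pooling stands in for the convolutional stages, an exponential moving average stands in for the LSTM, nearest-neighbour repetition stands in for learned upsampling, and all function names are hypothetical. Frame height and width are assumed to be divisible by 4.

```python
def avg_pool2(x):
    """2x2 average pooling; a stand-in for one convolutional stage."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample2(x):
    """Nearest-neighbour 2x upsampling; a stand-in for learned upsampling."""
    out = []
    for row in x:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two equally sized maps."""
    return [[u + v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def temporal_summary(feature_seq, decay=0.5):
    """Exponential moving average over time; a stand-in for an LSTM's final state."""
    state = feature_seq[0]
    for feature in feature_seq[1:]:
        state = [[decay * s + (1.0 - decay) * f for s, f in zip(row_s, row_f)]
                 for row_s, row_f in zip(state, feature)]
    return state


def predict_map(frames):
    """Map a short clip (list of H x W frames) to one H x W score map."""
    # Features at two depths for every frame in the clip.
    shallow = [avg_pool2(frame) for frame in frames]   # 1/2 resolution
    deep = [avg_pool2(feat) for feat in shallow]       # 1/4 resolution
    # Residual fusion: current image feature plus its temporal summary.
    fused_shallow = add(shallow[-1], temporal_summary(shallow))
    fused_deep = add(deep[-1], temporal_summary(deep))
    # Skip-style merge: bring the deep map up one level and add the shallow map.
    merged = add(upsample2(fused_deep), fused_shallow)
    # Final upsampling back to the input frame size.
    return upsample2(merged)


if __name__ == "__main__":
    clip = [[[1.0] * 8 for _ in range(8)] for _ in range(3)]   # three 8x8 frames
    score_map = predict_map(clip)
    print(len(score_map), len(score_map[0]))
```

The point of the sketch is that the output map has the same height and width as each input frame, which is what makes per-pixel localization of abnormal regions possible. In a real implementation, each stand-in would be replaced by trainable layers in a deep learning framework.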

