Journal of Shanghai Jiaotong University (Natural Science Edition)

• Automation Technology, Computer Technology •

Violent Videos Classification Algorithm Based on Bag of Audio Words

LI Rongjie1,JIANG Xinghao1,2,SUN Tanfeng1,2
  

  1. (1. School of Information Engineering Security, Shanghai Jiaotong University, Shanghai 200240, China; 2. Shanghai Information Security Management and Technology Research Key Lab, Shanghai 200240, China)
  • Received:2010-06-13 Revised:1900-01-01 Online:2011-02-28 Published:2011-02-28

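Abstract (translated from the Chinese): To meet the need for supervising online video, a violent video classification method based on a bag of audio words is proposed. MPEG-7 (Multimedia Content Description Interface) audio features are extracted from the audio stream of each video, including low-level features such as the audio spectrum centroid and audio spectrum spread (bandwidth), together with the MPEG-7 high-level feature, the audio signature. These are used to construct the audio words specific to each video, and the frequency with which the audio words occur forms the bag-of-audio-words feature. A support vector machine classifies videos as violent or nonviolent. The bag-of-words model is applied to the classification of violent audio features, a distinct word-weighting mechanism is used for different vocabulary sizes, and a classification strategy tailored to violent video is employed to improve classification performance. Three groups of experiments examine the accuracy of different audio features, the classification performance of different vocabularies, and the fine classification of results that were coarsely classified by visual features. The experimental results show that the method achieves good recall.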

Keywords: violent video, bag of audio words, weighting mechanism, support vector machine

Abstract: A method for classifying violent videos using a bag of audio words is introduced. MPEG-7 audio descriptors are first extracted, including low-level features such as AudioSpectrumCentroid and AudioSpectrumSpread. The audio words are then built from the MPEG-7 high-level descriptor AudioSignature, which is regarded as a fingerprint of the audio stream. A support vector machine classifies the feature vectors into two classes, violent and nonviolent. Three experiments are reported: a comparison of different types of audio words, a comparison of different vocabulary sizes, and the classification of shots detected from visual features. The experimental results show that the proposed method achieves good recall.
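
The bag-of-audio-words step can be illustrated with a minimal sketch. The abstract does not say how the vocabulary is built, so this example assumes plain k-means clustering over frame-level feature vectors, a common choice for bag-of-words models. The feature vectors are stand-in 2-D toy values rather than real MPEG-7 descriptors, and all function names (`build_codebook`, `bag_of_audio_words`, and so on) are hypothetical. The paper's word-weighting mechanism and the support vector machine stage are not reproduced here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(frame, codebook):
    """Index of the audio word (centroid) closest to one frame's features."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(frame, codebook[i]))


def build_codebook(frames, vocab_size, iterations=10):
    """Cluster frame features with k-means; each centroid is one audio word.

    Assumption: k-means with farthest-point initialisation, chosen here only
    so that the sketch is deterministic. The paper may build words differently.
    """
    codebook = [list(frames[0])]
    while len(codebook) < vocab_size:
        # Next seed: the frame farthest from every centroid chosen so far.
        farthest = max(frames,
                       key=lambda f: min(squared_distance(f, c) for c in codebook))
        codebook.append(list(farthest))

    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for frame in frames:
            clusters[nearest_word(frame, codebook)].append(frame)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bag_of_audio_words(frames, codebook):
    """Normalised histogram of audio-word occurrences for one clip."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_word(frame, codebook)] += 1
    return [c / len(frames) for c in counts]


if __name__ == "__main__":
    # Toy stand-ins for per-frame features such as [centroid, spread].
    quiet = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    loud = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
    codebook = build_codebook(quiet + loud, vocab_size=2)
    print(bag_of_audio_words([quiet[0]] + loud[:3], codebook))
```

Each clip is reduced to a fixed-length histogram regardless of its duration, which is what allows a standard classifier such as a support vector machine to be trained on the resulting vectors.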
