Judging the Normativity of PAF Based on TFN and NAN

  • (College of Mechanical Engineering, Donghua University, Shanghai 201600, China)

Online published: 2020-09-11

Abstract

The normativity of workers’ actions during production has a great impact on the quality of the products and the safety of the operation process. Previous studies mainly focused on the normativity of each single producing action rather than on the normativity of continuous producing actions over an operation process, which this paper defines as a producing action flow (PAF). To address this issue, a normativity judging method based on a two-LSTM fusion network (TFN) and a normativity-aware attention network (NAN) is proposed. First, TFN is designed to detect and recognize producing actions from the skeleton sequences of a worker over the complete operation process, yielding PAF data in sequential form. Then, NAN is built to allocate different levels of attention to each producing action within the PAF sequence, and by this means an efficient normativity judgment is conducted. The combustor surface cleaning (CSC) process of a rocket engine is taken as the experimental case, and the CSC-Action2D dataset is established for evaluation. Experimental results show the high performance of TFN and NAN, demonstrating the effectiveness of the proposed method for PAF normativity judging.
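The abstract describes the pipeline only at a high level: skeleton sequences are fed to TFN to produce a PAF, and NAN then attends over the actions in that PAF to judge normativity. The sketch below is a minimal PyTorch illustration of that pipeline. The layer sizes, the concatenation-based fusion of the two LSTM branches, the 8-class action vocabulary, and the binary normative/non-normative output are all illustrative assumptions; the paper's actual architecture details are not given in the abstract.

```python
# Minimal sketch of the TFN -> NAN pipeline; all hyperparameters are assumptions.
import torch
import torch.nn as nn


class TwoLSTMFusionNet(nn.Module):
    """Assumed TFN: two LSTM branches over skeleton sequences, fused by
    concatenation for frame-wise producing-action detection/recognition."""
    def __init__(self, joint_dim=50, hidden=128, num_actions=8):
        super().__init__()
        self.lstm_a = nn.LSTM(joint_dim, hidden, batch_first=True)
        self.lstm_b = nn.LSTM(joint_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_actions)

    def forward(self, skeletons):            # (batch, frames, joint_dim)
        out_a, _ = self.lstm_a(skeletons)
        out_b, _ = self.lstm_b(skeletons)
        fused = torch.cat([out_a, out_b], dim=-1)
        return self.classifier(fused)        # per-frame action logits -> PAF


class NormativityAwareAttention(nn.Module):
    """Assumed NAN: attention weights over the actions of a PAF sequence,
    followed by a binary normative / non-normative judgment."""
    def __init__(self, num_actions=8, embed_dim=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_actions, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.judge = nn.Linear(hidden, 2)

    def forward(self, paf):                  # (batch, actions) integer action IDs
        h, _ = self.gru(self.embed(paf))
        alpha = torch.softmax(self.attn(h), dim=1)   # attention per action
        context = (alpha * h).sum(dim=1)              # weighted PAF summary
        return self.judge(context)                    # normativity logits


# Usage: skeletons -> TFN -> action IDs (PAF) -> NAN -> normativity score.
tfn_net, nan_net = TwoLSTMFusionNet(), NormativityAwareAttention()
paf = tfn_net(torch.randn(1, 120, 50)).argmax(dim=-1)
print(nan_net(paf).softmax(dim=-1))
```

The attention weights make the judgment interpretable in the sense the abstract suggests: actions that receive high attention are the ones driving the normative or non-normative decision for the whole flow.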

Cite this article

LI Zhiqiang, BAO Jinsong, LIU Tianyuan, WANG Jiacheng. Judging the Normativity of PAF Based on TFN and NAN [J]. Journal of Shanghai Jiaotong University (Science), 2020, 25(5): 569-577. DOI: 10.1007/s12204-020-2177-0

