J Shanghai Jiaotong Univ Sci, 2023, Vol. 28, Issue (1): 20-27. doi: 10.1007/s12204-023-2565-3
Received: 2022-02-28
Online: 2023-01-28
Published: 2023-02-10
FU Jiawei∗ (傅家威), ZHAO Xu (赵 旭)
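Abstract: Accurate pedestrian trajectory prediction is critical in autonomous driving systems because it underpins the response and decision-making of the ego vehicle. In this study, we focus on predicting the future trajectories of pedestrians from a first-person view. Most existing first-person trajectory prediction methods adopt approaches designed for the bird's-eye view and neglect the differences between the two views. We therefore clarify these differences and highlight the importance of action awareness for trajectory prediction in the first-person view. We propose a new action-aware network based on an encoder-decoder framework, with an action prediction branch and a goal estimation branch at the end of the encoder. In the decoder, bidirectional long short-term memory (Bi-LSTM) blocks generate the final prediction of pedestrians' future trajectories. Our method was evaluated on a public dataset and achieved competitive performance compared with other methods. An ablation study demonstrates the effectiveness of the action prediction branch.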
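The data flow described in the abstract (an encoder, two branches at its end, and a bidirectional LSTM decoder) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the class name `ActionAwareSketch`, the layer sizes, the 4-value bounding-box input, the two action classes, the LSTM encoder, and the way the branch outputs are fed to the decoder are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def init_linear(rng, n_out, n_in):
    """Small random weights and zero bias; a trained model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    """Standard LSTM cell with input, forget, candidate, and output gates."""

    def __init__(self, rng, n_in, n_hidden):
        self.n_hidden = n_hidden
        # One affine map produces all four gate pre-activations at once.
        self.weights, self.bias = init_linear(rng, 4 * n_hidden, n_in + n_hidden)

    def step(self, x, h, c):
        n = self.n_hidden
        z = linear(self.weights, self.bias, x + h)
        i = [sigmoid(v) for v in z[:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        """Return the hidden state after every step of `inputs`."""
        h, c = [0.0] * self.n_hidden, [0.0] * self.n_hidden
        states = []
        for x in inputs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


class ActionAwareSketch:
    """Hypothetical layout: encoder -> (action, goal) branches -> Bi-LSTM decoder."""

    def __init__(self, n_box=4, n_hidden=8, n_actions=2, seed=0):
        rng = random.Random(seed)
        self.encoder = LSTMCell(rng, n_box, n_hidden)
        self.action_head = init_linear(rng, n_actions, n_hidden)  # action branch
        self.goal_head = init_linear(rng, n_box, n_hidden)        # goal branch
        n_dec_in = n_hidden + n_actions + n_box + 1               # +1 for step index
        self.dec_fwd = LSTMCell(rng, n_dec_in, n_hidden)
        self.dec_bwd = LSTMCell(rng, n_dec_in, n_hidden)
        self.out_head = init_linear(rng, n_box, 2 * n_hidden)

    def predict(self, observed_boxes, horizon):
        # Encoder: summarise the observed boxes with the last hidden state.
        summary = self.encoder.run(observed_boxes)[-1]
        # Two branches at the end of the encoder.
        action_probs = softmax(linear(*self.action_head, summary))
        goal = linear(*self.goal_head, summary)
        # Decoder input per future step: conditioning vector plus normalised step index.
        cond = summary + action_probs + goal
        dec_inputs = [cond + [(t + 1) / horizon] for t in range(horizon)]
        # Bidirectional pass: one LSTM reads forward, the other reads backward.
        fwd = self.dec_fwd.run(dec_inputs)
        bwd = self.dec_bwd.run(dec_inputs[::-1])[::-1]
        trajectory = [linear(*self.out_head, f + b) for f, b in zip(fwd, bwd)]
        return action_probs, goal, trajectory


# Arbitrary toy input: 5 observed boxes [x, y, w, h], 10 future steps.
model = ActionAwareSketch()
observed = [[0.50 + 0.01 * t, 0.60, 0.10, 0.30] for t in range(5)]
action_probs, goal, trajectory = model.predict(observed, horizon=10)
```

Here `action_probs` is a probability vector over the assumed action classes, `goal` is a single estimated end-point box, and `trajectory` holds one predicted box per future step. In a trained model the two branches would presumably receive their own supervision alongside the trajectory loss, which is the kind of coupling the ablation study in the abstract examines.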
FU Jiawei∗ (傅家威), ZHAO Xu (赵 旭). Action-aware Encoder-Decoder Network for Pedestrian Trajectory Prediction[J]. J Shanghai Jiaotong Univ Sci, 2023, 28(1): 20-27.