Journal of Shanghai Jiao Tong University (Science), 2020, Vol. 25, Issue (1): 70-75. doi: 10.1007/s12204-019-2147-6
ZHU Tao (朱涛), CHENG Chunling (程春玲)
Online: 2020-01-15
Published: 2020-01-12
Contact:
CHENG Chunling (程春玲)
E-mail: chengcl@njupt.edu.cn
ZHU Tao (朱涛), CHENG Chunling (程春玲). Joint CTC-Attention End-to-End Speech Recognition with a Triangle Recurrent Neural Network Encoder[J]. Journal of Shanghai Jiao Tong University (Science), 2020, 25(1): 70-75.
URL: https://xuebao.sjtu.edu.cn/sjtu_en/EN/10.1007/s12204-019-2147-6
| [1] | ANUSUYA M A, KATTI S K. Speech recognition by machine: A review [J]. International Journal of Computer Science and Information Security, 2009, 6(3): 181-205. |
| [2] | RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition [J]. Proceedings of the IEEE, 1989, 77(2): 257-286. |
| [3] | HINTON G, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups [J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97. |
| [4] | GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks [C]//23rd International Conference on Machine Learning. Pittsburgh, Pennsylvania, USA: ACM, 2006: 369-376. |
| [5] | GRAVES A, JAITLY N. Towards end-to-end speech recognition with recurrent neural networks [C]//31st International Conference on Machine Learning. Beijing, China: W&CP, 2014: 1764-1772. |
| [6] | BAHDANAU D, CHO K H, BENGIO Y. Neural machine translation by jointly learning to align and translate [C]//International Conference on Learning Representations. San Diego, CA, USA: Computational and Biological Learning Society, 2015: 0473. |
| [7] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]//31st Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017: 5998-6008. |
| [8] | MARKOVNIKOV N, KIPYATKOVA I, LYAKSO E. End-to-end speech recognition in Russian [C]//International Conference on Speech and Computer. Leipzig, Germany: Springer, 2018: 377-386. |
| [9] | SAK H, SENIOR A, BEAUFAYS F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition [C]//15th Annual Conference of the International Speech Communication Association. Singapore: ISCA, 2014: 1128. |
| [10] | HANNUN A Y, MAAS A L, JURAFSKY D, et al. First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs [EB/OL]. (2014-08-12) [2018-11-08]. https://arxiv.org/pdf/1408.2873.pdf. |
| [11] | MIAO Y, GOWAYYED M, METZE F. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding [C]//IEEE Workshop on Automatic Speech Recognition and Understanding. Scottsdale, AZ, USA: IEEE, 2015: 167-174. |
| [12] | MOHRI M, PEREIRA F, RILEY M. Weighted finite-state transducers in speech recognition [J]. Computer Speech & Language, 2002, 16(1): 69-88. |
| [13] | CHOROWSKI J K, BAHDANAU D, SERDYUK D, et al. Attention-based models for speech recognition [C]//29th Conference on Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2015: 577-585. |
| [14] | BAHDANAU D, CHOROWSKI J, SERDYUK D, et al. End-to-end attention-based large vocabulary speech recognition [C]//41st IEEE International Conference on Acoustics, Speech and Signal Processing. Shanghai, China: IEEE, 2016: 4945-4949. |
| [15] | LU L, ZHANG X, CHO K, et al. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition [C]//Proceedings of the Interspeech. Dresden, Germany: ISCA, 2015: 3249-3253. |
| [16] | ZEILER M D. Adadelta: An adaptive learning rate method [EB/OL]. (2012-12-22) [2018-11-08]. https://arxiv.org/pdf/1212.5701.pdf. |
| [17] | WATANABE S, HORI T, KARITA S, et al. ESPnet: End-to-end speech processing toolkit [C]//Proceedings of the Interspeech. Hyderabad, India: ISCA, 2018: 2207-2211. |
| [18] | POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit [C]//IEEE Workshop on Automatic Speech Recognition and Understanding. Hawaii, USA: IEEE, 2011: 1-4. |
| [1] | Wang Yan, Wang Likang, Zhang Jinfeng, Fan Xianghui. Global Dense Two-Branch Cascade Network for Underwater Image Enhancement [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 458-474. |
| [2] | Wang Jing, Fang Zhiqiang, Li Qianqian, Tang Zhiwei, Huang Zhangyang, Hong Zhonghua, He Haiyang. YOLO-SDD: An Improved YOLOv5 for Storm Drain Detection in Street-Level View [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 359-374. |
| [3] | Tao Hongjie, Li Zhaofei, Qi Fei, Chen Jingjue, Zhou Hao. High Resolution Remote Sensing Image Segmentation Method with Improved DeepLabv3+ [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 348-358. |
| [4] | Miao Jun, Gong Shaocui, Deng Yongqiang, Liang Hao, Li Juanjuan, Qi Honggang, Zhang Maoxuan. YOLO-VSF: An Improved YOLO Model by Incorporating Attention Mechanism for Object Detection in Traffic Scenes [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 334-347. |
| [5] | Xu Luzhen, Yan Haoyin, He Maokui, Guo Zixian, Zhou Yeping, Liu Peiqi, Zhang Jie, Dai Lirong. Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting Transcription [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 298-304. |
| [6] | Nurmemet Yolwas, Sun Lixu, Li Xin, Liu Qichao, Wang Zhixiang. Wav2vec-AD: Acoustic Unit Discovery Module-Integrated, Self-Supervised Contrastive Pre-training Approach for Speech Recognition [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 289-297. |
| [7] | Xiao Sujie, Hao Ruipeng, Cheng Gaofeng, Xu Xiaoyan, Li Ta. EC-BERT: A BERT Language Model with Error Correction for Mandarin Chinese Speech Recognition [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 282-288. |
| [8] | Liu Shuanghong, Song Zhida, He Liang. Improving ECAPA-TDNN Performance with Coordinate Attention [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(2): 241-247. |
| [9] | YE Haibo, YU Ke, NIU Rongbing, LI Siwei. Research on the Application of Large Language Model-Based Tactical Voice Command and Control Systems in Combat Environments [J]. Air & Space Defense, 2026, 9(1): 98-107. |
| [10] | YANG Minghui, WEI Yali, LU Junyan, LI Xinhai. A Cross-Modal Target Matching Method for Optical Image and SAR Images Based on Window Attention Mechanism [J]. Air & Space Defense, 2026, 9(1): 73-79. |
| [11] | LIU Xuan, LAN Xiaochen, XU Dapeng, HE Liang, LIU Maoshen. Sea Clutter Suppression Method Based on Deep Learning Temporal Feature Enhancement [J]. Air & Space Defense, 2026, 9(1): 36-45. |
| [12] | Xia Jie, Wu Xiaodong, Xu Min. BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(1): 154-166. |
| [13] | ZHANG Zhiyuan, HU Jisu, ZHANG Yueyue, QIAN Xusheng, ZHOU Zhiyong, DAI Yakang. Attention-Guided Multi-Task Learning for Prostate Cancer Pelvic Lymph Node Metastasis Prediction [J]. Journal of Shanghai Jiao Tong University, 2025, 59(8): 1216-1224. |
| [14] | ZHANG Li, WANG Bao, JIA Jianxiong, SONG Zhumeng, YE Yutong, YU Yue, LIN Jiaqing, XU Xiaoyuan. End-to-End Collaborative Optimization Method for Microgrid Power Prediction and Optimal Scheduling [J]. Journal of Shanghai Jiao Tong University, 2025, 59(6): 720-731. |
| [15] | DING Leqi, WANG Biyun, YAO Lixiu, CAI Yunze. MAGPNet: Multi-Domain Attention-Guided Pyramid Network for Infrared Small Object Detection [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 935-951. |