[1] ANUSUYA M A, KATTI S K. Speech recognition by machine: A review [J]. International Journal of Computer Science and Information Security, 2009, 6(3): 181-205.
[2] RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition [J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
[3] HINTON G, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups [J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[4] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks [C]//23rd International Conference on Machine Learning. Pittsburgh, Pennsylvania, USA: ACM, 2006: 369-376.
[5] GRAVES A, JAITLY N. Towards end-to-end speech recognition with recurrent neural networks [C]//31st International Conference on Machine Learning. Beijing, China: JMLR W&CP, 2014: 1764-1772.
[6] BAHDANAU D, CHO K H, BENGIO Y. Neural machine translation by jointly learning to align and translate [C]//International Conference on Learning Representations. San Diego, CA, USA: Computational and Biological Learning Society, 2015: 0473.
[7] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]//31st Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017: 5998-6008.
[8] MARKOVNIKOV N, KIPYATKOVA I, LYAKSO E. End-to-end speech recognition in Russian [C]//International Conference on Speech and Computer. Leipzig, Germany: Springer, 2018: 377-386.
[9] SAK H, SENIOR A, BEAUFAYS F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition [C]//15th Annual Conference of the International Speech Communication Association. Singapore: ISCA, 2014: 1128.
[10] HANNUN A Y, MAAS A L, JURAFSKY D, et al. First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs [EB/OL]. (2014-08-12) [2018-11-08]. https://arxiv.org/pdf/1408.2873.pdf.
[11] MIAO Y, GOWAYYED M, METZE F. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding [C]//IEEE Workshop on Automatic Speech Recognition and Understanding. Scottsdale, AZ, USA: IEEE, 2015: 167-174.
[12] MOHRI M, PEREIRA F, RILEY M. Weighted finite-state transducers in speech recognition [J]. Computer Speech & Language, 2002, 16(1): 69-88.
[13] CHOROWSKI J K, BAHDANAU D, SERDYUK D, et al. Attention-based models for speech recognition [C]//29th Conference on Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2015: 577-585.
[14] BAHDANAU D, CHOROWSKI J, SERDYUK D, et al. End-to-end attention-based large vocabulary speech recognition [C]//41st IEEE International Conference on Acoustics, Speech and Signal Processing. Shanghai, China: IEEE, 2016: 4945-4949.
[15] LU L, ZHANG X, CHO K, et al. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition [C]//Proceedings of the Interspeech. Dresden, Germany: ISCA, 2015: 3249-3253.
[16] ZEILER M D. ADADELTA: An adaptive learning rate method [EB/OL]. (2012-12-22) [2018-11-08]. https://arxiv.org/pdf/1212.5701.pdf.
[17] WATANABE S, HORI T, KARITA S, et al. ESPnet: End-to-end speech processing toolkit [C]//Proceedings of the Interspeech. Hyderabad, India: ISCA, 2018: 2207-2211.
[18] POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit [C]//IEEE Workshop on Automatic Speech Recognition and Understanding. Hawaii, USA: IEEE, 2011: 1-4.