Investigation of Improved Approaches to Bayes Risk Decoding

doi:10.1007/s12204-011-1189-1

Abstract

Bayes risk (BR) decoding methods have been widely investigated in
speech recognition owing to their flexibility and complexity relative to
the maximum a posteriori (MAP) method with regard to minimum word error
(MWE) optimization. This paper investigates two improved approaches to BR
decoding that aim to minimize word error. The novelty of the proposed
methods lies in the explicit optimization of the objective function, whose
value is calculated by an improved forward algorithm on the lattice. The
result of the first method is obtained by an expectation-maximization
(EM)-like iteration, while the result of the second is obtained by
traversing the confusion network (CN); both optimize the objective function
value, but by distinct approaches. Experimental results indicate that the
proposed methods reduce errors in lattice rescoring compared with the
traditional CN method.
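
To illustrate the BR criterion that the abstract contrasts with MAP decoding, the following minimal Python sketch selects, from an N-best list, the hypothesis with the lowest expected Levenshtein (word edit) distance to the other hypotheses. It is not the lattice-based EM-like or CN-traversal method proposed in this paper; the N-best list, its posterior values, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Hypothesis = Tuple[str, ...]
NBest = List[Tuple[Hypothesis, float]]  # (word sequence, posterior probability)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def expected_risk(candidate: Hypothesis, nbest: NBest) -> float:
    """Expected word error of `candidate` under the N-best posteriors."""
    return sum(p * edit_distance(w, candidate) for w, p in nbest)


def map_decode(nbest: NBest) -> Hypothesis:
    """MAP decoding: pick the single most probable hypothesis."""
    return max(nbest, key=lambda item: item[1])[0]


def mbr_decode(nbest: NBest) -> Hypothesis:
    """BR decoding restricted to the N-best list: pick the minimum-risk entry."""
    return min((w for w, _ in nbest), key=lambda c: expected_risk(c, nbest))


# Hypothetical N-best list (assumed posteriors, summing to 1).
NBEST: NBest = [
    (("a", "b", "c"), 0.4),
    (("a", "x", "c"), 0.3),
    (("a", "x", "d"), 0.3),
]

if __name__ == "__main__":
    print("MAP:", map_decode(NBEST))
    print("MBR:", mbr_decode(NBEST))
```

In this toy list the MAP choice is the first hypothesis, but the second hypothesis has lower expected word error because it is close to both of the others, which is the kind of gain that explicit word error minimization targets.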

Key words:

Bayes risk (BR), confusion network (CN), speech recognition, lattice rescoring

CLC Number:

TP 391.4

XU Hai-hua (徐海华), ZHU Jie (朱杰). Investigation of Improved Approaches to Bayes Risk Decoding[J]. Journal of Shanghai Jiaotong University (Science), 2011, 16(5): 524-529.
