Landmark-based speech recognition: Report of the 2004 Johns Hopkins summer workshop

Mark Hasegawa-Johnson; James Baker; Sarah Borys; Ken Chen; Emily Coogan; Steven Greenberg; Amit Juneja; Katrin Kirchhoff; Karen Livescu; Srividya Mohan; Jennifer Muller; Kemal Sonmez; Tianyu Wang

doi:10.1109/ICASSP.2005.1415088

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

61 Scopus citations

Abstract

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multi-frame acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
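The final step described above, log-linear combination of pronunciation-model scores with first-pass word scores, can be illustrated with a minimal sketch. This is not the workshop's implementation: it rescoring a short N-best list rather than a full lattice, and the score names (`acoustic`, `lm`, `landmark`), the weights, and all numeric values are hypothetical, chosen only to show how an added score stream can change the second-pass result.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One candidate transcription with named log-domain scores."""
    words: str
    scores: Dict[str, float]


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: a weighted sum of log scores."""
    return sum(weight * scores[name] for name, weight in weights.items())


def rescore(hyps: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h.scores, weights))


# Hypothetical first-pass scores plus a hypothetical landmark-based score.
hyps = [
    Hypothesis("recognize speech", {"acoustic": -120.0, "lm": -14.0, "landmark": -30.0}),
    Hypothesis("wreck a nice beach", {"acoustic": -121.0, "lm": -14.5, "landmark": -22.0}),
]

# Hypothetical weights; in practice these would be tuned on held-out data.
first_pass_weights = {"acoustic": 1.0, "lm": 8.0, "landmark": 0.0}
second_pass_weights = {"acoustic": 1.0, "lm": 8.0, "landmark": 1.0}

first_pass_best = rescore(hyps, first_pass_weights)
second_pass_best = rescore(hyps, second_pass_weights)

print("first pass: ", first_pass_best.words)
print("second pass:", second_pass_best.words)
```

With these made-up numbers, setting the landmark weight to zero reproduces the first-pass ranking, while a nonzero weight lets the additional score stream reorder the hypotheses. The example says nothing about which hypothesis is correct; it only shows the mechanics of the combination.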

Original language	English (US)
Title of host publication	2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	I213-I216
ISBN (Print)	0780388747, 9780780388741
DOIs	https://doi.org/10.1109/ICASSP.2005.1415088
State	Published - 2005
Externally published	Yes
Event	2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA, United States; Duration: Mar 18 2005 → Mar 23 2005

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	I
ISSN (Print)	1520-6149

Other

Other	2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Country/Territory	United States
City	Philadelphia, PA
Period	3/18/05 → 3/23/05

ASJC Scopus subject areas

Software
Signal Processing
Electrical and Electronic Engineering

Access to Document

10.1109/ICASSP.2005.1415088

Cite this

Hasegawa-Johnson, M., Baker, J., Borys, S., Chen, K., Coogan, E., Greenberg, S., Juneja, A., Kirchhoff, K., Livescu, K., Mohan, S., Muller, J., Sonmez, K., & Wang, T. (2005). Landmark-based speech recognition: Report of the 2004 Johns Hopkins summer workshop. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing (pp. I213-I216). Article 1415088 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. I). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2005.1415088

Landmark-based speech recognition: Report of the 2004 Johns Hopkins summer workshop. / Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah et al.
2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing. Institute of Electrical and Electronics Engineers Inc., 2005. p. I213-I216 1415088 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. I).

Hasegawa-Johnson, M, Baker, J, Borys, S, Chen, K, Coogan, E, Greenberg, S, Juneja, A, Kirchhoff, K, Livescu, K, Mohan, S, Muller, J, Sonmez, K & Wang, T 2005, Landmark-based speech recognition: Report of the 2004 Johns Hopkins summer workshop. in 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing., 1415088, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. I, Institute of Electrical and Electronics Engineers Inc., pp. I213-I216, 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05, Philadelphia, PA, United States, 3/18/05. https://doi.org/10.1109/ICASSP.2005.1415088

Hasegawa-Johnson M, Baker J, Borys S, Chen K, Coogan E, Greenberg S et al. Landmark-based speech recognition: Report of the 2004 Johns Hopkins summer workshop. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing. Institute of Electrical and Electronics Engineers Inc. 2005. p. I213-I216. 1415088. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2005.1415088

Hasegawa-Johnson, Mark ; Baker, James ; Borys, Sarah et al. / Landmark-based speech recognition : Report of the 2004 Johns Hopkins summer workshop. 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing. Institute of Electrical and Electronics Engineers Inc., 2005. pp. I213-I216 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{b5f5388414944b40a5fece4bfe72b397,

title = "Landmark-based speech recognition: Report of the 2004 Johns Hopkins summer workshop",

abstract = "Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multi-frame acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.",

author = "Mark Hasegawa-Johnson and James Baker and Sarah Borys and Ken Chen and Emily Coogan and Steven Greenberg and Amit Juneja and Katrin Kirchhoff and Karen Livescu and Srividya Mohan and Jennifer Muller and Kemal Sonmez and Tianyu Wang",

year = "2005",

doi = "10.1109/ICASSP.2005.1415088",

language = "English (US)",

isbn = "0780388747",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "I213--I216",

booktitle = "2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing",

note = "2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 ; Conference date: 18-03-2005 Through 23-03-2005",

}

TY - GEN

T1 - Landmark-based speech recognition

T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05

AU - Hasegawa-Johnson, Mark

AU - Baker, James

AU - Borys, Sarah

AU - Chen, Ken

AU - Coogan, Emily

AU - Greenberg, Steven

AU - Juneja, Amit

AU - Kirchhoff, Katrin

AU - Livescu, Karen

AU - Mohan, Srividya

AU - Muller, Jennifer

AU - Sonmez, Kemal

AU - Wang, Tianyu

PY - 2005

Y1 - 2005

N2 - Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multi-frame acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.

UR - http://www.scopus.com/inward/record.url?scp=27144481719&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27144481719&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2005.1415088

DO - 10.1109/ICASSP.2005.1415088

M3 - Conference contribution

C2 - 19212454

AN - SCOPUS:27144481719

SN - 0780388747

SN - 9780780388741

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - I213-I216

BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 18 March 2005 through 23 March 2005

ER -
