Conferences in Research and Practice in Information Technology (CRPIT)
  

Patch-Based Representation of Visual Speech

Lucey, P. and Sridharan, S.

    Visual information from a speaker's mouth region is known to improve the robustness of automatic speech recognition, especially in the presence of acoustic noise. To date, the vast majority of work in this field has treated these visual features holistically, which may not account for the various changes that occur during articulation (the process of changing the shape of the vocal tract using the articulators, i.e., the lips and jaw). Motivated by work in audio-visual automatic speech recognition (AVASR) using articulatory features (AFs) and in face recognition using patches, we present a proof-of-concept paper that represents the mouth region as an ensemble of image patches. Our experiments show that by treating the mouth region in this manner, we are able to extract more speech information from the visual domain. For the task of visual-only, speaker-independent, isolated digit recognition, we achieved a relative word error rate improvement of more than 23% on the CUAVE audio-visual corpus.
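    To make the patch-based idea concrete, the sketch below divides a mouth region-of-interest into a grid of patches, computes a small feature vector per patch, and concatenates the results into the ensemble representation. The 2x2 grid and the per-patch low-order 2D-DCT features are assumptions chosen for illustration (DCT features are common in AVASR); the abstract does not specify the paper's actual patch layout or per-patch features.

        import numpy as np
        from scipy.fftpack import dct

        def patch_features(mouth_roi, grid=(2, 2), n_coeffs=5):
            """Represent a mouth ROI as an ensemble of image patches.

            mouth_roi : 2-D greyscale array (rows, cols).
            grid      : patches along (rows, cols) -- a 2x2 grid is assumed here.
            n_coeffs  : low-order DCT coefficients kept per axis per patch.
            """
            h, w = mouth_roi.shape
            ph, pw = h // grid[0], w // grid[1]
            feats = []
            for r in range(grid[0]):
                for c in range(grid[1]):
                    patch = mouth_roi[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
                    # Separable 2-D DCT: 1-D DCT over rows, then over columns.
                    coef = dct(dct(patch.astype(float), axis=0, norm='ortho'),
                               axis=1, norm='ortho')
                    # Keep only the low-frequency (top-left) block of coefficients.
                    feats.append(coef[:n_coeffs, :n_coeffs].ravel())
            # Concatenating per-patch features gives the ensemble representation.
            return np.concatenate(feats)

        # Example: a synthetic 32x64 mouth ROI yields a 2*2*5*5 = 100-dim vector.
        roi = np.random.rand(32, 64)
        print(patch_features(roi).shape)  # (100,)

    The point of the per-patch construction, as opposed to transforming the whole ROI at once, is that localized articulator movements (e.g., lip corners versus jaw) contribute to the representation independently rather than being averaged into one holistic feature.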
Cite as: Lucey, P. and Sridharan, S. (2006). Patch-Based Representation of Visual Speech. In Proc. HCSNet Workshop on the Use of Vision in Human-Computer Interaction (VisHCI 2006), Canberra, Australia. CRPIT, 56. Goecke, R., Robles-Kelly, A. and Caelli, T., Eds. ACS. 79-85.