
Cortical Dynamics of Auditory-Visual Speech: A Forward Model of Multisensory Integration.




Virginie van Wassenhove

A new representational framework for the integration of auditory and visual information in speech perception.

In noisy settings, seeing the interlocutor's face helps to disambiguate what is being said. For this to happen, the brain must integrate auditory and visual information. Three major problems arise: (1) bringing together separate sensory streams of information, (2) extracting auditory and visual speech information, and (3) identifying this information as a unified auditory-visual percept. In this dissertation, a new representational framework for auditory-visual (AV) speech integration is offered. The experimental work, combining psychophysics and electrophysiology (EEG), suggests specific neural mechanisms for solving problems (1), (2), and (3) that are consistent with a forward 'analysis-by-synthesis' view of AV speech integration. In Chapter I, multisensory perception and integration are reviewed. A unified conceptual framework serves as background for the study of AV speech integration. In Chapter II, psychophysical experiments testing the perception of desynchronized AV speech inputs show the existence of a ~250 ms temporal window of integration for AV speech. In Chapter III, an EEG study shows that visual speech modulates the neural processing of auditory speech early on. Two functionally independent modulations are (i) a ~250 ms amplitude reduction of auditory evoked potentials (AEPs) and (ii) a systematic temporal facilitation of the same AEPs as a function of the saliency of visual speech. In Chapter IV, an EEG study of desynchronized AV speech inputs shows that (i) fine-grained (gamma, ~25 ms) and (ii) coarse-grained (theta, ~250 ms) neural mechanisms simultaneously mediate the processing of AV speech. In Chapter V, a new illusory effect is proposed, in which non-speech visual signals modify the perceptual quality of auditory objects. EEG results show patterns of activation that differ markedly from those observed in AV speech integration. An MEG experiment is subsequently proposed to test hypotheses on the origins of these differences.
In Chapter VI, the 'analysis-by-synthesis' model of AV speech integration is contrasted with major speech theories. From a cognitive neuroscience perspective, the 'analysis-by-synthesis' model is argued to offer the most sensible representational system for AV speech integration. This thesis shows that AV speech integration results from both the statistical nature of stimulation and the inherent predictive capabilities of the nervous system.