In natural communication speech perception is profoundly influenced by observable mouth movements. The additional visual information can greatly facilitate intelligibility but incongruent visual information may also lead to novel percepts that neither match the auditory nor the visual information as evidenced by the McGurk effect. Recent models of audiovisual (AV) speech perception accentuate the role of speech motor areas and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for speech perception. In this event-related 7 Tesla fMRI study we used three naturally spoken syllable pairs with matching AV information and one syllable pair designed to elicit the McGurk illusion. The data analysis focused on brain sites involved in processing and fusing of AV speech and engaged in the analysis of auditory and visual differences within AV presented speech. Successful fusion of AV speech is related to activity within the STS of both hemispheres. Our data supports and extends the audio-visual-motor model of speech perception by dissociating areas involved in perceptual fusion from areas more generally related to the processing of AV incongruence.