In alphabetic scripts, letters and speech sounds are the basic elements of correspondence between spoken and written language. In two previous fMRI studies, we showed that the response to speech sounds in the auditory association cortex was enhanced by congruent letters and suppressed by incongruent letters. Interestingly, temporal synchrony was critical for this congruency effect to occur. We interpreted these results as a neural correlate of letter-sound integration, driven by the learned congruence of letter-sound pairs. The present event-related fMRI study was designed to address two questions that could not be addressed directly in the previous studies because of their passive nature and blocked design. Specifically, we examined (1) whether the enhancement and suppression of auditory cortex responses are truly multisensory integration effects or can be explained by different attention levels during congruent and incongruent blocks, and (2) how top-down task demands affect the neural integration of letter-sound pairs. First, we replicated the previous results with random stimulus presentation, which rules out an explanation of the congruency effect in auditory cortex solely in terms of attention. Second, we showed that the effects of congruency and temporal asynchrony in the auditory association cortex were absent during active matching. This indicates that multisensory responses in the auditory association cortex depend heavily on task demands. Without task instructions, the auditory cortex is modulated to favor the processing of congruent and synchronous information. This modulation is overruled during explicit matching, when all audiovisual stimuli are equally relevant, independent of congruency and temporal relation.