Letters and speech sounds are the basic units of correspondence between written and spoken language. Associating the auditory information of speech sounds with the visual information of letters is critical for learning to read; however, the neural mechanisms underlying this association remain poorly understood. The present functional magnetic resonance imaging study investigates the automaticity and behavioral relevance of integrating letters and speech sounds. Within a unimodal auditory identification task, speech sounds were presented either in isolation (unimodally) or bimodally, in congruent and incongruent combinations with visual letters. Furthermore, the quality of the visual letters was manipulated parametrically. Our analyses revealed that the presentation of congruent visual letters improved the identification of speech sounds, and that this behavioral benefit was paralleled by a similar modulation of cortical responses in the left superior temporal sulcus. Under low visual noise, cortical responses in superior temporal and occipito-temporal cortex were further modulated by the congruency between the auditory and visual stimuli. These cross-modal modulations of performance and cortical responses during a unimodal auditory task (speech identification) indicate a strong and automatic functional coupling between the processing of letters (orthography) and speech (phonology) in the literate adult brain.