Making sense of emotions expressed in the human voice is an important social skill, and it is influenced by emotional cues in other modalities, such as the corresponding face. Although the simultaneous processing of emotional information from voices and faces has been studied in adults, little is known about the neural mechanisms underlying the development of this ability in infancy. Here we investigated multimodal processing of fearful and happy face/voice pairs using event-related potential (ERP) measures in a group of 84 nine-month-old infants. Infants were presented with emotional vocalisations (fearful/happy) preceded by the same or a different facial expression (fearful/happy). The ERP data revealed that the processing of emotional information in the human voice was modulated by the emotional expression on the corresponding face: infants responded with larger auditory ERPs after fearful than after happy facial primes. This finding suggests that infants allocate more processing capacity to potentially threatening stimuli than to non-threatening stimuli. © 2014 Elsevier Inc. All rights reserved.