We receive emotional signals from different sources, including the face, the whole body, and the natural scene. Previous research has shown the importance of context provided by the whole body and the scene on the recognition of facial expressions. This study measured physiological responses to face-body-scene combinations. Participants freely viewed emotionally congruent and incongruent face-body and body-scene pairs whilst eye fixations, pupil-size, and electromyography (EMG) responses were recorded. Participants attended more to angry and fearful vs. happy or neutral cues, independent of the source and relatively independent from whether the face body and body scene combinations were emotionally congruent or not. Moreover, angry faces combined with angry bodies and angry bodies viewed in aggressive social scenes elicited greatest pupil dilation. Participants' face expressions matched the valence of the stimuli but when face-body compounds were shown, the observed facial expression influenced EMG responses more than the posture. Together, our results show that the perception of emotional signals from faces, bodies and scenes depends on the natural context, but when threatening cues are presented, these threats attract attention, induce arousal, and evoke congruent facial reactions.