Finding the Needle: A Study of the PE32 Rich Header and Respective Malware Triage

George D. Webster, Bojan Kolosnjaji, Christian von Pentz, Julian Kirsch, Zachary D. Hanif, Apostolis Zarras, Claudia Eckert

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Performing triage of malicious samples is a critical step in security analysis and mitigation development. Unfortunately, the obfuscation and outright removal of information contained in samples makes this a monumentally challenging task. However, the widely used Portable Executable file format (PE32), a data structure used by the Windows OS to handle executable code, contains hidden information that can give a security analyst the upper hand. In this paper, we perform the first accurate assessment of the hidden PE32 field known as the Rich Header and describe how to extract the data that it clandestinely contains. We study 964,816 malware samples and demonstrate how the information contained in the Rich Header can be leveraged to perform rapid triage across millions of samples, including packed and obfuscated binaries. We first show how to quickly identify post-modified and obfuscated binaries through anomalies in the header. Next, we exhibit the Rich Header's utility in triage by presenting a proof-of-concept similarity matching algorithm that is based solely on the contents of the Rich Header. With our algorithm, we demonstrate how the contents of the Rich Header can be used to identify similar malware, different versions of malware, and malware that has been built under different build environments, revealing potentially distinct actors. Furthermore, we are able to perform these operations in near real-time, in less than 6.73 ms on commodity hardware across our studied samples. In conclusion, we establish that this little-studied header in the PE32 format is a valuable asset for security analysts and has a breadth of future potential.

Keywords (machine-generated): object file, Visual Studio, rapid triage, generate source code, malicious sample
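The extraction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly documented community layout of the Rich Header rather than anything stated in the abstract itself: a `Rich` marker followed by a 32-bit XOR key, preceded by an XOR-masked `DanS` marker, three padding dwords, and a list of (component ID, use count) pairs, all located between the DOS header and the PE header. The function name `parse_rich_header` and the returned tuple shape are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import struct

DANS = 0x536E6144  # the ASCII bytes "DanS" read as a little-endian dword


def parse_rich_header(data: bytes):
    """Return (xor_key, [(product_id, build_number, use_count), ...]) or None.

    Sketch only: assumes the commonly documented Rich Header layout.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C points to the PE header; the Rich Header,
    # if present, sits somewhere before that offset.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    region = data[:pe_offset]

    rich_pos = region.find(b"Rich")
    if rich_pos < 0 or rich_pos + 8 > len(region):
        return None
    # The XOR key is stored in the clear right after the "Rich" marker.
    (key,) = struct.unpack_from("<I", region, rich_pos + 4)

    # The start marker "DanS" is stored XORed with the key.
    start = region.find(struct.pack("<I", DANS ^ key), 0, rich_pos)
    if start < 0:
        return None

    entries = []
    # Skip "DanS" plus three padding dwords (16 bytes), then read 8-byte pairs.
    for off in range(start + 16, rich_pos - 7, 8):
        comp_id, count = struct.unpack_from("<II", region, off)
        comp_id ^= key
        count ^= key
        # High 16 bits: product/tool ID; low 16 bits: build number.
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return key, entries
```

Calling `parse_rich_header(open(path, "rb").read())` on a binary yields the decoded entries, or `None` when no header is found. A header whose decoded entries look implausible, or a `Rich` marker with no matching `DanS`, is the kind of anomaly the abstract associates with post-modified binaries, though the specific checks used in the paper are not given here.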
Original language: English
Title of host publication: Proceedings of the 14th Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA)
Publication status: Published - 2017
