Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis
Code deobfuscation, which attempts to simplify code that has been intentionally obfuscated to prevent understanding, is a critical technique for downstream security analysis tasks like malware detection. While there has been significant prior work on code deobfuscation, most techniques either do not handle obfuscations that modify control flow or target only specific classes of control-flow obfuscations, making them unsuitable for handling new types of obfuscations or combinations of existing ones. In this paper, we study a new deobfuscation technique that is based on program synthesis and that can handle a broad class of control-flow obfuscations. Given an obfuscated program $P$, our approach aims to synthesize a smallest program that is a control-flow reduction of $P$ and that is semantically equivalent to $P$. Since our method does not assume knowledge about the types of obfuscations that have been applied to the original program, the underlying synthesis problem is very challenging. To address this challenge, we propose a novel trace-informed compositional synthesis algorithm that leverages hints present in dynamic traces of the obfuscated program to decompose the synthesis problem into a set of simpler subproblems. In particular, we show how dynamic traces can be useful for inferring a suitable control-flow skeleton of the deobfuscated program and for synthesizing each basic block independently. We have implemented this approach in a tool called Chisel and evaluated it on 546 benchmarks that have been obfuscated using combinations of six different obfuscation techniques. Our evaluation shows that our approach is effective and that it produces code that is almost identical (modulo variable renaming) to the original (non-obfuscated) program in 86% of cases. Our evaluation also shows that Chisel significantly outperforms existing techniques.
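To make the idea of a trace-derived control-flow skeleton more concrete, here is a minimal toy sketch. It is not Chisel's algorithm, which the abstract does not spell out. It assumes, hypothetically, that each dynamic trace is recorded as a list of basic-block labels, and it keeps only the transitions that were actually observed, merging straight-line chains of blocks into single skeleton nodes. The block names (`A`, `opq`, `junk`, and so on) and the function names are invented for illustration.

```python
from collections import defaultdict


def observed_edges(traces):
    """Collect every block-to-block transition seen in any trace."""
    succs = defaultdict(set)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            succs[src].add(dst)
    return succs


def skeleton(traces):
    """Toy skeleton inference (an assumption, not the paper's method).

    Returns {head: (merged_blocks, successor_heads)}, where each node is a
    maximal straight-line chain of blocks observed in the traces.
    """
    succs = observed_edges(traces)
    preds = defaultdict(set)
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].add(src)
    blocks = {b for t in traces for b in t}
    entries = {t[0] for t in traces if t}

    def is_chain_link(b):
        # b can be folded into its predecessor when it is the only
        # observed successor of its only observed predecessor.
        if b in entries or len(preds[b]) != 1:
            return False
        (p,) = preds[b]
        return p != b and len(succs[p]) == 1

    nodes = {}
    for head in sorted(blocks):
        if is_chain_link(head):
            continue
        chain, cur = [head], head
        while len(succs[cur]) == 1:
            (nxt,) = succs[cur]
            if not is_chain_link(nxt):
                break
            chain.append(nxt)
            cur = nxt
        nodes[head] = (chain, sorted(succs[cur]))
    return nodes


# Hypothetical traces: "opq" stands for an opaque-predicate block whose
# other static target, "junk", is never executed in any run.
traces = [
    ["A", "opq", "B", "C", "B", "C", "D"],
    ["A", "opq", "B", "C", "D"],
]
for head, (chain, targets) in skeleton(traces).items():
    print(head, chain, "->", targets)
```

In this toy run, the never-executed `junk` block does not appear in the skeleton, and the loop over `B` and `C` survives as a single node with a back edge. Because traces only under-approximate program behavior, a skeleton obtained this way is at most a hint: the abstract requires the synthesized program to be semantically equivalent to $P$, which observed traces alone cannot establish.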
Thu 24 Oct. Displayed time zone: Pacific Time (US & Canada)
13:40 - 15:20 | Program Synthesis and Verification 1 | OOPSLA 2024 at IBR West | Chair(s): Benjamin Delaware (Purdue University)
13:40 | 20m | Talk | Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis | OOPSLA 2024 | Benjamin Mariano (University of Texas at Austin), Ziteng Wang (University of Texas at Austin), Shankara Pailoor (University of Texas at Austin), Christian Collberg (University of Arizona), Işıl Dillig (University of Texas at Austin)
14:00 | 20m | Talk | Finding ∀∃ Hyperbugs Using Symbolic Execution | OOPSLA 2024 | Arthur Correnson (CISPA Helmholtz Center for Information Security), Tobias Nießen (TU Wien), Bernd Finkbeiner (CISPA Helmholtz Center for Information Security), Georg Weissenbacher (TU Wien)
14:20 | 20m | Talk | Mechanizing the CMP Abstraction for Parameterized Verification | OOPSLA 2024 | Yongjian Li (Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China), Bohua Zhan (Institute of Software, Chinese Academy of Sciences), Jun Pang (University of Luxembourg)
14:40 | 20m | Talk | Model Checking Distributed Protocols in Must | OOPSLA 2024 | Constantin Enea (LIX, CNRS, Ecole Polytechnique), Dimitra Giannakopoulou (Amazon Web Services), Michalis Kokologiannakis (ETH Zurich), Rupak Majumdar (MPI-SWS)
15:00 | 20m | Talk | Monotone Procedure Summarization via Vector Addition Systems and Inductive Potentials | OOPSLA 2024