VarLifter: Recovering Variables and Types from Bytecode of Solidity Smart Contracts
Funds and tokens in smart contracts are maintained through specific state variables, so contract audit, an effective means of security assurance, focuses particularly on these variables and the operations on them. However, many contracts expose only their bytecode, with no publicly accessible source code, which hinders audit efforts. Recovering variables and their types from Solidity bytecode is thus a critical task in smart contract analysis and audit. The task is challenging because the bytecode loses variable and type information, leaving only low-level data manipulated through stack operations and untyped memory/storage accesses. State-of-the-art smart contract decompilers fail to identify many variables and infer incorrect types for many of the variables they do identify. To address this, we propose \textsf{\textsc{VarLifter}}, a lifter dedicated to the precise and efficient recovery of typed variables. \textsf{\textsc{VarLifter}} interprets every field of a data region that is read or written as at least one potential variable; after discarding falsely identified variables, it progressively refines the variable types based on the variable behaviors, expressed as operation sequences. We evaluate \textsf{\textsc{VarLifter}} on 34,832 real-world Solidity smart contracts. \textsf{\textsc{VarLifter}} attains a precision of 97.48% and a recall of 91.84% for typed variable recovery. Moreover, \textsf{\textsc{VarLifter}} finishes analyzing 77% of the smart contracts in around 10 seconds per contract. If \textsf{\textsc{VarLifter}} replaces the variable recovery modules of the two state-of-the-art Solidity bytecode decompilers, 52.4% and 74.6% more typed variables are correctly recovered, respectively. Applications of \textsf{\textsc{VarLifter}} to contract decompilation, contract audit, and contract bytecode fuzzing illustrate that the recovered variable information improves many contract analysis tasks.
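The two-stage idea described above (treat each accessed field as a candidate variable, then narrow its type from the operations applied to it) can be illustrated with a toy sketch. This is not the \textsf{\textsc{VarLifter}} implementation: the `Access` record, the function names, and the three refinement rules are assumptions made for illustration, and the step that discards falsely identified variables is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical record of one storage access seen in the bytecode."""
    slot: int      # storage slot index
    offset: int    # byte offset of the field inside the 32-byte slot
    width: int     # field width in bytes
    ops: tuple     # opcodes applied around the access (toy trace)


def candidate_fields(accesses):
    """Group accesses by (slot, offset, width); each field is one candidate variable."""
    fields = {}
    for a in accesses:
        fields.setdefault((a.slot, a.offset, a.width), []).append(a.ops)
    return fields


def refine_type(width, op_sequences):
    """Start from an unsigned integer guess and narrow it using observed operations.

    The rules below are illustrative assumptions, not the paper's rule set.
    """
    guess = f"uint{width * 8}"
    for ops in op_sequences:
        if "SIGNEXTEND" in ops:
            # Sign extension suggests a signed integer.
            guess = f"int{width * 8}"
        elif width == 20 and ("CALL" in ops or "BALANCE" in ops):
            # A 20-byte value used as a call target or balance query suggests an address.
            guess = "address"
        elif width == 1 and ops.count("ISZERO") >= 2:
            # A one-byte value normalised by a double ISZERO suggests a bool.
            guess = "bool"
    return guess


def recover(accesses):
    """Map each candidate field to its refined type."""
    return {
        key: refine_type(key[2], seqs)
        for key, seqs in candidate_fields(accesses).items()
    }


# Toy input: four fields spread over three storage slots.
example = [
    Access(0, 0, 20, ("SLOAD", "AND", "CALL")),
    Access(0, 20, 1, ("SLOAD", "DIV", "AND", "ISZERO", "ISZERO")),
    Access(1, 0, 32, ("SLOAD", "ADD", "SSTORE")),
    Access(2, 0, 16, ("SLOAD", "SIGNEXTEND", "SLT")),
]
recovered = recover(example)
```

In this sketch a field with no distinguishing operations keeps its initial unsigned integer guess, which mirrors the idea of refining types only as behavioral evidence accumulates.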