WhiteFox: White-box Compiler Fuzzing Empowered by Large Language Models (SPLASH 2024 - OOPSLA 2024)

Who

Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang

Track

SPLASH 2024 OOPSLA

Time Zone

Times are shown in the conference time zone, (GMT-07:00) Pacific Time (US & Canada). The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 24 Oct 2024 15:00 - 15:20 at IBR East, session: Machine Learning and Programming Languages. Chair(s): Loris D'Antoni

Abstract

Compiler correctness is crucial, as miscompilation can make programs behave incorrectly, leading to serious consequences across the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: existing work focuses on black- and grey-box fuzzing, which generates test programs without sufficient understanding of internal compiler behaviors. As such, these fuzzers often fail to construct test programs that exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally infeasible for the giant codebases of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel at code generation and understanding tasks and have even achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing.

To this end, we propose WhiteFox, the first white-box compiler fuzzer that uses LLMs with source-code information to test compiler optimizations, with a focus on detecting deep logic bugs in emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test-generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs that exercise deep optimizations requiring intricate conditions, exercising up to 8 times more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found 101 bugs in total in the compilers under test, 92 of which were confirmed as previously unknown and 70 of which are already fixed. Notably, WhiteFox has recently been acknowledged by the PyTorch team and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted to compilers in other domains, such as LLVM, where it has already found multiple bugs.
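The control flow of the multi-agent loop described in the abstract can be pictured with a minimal sketch. This is not WhiteFox's implementation. The callables `analyze`, `generate`, and `triggers` are hypothetical stand-ins for the analysis agent, the generation agent, and a check that a test actually hit the target optimization. The cap on fed-back examples (`max_examples`) is likewise an illustrative assumption. The sketch only shows requirements being derived once from optimization source code, tests being generated from them, and triggering tests being fed back into later prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GenerationPrompt:
    # Hypothetical prompt state: requirements summarized by the analysis agent,
    # plus previously generated tests known to trigger the optimization.
    requirements: str
    examples: List[str] = field(default_factory=list)


def fuzz_optimization(
    optimization_source: str,
    analyze: Callable[[str], str],              # hypothetical analysis agent
    generate: Callable[[str, List[str]], str],  # hypothetical generation agent
    triggers: Callable[[str], bool],            # hypothetical "optimization hit?" check
    rounds: int,
    max_examples: int = 3,                      # assumed cap on fed-back examples
) -> List[str]:
    """Sketch of a two-agent generation loop with triggering-test feedback."""
    # (i) Summarize the low-level optimization code into requirements once.
    prompt = GenerationPrompt(requirements=analyze(optimization_source))
    triggering: List[str] = []
    for _ in range(rounds):
        # (ii) Produce a test program from the requirements and current examples.
        test = generate(prompt.requirements, prompt.examples)
        if triggers(test):
            triggering.append(test)
            # Feedback: keep only the most recent triggering tests as prompt examples.
            prompt.examples = (prompt.examples + [test])[-max_examples:]
    return triggering
```

Keeping the agents as plain callables lets a real LLM client or a compiler-instrumentation check be swapped in. How WhiteFox actually decides that an optimization was triggered is not described in this abstract.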

DOI

https://doi.org/10.1145/3689736

Chenyuan Yang

University of Illinois at Urbana-Champaign

United States

Yinlin Deng

University of Illinois at Urbana-Champaign

United States

Runyu Lu

Huazhong University of Science and Technology

Jiayi Yao

The Chinese University of Hong Kong, Shenzhen

Jiawei Liu

University of Illinois at Urbana-Champaign

United States

Reyhaneh Jabbarvand

University of Illinois at Urbana-Champaign

United States

Lingming Zhang

University of Illinois at Urbana-Champaign

United States

Session Program

Thu 24 Oct
Displayed time zone: Pacific Time (US & Canada)

13:40 - 15:20 Machine Learning and Programming Languages (OOPSLA 2024) at IBR East. Chair(s): Loris D'Antoni (UCSD)

13:40 (20m talk) CYCLE: Learning to Self-Refine the Code Generation. Yangruibo Ding (Columbia University), Marcus J. Min (Columbia University), Gail Kaiser (Columbia University), Baishakhi Ray (Columbia University, New York; AWS AI Lab)
14:00 (20m talk) Evaluating the effectiveness of Deep Learning Models for Foundational Program Analysis Tasks. Qian Chen (Nanjing University), Chenyang Yu (Department of Computer Science and Technology, Nanjing University), Ruyan Liu (Department of Computer Science and Technology, Nanjing University), Chi Zhang (Nanjing University), Yu Wang (Nanjing University), Ke Wang, Ting Su (East China Normal University), Linzhang Wang (Nanjing University)
14:20 (20m talk) Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs. Federico Cassano (Northeastern University), John Gouwar (Northeastern University), Francesca Lucchetti (Northeastern University), Claire Schlesinger (Northeastern University), Anders Freeman (Wellesley College), Carolyn Jane Anderson (Wellesley College), Molly Q Feldman (Oberlin College), Michael Greenberg (Stevens Institute of Technology), Abhinav Jangda (Microsoft Research), Arjun Guha (Northeastern University; Roblox)
14:40 (20m talk) Statically Contextualizing Large Language Models with Typed Holes. Andrew Blinn (University of Michigan), Xiang Li (University of Michigan, Ann Arbor), June Hyung Kim (University of Michigan), Cyrus Omar (University of Michigan)
15:00 (20m talk) WhiteFox: White-box Compiler Fuzzing Empowered by Large Language Models. Chenyuan Yang (University of Illinois at Urbana-Champaign), Yinlin Deng (University of Illinois at Urbana-Champaign), Runyu Lu (Huazhong University of Science and Technology), Jiayi Yao (The Chinese University of Hong Kong, Shenzhen), Jiawei Liu (University of Illinois at Urbana-Champaign), Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign), Lingming Zhang (University of Illinois at Urbana-Champaign)