Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by the existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. For the cases when code LMs fail to implement the correct program, developers actually find it hard to debug and fix the faulty prediction since it is not written by the developers themselves. Unfortunately, our study reveals that code LMs cannot efficiently self-refine their faulty generations as well.
In this paper, we propose CYCLE framework, learning to self-refine the faulty generation according to the available feedback, such as the execution results reported by the test suites. We evaluate CYCLE on three popular code generation benchmarks, HumanEval, MBPP, and APPS. The results reveal that CYCLE successfully maintains, sometimes improves, the quality of one-time code generation, while significantly improving the self-refinement capability of code LMs. We implement four variants of CYCLE with varied numbers of parameters across 350M, 1B, 2B, and 3B, and the experiments show that CYCLE consistently boosts the code generation performance, by up to 63.5%, across benchmarks and varied model sizes. We also notice that CYCLE outperforms code LMs that have 3$\times$ more parameters in self-refinement.
Thu 24 OctDisplayed time zone: Pacific Time (US & Canada) change
13:40 - 15:20 | |||
13:40 20mTalk | CYCLE: Learning to Self-Refine the Code Generation OOPSLA 2024 Yangruibo Ding Columbia University, Marcus J. Min Columbia University, Gail Kaiser Columbia University, Baishakhi Ray Columbia University, New York; AWS AI Lab DOI | ||
14:00 20mTalk | Evaluating the effectiveness of Deep Learning Models for Foundational Program Analysis Tasks OOPSLA 2024 Qian Chen Nanjing University, Chenyang Yu Department of Computer Science and Technology, Nanjing University, Ruyan Liu Department of Computer Science and Technology, Nanjing University, Chi Zhang Nanjing University, Yu Wang Nanjing University, Ke Wang , Ting Su East China Normal University, Linzhang Wang Nanjing University DOI | ||
14:20 20mTalk | Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs OOPSLA 2024 Federico Cassano Northeastern University, John Gouwar Northeastern University, Francesca Lucchetti Northeastern University, Claire Schlesinger Northeastern University, Anders Freeman Wellesley College, Carolyn Jane Anderson Wellesley College, Molly Q Feldman Oberlin College, Michael Greenberg Stevens Institute of Technology, Abhinav Jangda Microsoft Research, Arjun Guha Northeastern University; Roblox DOI Pre-print | ||
14:40 20mTalk | Statically Contextualizing Large Language Models with Typed Holes OOPSLA 2024 Andrew Blinn University of Michigan, Xiang Li University of Michigan, Ann Arbor, June Hyung Kim University of Michigan, Cyrus Omar University of Michigan DOI | ||
15:00 20mTalk | WhiteFox: White-box Compiler Fuzzing Empowered by Large Language Models OOPSLA 2024 Chenyuan Yang University of Illinois at Urbana-Champaign, Yinlin Deng University of Illinois at Urbana-Champaign, Runyu Lu Huazhong University of Science and Technology, Jiayi Yao The Chinese University of Hong Kong, Shenzhen, Jiawei Liu University of Illinois at Urbana-Champaign, Reyhaneh Jabbarvand University of Illinois at Urbana-Champaign, Lingming Zhang University of Illinois at Urbana-Champaign DOI |