Dependency-aware Code Naturalness (SPLASH 2024 - OOPSLA 2024)

Sun 20 - Fri 25 October 2024 Pasadena, California, United States

Who

Chen Yang, Junjie Chen, Jiajun Jiang, Yuliang Huang

Track

SPLASH 2024 OOPSLA

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

Use conference time zone: (GMT-07:00) Pacific Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 24 Oct 2024 11:00 - 11:20 at IBR East - Software Engineering Chair(s): Michael Coblenz

Abstract

Code naturalness, which captures repetitiveness and predictability in programming languages, has proven valuable for various code-related tasks in software engineering. However, precisely measuring the naturalness of code remains a fundamental challenge. Existing methods measure code naturalness over individual lines of code while ignoring the deep semantic relations among different lines of code, e.g., program dependency, which may negatively affect the precision of the measure. Despite the intuitive appeal of extending the code naturalness measure to the code dependency domain (as there are some work that have initiated the utilization of code dependency for diverse code-related tasks), this assumption remains unexplored and warrants direct investigation. In this study, we aim to perform the first empirical study to investigate whether incorporating code dependency, instead of analyzing isolated individual lines, can enhance the precision of measuring code naturalness.

To achieve that, we first propose a new method named DAN for measuring code naturalness by incorporating the rich dependency information in the code. Specifically, DAN extracts multiple sequences of code lines by traversing the program dependency graph, where different code lines are connected by dependencies in each sequence, and then the code naturalness will be measured by taking each sequence as a whole. In this way, the dependency information can be well captured. Finally, we have conducted an extensive study to evaluate the influence of code dependency for measuring code naturalness with DAN, and compared it with the state-of-the-art methods under three emerging application scenarios of code naturalness. The results demonstrate that DAN can not only better distinguish natural and unnatural code, but also substantially boost two important downstream applications of code naturalness, i.e., distinguishing buggy and non-buggy code lines and data cleansing for training better code models, reflecting the significance of code dependency in measuring code naturalness.

DOI

https://doi.org/10.1145/3689794

Chen Yang

Tianjin University

China

Junjie Chen

Tianjin University

China

Jiajun Jiang

Tianjin University

China

Yuliang Huang

College of Intelligence and Computing, Tianjin University

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

Use conference time zone: (GMT-07:00) Pacific Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Thu 24 Oct
Displayed time zone: Pacific Time (US & Canada) change

10:40 - 12:20	Software EngineeringOOPSLA 2024 at IBR East Chair(s): Michael Coblenz University of California, San Diego

10:40 20m Talk		AdoB: Bridging Benign and Byzantine Consensus with Atomic Distributed Objects OOPSLA 2024 Wolf Honore Yale University, Longfei Qiu Yale University, Yoonseung Kim Yale University, Ji-Yong Shin Northeastern University, Jieung Kim Yonsei University, Zhong Shao Yale University DOI
11:00 20m Talk		Dependency-aware Code NaturalnessRemote OOPSLA 2024 Chen Yang Tianjin University, Junjie Chen Tianjin University, Jiajun Jiang Tianjin University, Yuliang Huang College of Intelligence and Computing, Tianjin University DOI
11:20 20m Talk		Iterative-Epoch Online Cycle Elimination for Context-Free Language Reachability OOPSLA 2024 Pei Xu University of Technology Sydney / UNSW Sydney, Yuxiang Lei UNSW Sydney, Yulei Sui UNSW, Jingling Xue UNSW Sydney DOI
11:40 20m Talk		Wasm-R3: Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks OOPSLA 2024 Doehyun Baek KAIST, Jakob Getz University of Stuttgart, Yusung Sim KAIST, Daniel Lehmann Google, Germany, Ben L. Titzer Carnegie Mellon University, Sukyoung Ryu KAIST, Michael Pradel University of Stuttgart DOI
12:00 20m Talk		When Your Infrastructure is a Buggy Program: Understanding Faults in Infrastructure as Code Ecosystems OOPSLA 2024 Georgios-Petros Drosos ETH Zurich, Thodoris Sotiropoulos ETH Zurich, Georgios Alexopoulos University of Athens, Dimitris Mitropoulos University of Athens, Zhendong Su ETH Zurich DOI