Towards a High Level Linter for Data Science (NSAD 2024)

Sun 20 - Fri 25 October 2024 Pasadena, California, United States

Who

Greta Dolcetti, Agostino Cortesi, Caterina Urban, Enea Zaffanella

Track

NSAD 2024

When

Tue 22 Oct 2024 10:00 - 10:30 at Pacific B - NSAD: Session 1 Chair(s): Vincenzo Arceri, Michele Pasqua

Abstract

Due to its interdisciplinary nature, the development of data science code is subject to a wide range of potential mistakes that can easily compromise the final results. Several tools have been proposed to help data scientists identify the most common low-level programming issues. We discuss the steps needed to implement a tool that focuses instead on higher-level errors specific to the data science pipeline. To this end, we propose a static analysis that assigns ad hoc abstract datatypes to program variables; these are then checked for consistency when calling functions defined in data science libraries. By adopting a descriptive (rather than prescriptive) abstract type system, we obtain a linter that reports data-science-related code smells. While still a work in progress, the current prototype is able to identify and report the code smells contained in several examples of questionable data science code.
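
To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' prototype. It assumes two hypothetical abstract types, `Train` and `Test`, assigns them to the variables produced by a call to `train_test_split` (alternating train/test, as in scikit-learn), and reports a smell when a `fit` or `fit_transform` method receives a variable typed `Test`. The function name `lint`, the type names, and the rule itself are assumptions chosen for illustration; the sketch only handles straight-line, top-level code.

```python
import ast

TRAIN, TEST = "Train", "Test"  # hypothetical abstract types for this sketch


def lint(source):
    """Return warnings for fit-like calls on variables typed as Test."""
    env = {}      # variable name -> abstract type
    smells = []   # descriptive reports; nothing is rejected
    for stmt in ast.parse(source).body:  # top-level statements, in order
        # Step 1: check fit-like method calls against the current types.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr in {"fit", "fit_transform"}):
                for arg in node.args:
                    if isinstance(arg, ast.Name) and env.get(arg.id) == TEST:
                        smells.append(
                            f"line {node.lineno}: '{node.func.attr}' "
                            f"called on test data '{arg.id}'")
        # Step 2: update the abstract types of assigned variables.
        if isinstance(stmt, ast.Assign):
            value = stmt.value
            is_split = (isinstance(value, ast.Call)
                        and isinstance(value.func, ast.Name)
                        and value.func.id == "train_test_split")
            for target in stmt.targets:
                # Any reassigned name loses its previous abstract type.
                for name in ast.walk(target):
                    if isinstance(name, ast.Name):
                        env.pop(name.id, None)
                # train_test_split results alternate train, test, train, ...
                if is_split and isinstance(target, ast.Tuple):
                    for i, elt in enumerate(target.elts):
                        if isinstance(elt, ast.Name):
                            env[elt.id] = TRAIN if i % 2 == 0 else TEST
    return smells


EXAMPLE = """\
X_train, X_test, y_train, y_test = train_test_split(X, y)
scaler.fit(X_test)
model.fit(X_train, y_train)
"""

print(lint(EXAMPLE))
```

The analyzed snippet is only parsed, never executed, so names such as `scaler` and `model` need not be defined. Because the report is a warning rather than a type error, the sketch follows the descriptive stance mentioned in the abstract; a realistic analysis would also need control flow, function calls, and a richer set of abstract types.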

DOI

https://doi.org/10.1145/3689609.3689996

Greta Dolcetti

Ca’ Foscari University of Venice

Italy

Agostino Cortesi

Ca’ Foscari University of Venice

Italy

Caterina Urban

Inria - École Normale Supérieure

France

Enea Zaffanella

University of Parma

Italy

Session Program

Tue 22 Oct
Displayed time zone: Pacific Time (US & Canada)

09:00 - 10:30	NSAD: Session 1 (NSAD) at Pacific B. Chair(s): Vincenzo Arceri (University of Parma, Italy), Michele Pasqua (University of Verona)

09:00 5m	Opening NSAD. Vincenzo Arceri (University of Parma, Italy), Michele Pasqua (University of Verona)
09:05 55m	Keynote: Abstract Domains for Machine Learning Verification. Caterina Urban (Inria - École Normale Supérieure)
10:00 30m	Full paper: Towards a High Level Linter for Data Science. Greta Dolcetti (Ca’ Foscari University of Venice), Agostino Cortesi (Ca’ Foscari University of Venice), Caterina Urban (Inria - École Normale Supérieure), Enea Zaffanella (University of Parma)