SPLASH 2024 - OOPSLA Artifacts

Publish the software that supports your research!

Authors of accepted, conditionally accepted, or minor-revision papers are invited to submit a software artifact that supports the claims in their papers. Per the ACM guidelines for Artifact Review and Badging, and starting in 2024, OOPSLA provides three types of validation for artifacts, awarded as badges that appear on the first page of the paper:

Artifact Available: This badge is for artifacts that are published in a permanent location (with a DOI). Artifacts do not need to be evaluated to receive this badge.
Artifact Evaluated: This badge is for artifacts that have been approved by the OOPSLA Artifact Evaluation Committee (AEC). There are two levels for the badge; papers can receive at most one of them:
1. Functional, for artifacts which are found to be documented, consistent, complete, exercisable, and include appropriate evidence of verification and validation
2. Reusable, for artifacts that are Functional and facilitate reuse through careful documentation and clear organization.
Results Validated: Results Reproduced, for artifacts that can be used to reproduce the main scientific claims of the paper.

Submission is voluntary. Artifact Evaluation is a service provided by the community to help authors of accepted papers extend the reach of their work and encourage future researchers to build on it.

See the Call for Artifacts tab for more information.

FAQ

Q. My artifact requires hundreds of GB of RAM / hundreds of CPU hours / a specialized GPU / etc., that the AEC members may not have access to. How can we submit an artifact?: If the tool can run on an average modern machine, but may run extremely slowly in comparison to the hardware used for the paper's evaluation, document the expected running time and point to examples the AEC may be able to replicate in less time. If your system will simply not work at all without hundreds of GB of RAM, or other hardware requirements that typical machines will not satisfy, please contact the AEC chairs in advance to make arrangements. One option is to get suitable hardware from a cloud provider (for example, Cloudlab), and give reviewers anonymous access. (The AEC chairs will coordinate with reviewers to decide when the cloud reservation needs to be active.) Submissions using cloud instances or similar that are not cleared with the AEC chairs in advance will be summarily rejected.
Q. Can my artifact be accepted if some of the paper’s claims are not supported by the artifact, for example if some benchmarks are omitted or the artifact does not include tools we experimentally compare against in the paper?: In general yes (if good explanations are provided, as explained above), but if such claims are essential to the overall results of the paper, the artifact will be rejected. As an extreme example, an artifact with a working tool but no benchmarks (because they are closed-source) would be rejected. In this case, alternate benchmarks should be provided.
Q. Why do we need a DOI for the Available badge? Why not a GitHub or institutional URL?: A DOI is a strong assurance that the artifact will remain available indefinitely. By contrast, GitHub URLs are not permanent: it is possible to rewrite git commit history in a public repository (using rebase and --force, for example), users can delete public repositories, and GitHub itself might disappear like Google Code did (2015). Institutional URLs may also move and change over time.
Q. Reviewers identified things to fix in documentation or scripts for our artifact, and we'd prefer to publish the fixed version. Can we submit the improved version for the Available badge?: Yes.
Q. Can I get the Available badge without submitting an artifact? I'm still making the thing available!: Yes.
Q. Can I get the Available badge for an artifact that was not judged to be Functional? I'm still making the thing available!: Yes.
Q. Why doesn't the AEC accept paper proofs?: The AEC process is designed for software, not to provide rigorous evaluation of paper proofs. Its main strengths are checking that an artifact runs successfully and has a clear relation to the paper, neither of which is a serious issue for paper proofs. Authors should submit such proofs as supplementary material rather than as artifacts.

Contact

Please contact the AEC chairs Guillaume Baudart and Sankha Narayan Guria if you have any questions.

Call for Artifacts

Publish the software that supports your research!

Artifact Available: This badge is for artifacts that are published in a permanent location (with a DOI). Artifacts do not need to be evaluated to receive this badge.
Artifact Evaluated: This badge is for artifacts that have been approved by the OOPSLA Artifact Evaluation Committee (AEC). There are two levels for the badge; papers can receive at most one of them:
1. Functional, for artifacts which are found to be documented, consistent, complete, exercisable, and include appropriate evidence of verification and validation
2. Reusable, for artifacts that are Functional and facilitate reuse through careful documentation and clear organization.
Results Validated: Results Reproduced, for artifacts that can be used to reproduce the main scientific claims of the paper.

Submission is voluntary. Artifact Evaluation is a service provided by the community to help authors of accepted papers extend the reach of their work and encourage future researchers to build on it.

Artifact Evaluation submission site: https://oopsla24aec.hotcrp.com/

Artifact Available submissions go through the publisher. They are due with the camera-ready materials for the OOPSLA paper. Every artifact that passes evaluation (Functional or Reusable) is strongly encouraged to be Available unless there are licensing or privacy concerns about sharing it.

This Year

OOPSLA AEC will award the Results Reproduced badge in addition to the Functional/Reusable badge.
Authors should add a Hardware Dependencies section to describe the hardware required to evaluate the artifact.
Authors should add a Reusability Guide in the description of the artifact.
We are delighted to promote Software Heritage as a way to host and cite source code.
Communication between authors and AEC members will be open during the entire kick-the-tires period via comments on the submission site. AEC members are encouraged to report issues early so that authors have plenty of time to debug.
Paper proofs are not accepted for evaluation.

Artifact Available

Artifacts that are publicly available in an archival location can earn the Available badge from the publisher. This badge is not controlled by the AEC, which has some important consequences:

Artifacts that were not submitted for evaluation can be Available,
Artifacts that did not pass evaluation can be Available, and
Artifacts that passed evaluation need not be Available to accommodate rare situations in which the authors must keep the artifact private.

The requirements for this badge are set by the publisher and will be provided with the camera-ready instructions for OOPSLA papers. In the past, there have been two primary options for earning the Available badge:

Option 1: Authors upload a snapshot of the artifact to Zenodo to receive a DOI. Uploads can be done manually, or through GitHub.
Option 2: Authors work with Conference Publishing to send their artifact to the ACM for hosting on the ACM DL.

Data-Availability Statement

To help readers find data and software, OOPSLA recommends adding a section just before the references titled Data-Availability Statement. If the paper has an artifact, cite it here. If there is no artifact, this section can explain how to obtain relevant code. The statement does not count toward the OOPSLA 2024 page limit. It may be included in the submitted paper; in fact we encourage this, even if the DOI is not ready yet.

Example:

\section{Conclusion}
....

\section*{Data-Availability Statement}
The software that supports~\cref{s:design,s:evaluation}
is available on Software Heritage~\cite{artifact-swh}
and Zenodo~\cite{artifact-doi}.

\begin{acks}
....

Software Heritage

Software Heritage (SH) is a nonprofit whose mission is to collect, preserve, and share all public code. For authors of OOPSLA papers, SH offers three major services:

Permanent links to directories, files, and code fragments
BibLaTeX styles for citing software
Automatic crawling of source code repositories for updates

For more information, read the Software Heritage HOWTO and FAQ guides. See also the browser extension and GitHub action for archiving code.

Two caveats: (1) the ACM does not yet accept SH permalinks for the Available badge, only DOIs; and (2) we recommend packaging source code artifacts with Docker (or a similar build tool) to avoid dependency issues.

Artifact Evaluated

There are two levels for the Artifact Evaluated badge: Functional and Reusable.

Functional: This is the basic “accepted” outcome for an artifact. An artifact can be awarded a Functional badge if the artifact is:

Documented: At minimum, an inventory of artifacts is included, and sufficient description provided to enable the artifacts to be exercised.
Consistent: The artifacts are relevant to the associated paper, and contribute in some inherent way to the generation of its main results.
Complete: To the extent possible, all components relevant to the paper in question are included.
Exercisable: Included scripts and/or software used to generate the results in the associated paper can be successfully executed, and included data can be accessed and appropriately manipulated.

Reusable: The goal of the Reusable badge is to recognize high-quality artifacts which support other people in understanding, applying, and extending the artifact. The criteria for the Reusable badge are less cut-and-dried than for the Functional badge. Reusability depends on the type of artifact and the community’s quality standards for research code. In addition to the Functional badge requirements, Reusable artifacts should ideally:

clearly explain the capabilities of the artifact (e.g., with examples),
contain high-quality documented code (e.g., READMEs, comments, tests),
document how to adapt the artifact to new inputs, and
be packaged to enable their reuse as a component in another project.

Not all parts of an artifact must be evaluated for reusability. For example, a shell script that generates graphs for the paper may not be intended for reuse, only for reproducing the results. In the artifact Overview, the Reusability Guide should explain which parts of the artifact are reusable and how (see below).

Results Validated

There exist two levels for the Results Validated badge: Results Reproduced, and Results Replicated, but OOPSLA AEC will only award Results Reproduced badges (to obtain the Results Replicated badge, the results of the paper must be obtained without the use of author-supplied artifacts).

Results Reproduced: The Results Reproduced badge is awarded if the AEC can reproduce all claims presented in the paper using the artifact, possibly excluding some minor claims if there are very good reasons why they cannot be supported.

In the ideal case, an artifact with this designation includes all relevant code, dependencies, input data (e.g., benchmarks), and the artifact’s documentation is sufficient for reviewers to reproduce the exact results described in the paper. If the artifact claims to outperform a related system in some way (in time, accuracy, etc.) and the other system was used to generate new numbers for the paper (e.g., an existing tool was run on new benchmarks not considered by the corresponding publication), artifacts should include a version of that related system, and instructions for reproducing the numbers used for comparison as well. If the alternative tool crashes on a subset of the inputs, simply note this as the expected behavior.

Deviations from the ideal must be for good reason. A non-exhaustive list of justifiable deviations follows:

Some benchmark code is subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (e.g., licensed benchmark suites like SPEC, or when a tool is applied to private proprietary code). In such cases, the public benchmarks should be included. If all benchmark data for a major claim is private, alternative data should be supplied. Providing a tool with no meaningful inputs to evaluate on is not sufficient to justify claims that the artifact works.
Some of the results are performance data, and therefore exact numbers depend on the particular hardware. In this case, artifacts should explain how to recognize when experiments on other hardware reproduce the high-level results. For example, certain optimizations might exhibit a particular trend, or one tool might outperform another in a certain class of inputs.
Repeating the evaluation takes a very long time. If so, provide small and representative inputs to demonstrate the behavior. Reviewers may or may not reproduce the full results in such cases.
The evaluation requires specialized hardware (e.g., a CPU with a particular new feature, or a specific class of GPU, or a cluster of GPUs). Authors should provide instructions on how to gain access to the hardware, or contact the chairs as soon as possible to work out how the artifact can be evaluated. In past years, one outcome was that the authors of an artifact requiring specialized hardware paid for a cloud instance with that hardware, which reviewers could access anonymously.
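
For performance results, one way to help reviewers recognize a reproduced high-level result on different hardware is to ship a small check that tests the trend rather than the exact numbers. The sketch below is only an illustration: the function names, the input format (benchmark name mapped to running time in seconds), and the 80% threshold are assumptions, not AEC requirements.

```python
from typing import Dict


def speedups(baseline: Dict[str, float], tool: Dict[str, float]) -> Dict[str, float]:
    """Per-benchmark speedup of `tool` over `baseline` (running times in seconds)."""
    return {name: baseline[name] / tool[name] for name in baseline if name in tool}


def trend_holds(baseline: Dict[str, float], tool: Dict[str, float],
                min_fraction: float = 0.8) -> bool:
    """True if `tool` is faster than `baseline` on at least `min_fraction`
    of the benchmarks that both were run on (threshold is hypothetical)."""
    ratios = speedups(baseline, tool)
    if not ratios:
        return False
    wins = sum(1 for ratio in ratios.values() if ratio > 1.0)
    return wins / len(ratios) >= min_fraction
```

A README could then state the claim in the same terms, for instance that the tool is expected to be faster than the baseline on most benchmarks, and point reviewers to this check.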

Selection Criteria

The artifact is evaluated in relation to the expectations set by the paper. For an artifact to be accepted, it must support the main claims made in the paper. Thus, in addition to just running the artifact, the evaluators will read the paper and may try to tweak provided inputs or otherwise slightly generalize the use of the artifact from the paper in order to test the artifact’s limits. In general, artifacts should be:

consistent with the paper,
as complete as possible,
well documented, and
easy to reuse, facilitating further research.

The AEC strives to place itself in the shoes of such future researchers and then to ask: how would this artifact help me to reproduce the results and build on them?

Submission Process

All conditionally-accepted OOPSLA papers are eligible to submit artifacts.

Submissions require three parts:

an overview of the artifact,
a non-institutional URL pointing to either:
- a single file containing the artifact (recommended), or
- the address of a public source control repository
a hash certifying the version of the artifact at submission time, either:
- an md5 hash of the single file (use the md5 or md5sum command-line tool to generate the hash), or
- the full commit hash for the repository (e.g., from git reflog --no-abbrev)
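
For authors who prefer not to rely on the md5 or md5sum tools, the single-file hash can also be computed with Python's standard library. This is a minimal sketch; the function name and the example file name are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not
    need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `print(md5_of_file("42.tgz"))` prints a digest that should match what md5sum reports for the same (hypothetical) file.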

The URL must be non-institutional to protect the anonymity of reviewers. Acceptable URLs can be obtained from Google Drive, Dropbox, GitLab, Zenodo, and many other providers. You may upload your artifact directly if it is a single file less than 15 MB.

Artifacts do not need to be anonymous. Reviewers will be aware of author identities.

Overview of the Artifact

The overview should consist of five parts:

a brief Introduction,
a Hardware Dependencies section,
a Getting Started Guide,
Step-by-Step Instructions for how you propose to evaluate the functionality of your artifact (with appropriate connections to the relevant sections of your paper), and
a Reusability Guide for how you propose to evaluate the reusability of your artifact.

In the Introduction, briefly explain the purpose of the artifact and how it supports the paper. We recommend listing all claims in the paper and stating whether or not each is supported. For supported claims, say how the artifact provides support. For unsupported claims, explain why they are omitted.

In the Hardware Dependencies section, describe the hardware required to evaluate the artifact. If the artifact requires specific hardware (e.g., many cores, disk space, GPUs, specific processors), please provide instructions on how to gain access to the hardware. Keep in mind that reviewers must remain anonymous.

In the Getting Started Guide, give instructions for setup and basic testing. List any software requirements and/or passwords needed to access the artifact. The instructions should take roughly 30 minutes to complete. Reviewers will follow the guide during an initial kick-the-tires phase and report issues as they arise.

The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.

In the Step-by-Step Instructions, explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out, note roughly how long it is expected to run, and explain how to run it on smaller inputs. Reviewers may choose to run on smaller inputs or larger inputs depending on available resources.

Be sure to explain the expected outputs produced by the Step-by-Step Instructions. State where to find the outputs and how to interpret them relative to the paper. If there are any expected warnings or error messages, explain those as well. Ideally, artifacts should include sample outputs and logs for comparison.

In the Reusability Guide, explain which parts of your artifact constitute the core pieces which should be evaluated for reusability. Explain how to adapt the artifact to new inputs or new use cases. Provide instructions for how to find/generate/read documentation about the core artifact. Articulate any limitations to the artifact’s reusability.

If the Artifact Overview is written in a language like Markdown or TeX, then authors are encouraged to upload the rendered output to HotCRP and include the source files in the artifact.

Packaging the Artifact

When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members have a limited time in which to make an assessment of each artifact.

A good way to package artifacts is as a virtual machine (VM). VMs give an easily reproducible environment that is somewhat resistant to bit rot. They also give reviewers confidence that errors or other problems cannot cause harm to their machines. We recommend that you provide VMs built for both x86 and ARM so reviewers can run the VM on whichever architecture matches their personal computer.
Source code artifacts should use Docker or another build tool to manage all compilation and dependencies. This improves the odds that the reviewers will be able to quickly and painlessly install the artifact — without getting lost in environment issues (e.g. what Python do I need?!).
Mechanized proof artifacts should follow the guidelines on this page: Proof Artifacts (accessed 2023-10-02). Be sure to explain how the mechanization encodes concepts and theorems from the paper. In our experience, it is difficult for a mechanized artifact to satisfy the requirements for Functional alone without also being Reusable because documentation is crucial for understanding whether the artifact faithfully supports the paper.

Submit your artifact as a single archive file and use the naming convention <paper #>.<suffix>, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents (such as .txt, .html, and .pdf).
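
As an illustration of the naming convention, the following sketch packages a directory as <paper #>.tgz using Python's standard tarfile module. The function name, its parameters, and the choice to keep a single top-level directory inside the archive are assumptions made for this example.

```python
import tarfile
from pathlib import Path


def package_artifact(source_dir: str, paper_number: int, out_dir: str = ".") -> Path:
    """Create <paper #>.tgz from `source_dir` and return the archive path.

    `out_dir` should lie outside `source_dir`; otherwise the archive
    would try to include itself.
    """
    src = Path(source_dir).resolve()
    archive = Path(out_dir) / f"{paper_number}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        # Keep everything under one top-level directory named after the source folder.
        tar.add(src, arcname=src.name)
    return archive
```

Unpacking the result on a fresh machine or VM and following the Getting Started Guide from there is a simple way to confirm that nothing was left out of the archive.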

Based on the outcome of the previous editions (2019 AEC, 2020 AEC, 2021 AEC, 2023 AEC), the strongest recommendation we can give for ensuring quality packaging is to test your own directions on a fresh machine (or VM), following exactly the directions you have prepared.

While publicly available artifacts are often easier to review, and considered to be in the best interest of open science, artifacts are not required to be public and/or open source. The submission site will ask whether the artifact is private. Artifact reviewers will be instructed that such artifacts are for use only for artifact evaluation, that submitted versions of artifacts may not be made public by reviewers, and that copies of artifacts must not be kept beyond the review period.

Review Process Overview

Kick-the-tires

After authors submit their artifact, there is a short window of time in which the reviewers will work through only the Getting Started instructions and upload preliminary reviews indicating whether or not they were able to get those 30-or-so minutes of instructions working. The preliminary reviews will be shared immediately with authors, who may make modest updates and corrections in order to resolve any issues the reviewers encountered.

Additional rounds of interaction are allowed via comments throughout the initial kick-the-tires period. Our goal here is twofold: we want to give authors the opportunity to resolve issues early (before other reviewers rediscover them), and we want authors to have as much time as possible for debugging (more than the typical 3-day response window).

Full review

During the full review period, comments are closed by default but may be reopened at reviewers’ discretion to debug small issues. The purpose of re-opening communication is to maximize the number of Functional submissions.

COI

Conflicts of interest for AEC members are handled by the chairs. Conflicts of interest involving one of the two AEC chairs are handled by the other AEC chair, or the PC of the conference if both chairs are conflicted. Artifacts involving an AEC chair must be unambiguously accepted (they may not be borderline), and they may not be considered for the distinguished artifact award.

Common issues

In the kick-the-tires phase

Overstating platform support. Several artifacts claiming the need for only UNIX-like systems failed severely under macOS — in particular those requiring 32-bit compilers, which are no longer present in newer macOS versions. We recommend future artifacts scope their claimed support more narrowly.
Missing dependencies, or poor documentation of dependencies. The most effective way to avoid these sorts of issues ahead of time is to run the instructions independently on a fresh machine, VM, or Docker container.
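
A small preflight script at the start of the Getting Started Guide can surface missing dependencies before reviewers run anything expensive. The sketch below uses only the Python standard library; the function names and the tools listed in the usage note are placeholders for whatever your artifact actually needs.

```python
import shutil
import sys
from typing import Iterable, List, Tuple


def missing_tools(required: Iterable[str]) -> List[str]:
    """Return the executables from `required` that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def python_at_least(min_version: Tuple[int, ...]) -> bool:
    """True if the running interpreter is at least `min_version`, e.g. (3, 8)."""
    return sys.version_info >= min_version
```

For instance, a script could call `missing_tools(["docker", "make"])` and print any names it returns together with installation hints, and check `python_at_least((3, 8))` before importing anything version-sensitive.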

In the full review phase

Comparing against existing tools on new benchmarks, but not including ways to reproduce the other tools’ executions.
Not explaining how to interpret results. Several artifacts ran successfully and produced the output that was the basis for the paper, but without any way for reviewers to compare these for consistency with the paper. Examples included generating a list of warnings without documenting which were true vs. false positives, and generating large tables of numbers that were presented graphically in the paper without providing a way to generate analogous visualizations. We recommend using scripts to generate data and experimental figures. This fully automated approach may be a bit more costly to set up, but it avoids copy/paste errors in the paper and makes regenerating data much simpler.
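
As a sketch of the scripted approach, the snippet below turns raw measurements into LaTeX table rows so that the numbers in the paper and the numbers produced by the artifact come from the same source. The CSV column names (benchmark, time_s) and the function name are assumptions made for this example.

```python
import csv
import io


def latex_rows(csv_text: str) -> str:
    """Convert CSV text with columns `benchmark` and `time_s`
    into LaTeX table rows, one per benchmark, times rounded to 2 decimals."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Each row becomes "name & time \\" for use inside a tabular environment.
        lines.append(f"{row['benchmark']} & {float(row['time_s']):.2f} \\\\")
    return "\n".join(lines)
```

Benchmark names containing characters that are special in LaTeX (such as underscores) would need escaping, which this sketch does not attempt.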

Chair's Report

This year, for the first time, following the ACM guidelines for Artifact Review and Badging, the AEC awarded two kinds of badges: Artifact Evaluated (Functional or Reusable), and Results Reproduced. Functional or Reusable badges can thus be awarded to functional and well-packaged artifacts even when not all claims from the paper can be reproduced. The Results Reproduced badge was awarded if reviewers were able to reproduce 100% of the paper's claims and evaluation during the review period.

The AEC evaluated 113 artifacts, 40 in round 1 (35%) and 74 in round 2 (65%). This is significantly higher than in previous years. Reviewers completed on average 6 reviews over the 2 rounds.

66 artifacts received the Reusable badge (59%), 33 artifacts received the Functional badge (29%), and 14 artifacts did not receive any badge (12%). 79 artifacts received the Results Reproduced badge (80% of the functional artifacts).

The most common issues in artifacts without a Results Reproduced badge were: inconsistencies with respect to the paper that are not documented; significant differences between the outputs of the evaluation and the figures presented in the paper; missing claims; or not enough resources to reproduce all the claims. We strongly encourage authors to provide anonymized remote access to specialized hardware if the artifact requires large computing resources.

The most common problem in artifacts without a Reusable badge was a lack of documentation. In particular, reviewers looked for examples and instructions on how to run the artifact on new inputs, details on architecture limitations, and guidance on how to extend the artifact to build further research tools. We strongly encourage authors to consider the usability of their artifact in environments or with inputs beyond the evaluation suite, and to provide documentation that will assist reviewers and future users in these contexts.

Distinguished Artifacts

Deriving Dependently-Typed OOP from First Principles
- David Binder, Ingo Skupin, Tim Süberkrüb, Klaus Ostermann
Sensitivity by Parametricity
- Elisabet Lobo-Vesga, Carlos Tomé Cortiñas, Alejandro Russo, Marco Gaboardi
On the Expressive Power of Languages for Static Variability
- Paul Maximilian Bittner, Alexander Schultheiß, Benjamin Moosherr, Jeffrey M. Young, Leopoldo Teixeira, Eric Walkingshaw, Parisa Ataei, Thomas Thüm
Making Formulog Fast: An Argument for Unconventional Datalog Evaluation
- Aaron Bembenek, Michael Greenberg, Stephen Chong

Distinguished Artifact Reviewers

William Bowman (University of British Columbia)
Qinlin Chen (Nanjing University)
Thomas Haas (TU Braunschweig)
Brent Pappas (University of Central Florida)
Neea Rusch (Augusta University)

Artifact Evaluation Committee

Guillaume Baudart, Co-chair (Inria, France)
Sankha Narayan Guria, Co-chair (University of Kansas, United States)
Jaime Arias (CNRS; LIPN; Université Sorbonne Paris Nord, France)
Aurèle Barrière (EPFL, Switzerland)
Thomas Bourgeat (EPFL, Switzerland)
William J. Bowman (University of British Columbia, Canada)
Yuandao Cai (Hong Kong University of Science and Technology, China)
Abha Chaudhary (Binghamton University, United States)
Hongzheng Chen (Cornell University, United States)
Qinlin Chen (Nanjing University, China)
Ellie Y. Cheng (MIT, United States)
Donovan Crichton (The Australian National University, Australia)
Will Crichton (Brown University, United States)
Robert Dickerson (Purdue University, United States)
Elizabeth Dinella (Bryn Mawr College)
Pierre Donat-Bouillud (Czech Technical University in Prague, Czechia)
Justine Frank (University of Maryland, College Park)
Hugo Férée (University of Kent, United Kingdom)
Lourdes del Carmen González-Huesca (National Autonomous University of Mexico, Mexico)
Matt Griffin (University of Surrey, United Kingdom)
Thomas Haas (TU Braunschweig, Germany)
Andrew Habib (ABB Corporate Research, Germany)
Hans-Dieter Hiep (Amazon Web Services, United Kingdom)
Kesha Hietala (Amazon Web Services, United States)
Nick Hu (University of Oxford, United Kingdom)
Junyoung Jang (McGill University, Canada)

Pankaj Kumar Kalita