Reproducibility is a cornerstone of scientific progress, yet in systems research it remains a persistent challenge. Over the past five years, the Artifact Evaluation (AE) process at the European Conference on Computer Systems (EuroSys) has aimed to address this challenge head-on. In this blog post, we reflect on what we have learned from organizing AE across five consecutive editions of the conference, share key statistics, and propose a roadmap for the future. An extensive analysis can be found in the full paper accepted at ACM REP 2025 [1]. In addition, you can listen to a discussion among some of the authors in an episode of “Disseminate: The Computer Science Research Podcast” [2].
How Artifact Evaluation Works
At EuroSys, AE is a voluntary process offered to authors of accepted papers. It unfolds in three phases:
- Submission: Authors submit their artifacts and an appendix explaining how to reproduce the main claims of the paper.
- Kick-the-Tires: Reviewers check for basic functionality and flag early issues.
- Evaluation: Reviewers assess the artifact’s completeness, documentation, building process, and ability to reproduce the main claims of the paper.
Each artifact is reviewed by 3–4 evaluators. Authors can interact anonymously with the reviewers and fix any issues that arise during the process.
The Badges
The AE process of EuroSys awards three badges, as defined by the ACM policies on reproducibility:
- Artifacts Available: The artifact is publicly accessible via a DOI-backed repository.
- Artifacts Evaluated – Functional: The artifact is complete, documented, and works as described.
- Results Validated – Reproduced: The main claims from the paper are independently reproduced.
These badges are evaluated independently and, if awarded, they appear on the first page of the final paper and as metadata in the paper’s record in the ACM Digital Library.
Statistics from the 2021–2025 EuroSys AEs
- 161 artifacts awarded the “Artifacts Available” badge
- 136 artifacts deemed “Functional”
- 75 artifacts awarded the “Results Reproduced” badge
- A steady 58% of accepted papers participating in AE annually
- A growing AE committee, peaking at 98 members in 2025, from institutions mostly in Europe, the USA, and Asia
Despite this growth, challenges remain.
Six Persistent Challenges
Tight Deadlines
AE is squeezed between the paper notification and camera-ready deadlines, leaving limited time for in-depth evaluation.
Lack of Interaction Between Paper and Artifact Review
AE and paper reviews are disjoint, leading to mismatched expectations about what should be reproduced.
Specialized Hardware or Infrastructure
Some artifacts require niche or high-performance hardware, complicating reproducibility.
Short-Term Stewardship
AE chairs and reviewers change yearly, with very few repeating reviewers, leading to a loss of transferable knowledge.
Ensuring Long-Term Artifact Availability
Keeping artifacts available over the long term is non-trivial, even on DOI-backed platforms.
Imprecise Badge Definitions
ACM’s badge definitions are descriptive, not prescriptive, potentially leading to inconsistent interpretations even across different editions of the same conference.
What We Propose
Short-Term Improvements
Start Early
Encourage authors to prepare their artifacts early, easing time pressure and making the artifacts ready for evaluation immediately after the acceptance notification.
Informal TPC-AEC Channel
Let paper reviewers flag the key results and major claims that the AE committee should verify and reproduce.
Hardware Availability
Require authors to specify hardware requirements clearly, or to provide evaluators with access to the necessary infrastructure.
Enforce DOI Use
Mandate DOI-backed storage and restrict deletions to ensure artifact longevity.
Long-Term Vision
Mandatory AE
Require AE for all accepted papers, with opt-outs for special cases, such as industry papers or academic work intended for commercial exploitation.
Integrated Review
Tie AE outcomes to paper acceptance, especially for tool-based research.
AE Steering Committee
Establish a steering committee of former AE chairs (potentially spanning conferences), and/or extend the AE chair role beyond a single year, to maintain best practices and continuity.
Explicit Artifact Commitment in Papers
Require authors to declare, at paper submission time, which results are reproducible and how.
Looking Beyond EuroSys
Artifact Evaluation is not a silver bullet, but it is a powerful tool for building trust in systems research. With thoughtful design, community engagement, and a commitment to transparency, AE can help ensure that our results are not only publishable, but also reproducible and reusable.
While AE is now standard in systems research, practices vary widely across CS subfields. Some prioritize availability over reproducibility; others, like ML and HPC, rely on separate reproducibility challenges. The EuroSys experience offers a blueprint for system-oriented communities seeking to improve the artifact evaluation process.
References
[1] Lessons Learned from Five Years of Artifact Evaluations at EuroSys. In ACM Conference on Reproducibility and Replicability (ACM REP ’25), July 29–31, 2025, Vancouver, BC, Canada. https://doi.org/10.1145/3736731.3746152
[2] Lessons Learned from Five Years of Artifact Evaluations at EuroSys. In Disseminate: The Computer Science Research Podcast. Hosted by Jack Waudby.
*About the authors: This blog post summarizes key insights from a full-length paper co-authored by several of the Artifact Evaluation co-chairs of EuroSys from 2021 to 2025. The authors, listed alphabetically, are Daniele Cono D’Elia, Thaleia Dimitra Doudali, Cristiano Giuffrida, Miguel Matos, Mathias Payer, Solal Pirelli, Georgios Portokalidis, Valerio Schiavoni, Salvatore Signorello, and Anjo Vahldiek-Oberwagner. The blog post was curated by Thaleia Dimitra Doudali.
Disclaimer: This post reflects the personal views and experiences of the Artifact Evaluation chairs at EuroSys. It does not represent an official position of the EuroSys Steering Committee or Officers, nor has it been formally discussed or endorsed by them.