The Hall of Fame Award

Nomination Deadline: July 31st, 2024

The SIGOPS Hall of Fame Award was instituted in 2005 to recognize the most influential Operating Systems papers that were published at least ten years in the past. Starting in 2016, SIGOPS will select 1-2 papers annually, restricted to papers that appeared, in any form of publication, at least 10 years previously—specifically, before 30 September in year (X-9), where X is the selection year.

The selection committee for these awards is made up of the program chairs/co-chairs of the SOSP/OSDI conferences which were held within that 10-11 year timeframe. In case of conflict of interests, program chairs from earlier SOSP/OSDI conferences help make the selection.

The decision will be based on a discussion that considers the impact the paper (and more generally of the research described in the paper) has had on the field of operating systems research. The Award committee will prepare a short statement that describes why the paper was selected. For an in-depth discussion of the SIGOPS policy for this award, please read Jeff Mogul’s Policies for the SIGOPS Hall of Fame Award.

The award winners will be announced at the SOSP or OSDI conference by the current program chair. The program chair will read the statement prepared by the Award committee that describes why the paper was selected. The authors of the award-winning paper will be given a plaque, naming the paper, the authors, the conference or journal the paper appeared in, and the conference in which the award was made. This certificate will be signed by the program chair and the current chair of SIGOPS. A list of the winners of the award will be maintained on the SIGOPS website.

2024 Nominations

In 2024, operating systems papers that were published before 30 September 2015 are eligible for nomination. Self-nomination is allowed. The 2024 selection committee consists of Mike Dahlin, Jason Flinn, and Hank Levy

Please submit your nominations using the following form:
https://tinyurl.com/sigopshof2024

Nominations must be received on or prior to July 31st, 2024, in any time zone. You may address questions and comments to the selection committee via HOFnominations@sigops.org.

Winners

2023

  • Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield. For their paper Xen and the art of Virtualization.  In SOSP’03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. “The Xen paper played a significant role in advancing the field of virtualization by introducing multiple innovative techniques. Its contributions have had a major impact on the design and implementation of modern hypervisors and continue to shape the development of virtualization technologies in both research and industry.”

2022

    • James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, Google, Inc. For their paper Spanner: Google’s Globally-Distributed Database. In OSDI’12: Proceedings of the 10th USENIX conference on Operating systems design and implementation. “Spanner showed how to balance the requirements of consistency, availability, and partition tolerance for a globally replicated transactional database system. In doing so, Spanner simplified the implementation of a range of higher level services which could then scale across the planet through a familiar and powerful transactional API. The key insights of leveraging clock synchronization using TrueTime and a dedicated, redundant, high-capacity datacenter WAN distinct from the public Internet as the basis for scaling remain fundamental design patterns today.”

2021

    • Teemu Koponen, Martin Casado, Natasha Gude, Jeremy Stribling, Leon Poutievski, Min Zhu, Rajiv Ramanathan, Yuichiro Iwata, Hiroaki Inoue, Takayuki Hama, and Scott Shenker. For their paper: Onix: A Distributed Control Platform for Large-scale Production Networks. In OSDI’10: Proceedings of the 9th USENIX conference on Operating systems design and implementation. “Together with several earlier academic prototypes built by the authors, Onix demonstrated how to build a general purpose, scalable operating system for managing shared network resources, a key element in the widespread commercial success of software defined networking.”

2020

    • William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth. For their paper TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. In OSDI’10: Proceedings of the 9th USENIX conference on Operating systems design and implementation. “This paper was instrumental in demonstrating that taint tracking could be made both efficient and fine-grained. For unmodified smartphone applications, with minimal monitoring overhead, the authors found dozens of potential leaks of sensitive and private information. This work sparked an important research agenda on smartphone privacy that continues to this day.”

    • Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. For their paper The Multikernel: A New OS Architecture for Scalable Multicore Systems. In SOSP ’09: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. “This paper synthesized a number of ideas from microkernels to build a highly scalable multiprocessor operating system. Barrelfish motivated the need for designing a single machine operating system as a distributed system, with shared memory message queues for communication between applications, per-core kernels, and scalable, user-level services.”

2019

    • Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt. Debugging in the (Very) Large: Ten Years of Implementation and Experience. In SOSP ’09 Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. “This paper describes the successful decade-long effort to dramatically improve the security and reliability of Windows by collecting and analyzing data on every crash experienced by every PC anywhere in the world.  Although we now take it for granted that deployed software should always “phone home” error reports, the work described in this paper was the first to do so at global scale.”

    • Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: formal verification of an OS kernel  In SOSP ’09 Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. “The seL4 project was the first to provide a machine-checked proof of correctness and security properties of a high-performance microkernel. The authors used a unique approach that fuses formal and operating systems techniques, resulting in a general purpose operating system kernel that performs as well as a state-of-the-art microkernel and whose behavior can be precisely predicted for any input. The work has become the basis for a large amount of subsequent work in provably correct systems.”

2018

    • Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In OSDI’08: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008. DryadLINQ demonstrated how to provide declarative operations on top of a distributed dataflow system, making it convenient for programmers to compute on large data sets using relational operators while providing high-performance.  Its techniques include translating data-parallel operations into a distributed execution plan for the Dryad data flow system. The paper demonstrated its effectiveness by showing results from Dryad in production on clusters with thousands of computers.

    • Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In OSDI’08: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008. KLEE shows how to scale symbolic execution to generating automatic tests that achieve high coverage and applying it to widely-used software, including the GNU COREUTILS utility programs.  The authors demonstrated KLEE’s effectiveness on heavily-used programs by finding bugs that manual testing has missed. The ideas and results have been an inspiration for many subsequent work on automatic bug finding.

2017

    • Mike Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006), 335-350. The Chubby lock service provides coarse-grained locking and reliable, low-volume storage for a loosely-coupled distributed system, and is particularly useful for synchronizing activities between clients. Chubby uses Paxos internally, but exposes a lock-service API to its clients, intended to simplify its adoption by programmers. The paper was one of the first to discuss the challenges of engineering a high-availability service for use by a wide range of programmers in a globally-distributed environment. While Chubby itself is widely used only within Google, the paper inspired open-source implementations of similar services, such as Zookeeper, that provide similar functionality.

    • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s Highly Available Key-value Store. In Proceedings of the 21st ACM  Symposium on Operating Systems Principles (SOSP 2007), 205-220. Dynamo is a scalable and highly reliable distributed key-value store. The paper describes how Dynamo manages the tradeoffs between availability, consistency, cost-effectiveness, and performance, and explains how the system combines a variety of techniques: consistent hashing, vector clocks, sloppy quorums, Merkle trees, and gossip-based membership and failure detection protocols. In particular, the paper emphasizes the value of supporting eventual consistency in order to provide high availability in a distributed system. Dynamo evolved within Amazon to become the basis of a popular cloud service, and also inspired open-source systems such as Cassandra.

2016

    • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’06), November 2006.

2015

    • R. A. Meyer and L. H. Seawright. A virtual machine time-sharing system. IBM Systems Journal 9(3), September 1970, 199-218. This paper described the second generation of the very first virtual machine system. It was originally built in 1966 for an IBM 360/40 with custom virtual memory hardware and then ported to a 360/67, which had virtual memory built in. In addition to a virtual machine monitor called CP, the system included a single user interactive system called CMS, heavily influenced by MIT’s CTSS; to support multiple users the system ran CMS in a separate VM for each user. Because of the clean architecture of the 360, CP could virtualize the hardware perfectly (except for timing dependencies and self-modifying channel programs) without binary translation, though it did have to translate the channel programs. It could run most of the existing IBM operating systems in virtual machines. CP/67 evolved into VM/370, which became the main time-sharing system for IBM mainframes.

    • David. L. Parnas. On the criteria to be used in decomposing systems into modules. Communications of the ACM 15(12), December 1972, 1053-1058. This paper introduced a technique for decomposing a complex system into modules. Through a simple example it showed that a modularization that emphasizes what is now known as “information hiding” is superior to more obvious module decompositions in terms of the software engineering lifecycle. The paper argues the beneficial decomposition can be achieved with minimal performance overheads. The “information hiding” approach has influenced software engineering in areas including operating systems, distributed systems, databases, and programming languages.

    • H. T. Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS) 6(2), June 1981, 213-226. This paper introduced the notion of optimistic concurrency control, proceeding with a transaction without locking the data items accessed, in the expectation that the transaction’s accesses will not conflict with those of other transactions. This idea, originally introduced in the context of conventional databases, has proven very powerful when transactions are applied to general-purpose systems.

    • Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry A fast file system for UNIX. ACM Transactions on Computer Systems (TOCS) 2(3), August 1984, 181-197. This paper introduced techniques to make the file system “disk aware”, thus demonstrating the importance of understanding the interplay between hardware technology and file-system design. The structuring concept of the cylinder group, while simple, is found in some form in many current systems (including the widely-deployed Linux ext* family) and serves as an excellent example of the importance of locality in storage. The paper also introduced numerous functionality improvements, including symbolic links and atomic rename, which have since become commonplace features in modern file systems.

    • James J. Kistler and M. Satyanarayanan. Disconnected operation in the Coda File System. ACM Transactions on Computer Systems (TOCS) 10(1), February 1992, 3-25. This paper was the first to describe the use of caching to provide availability in addition to improved performance in a distributed setting where clients use files stored at remote file servers, leading to potential loss of service during disconnection. The Coda design provided a thoughtful and elegant approach to supporting continued service during disconnection. Disconnected clients continued to service user requests using locally cached content; however all potential modifications performed while disconnected were logged locally, and when service was restored the system attempted to reconcile the local modifications with the current server state. The Coda design inspired much follow-on research on distributed file systems and its techniques were adopted in other systems.

    • Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th annual international symposium on computer architecture (ISCA ’93), May 1993, 289-300. This paper introduced transactional memory, an architectural concept intended to make lock-free synchronization as efficient and easy to use as conventional techniques based on mutual exclusion. This concept has found its way into commercial multicore processors, and has generated a large amount of follow-on work in software transactional memory.

    • Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient software-based fault isolation. In Proceedings of the 14th ACM Symposium on Operating Systems Principles (SOSP ’93). December 1993, 203-216. This paper demonstrated that compiler or code-rewriting techniques could isolate untrusted code modules, preventing them from writing or jumping to addresses outside their “fault domain”, without the overhead of crossing hardware-enforced address space boundaries, and without much increase in execution time of code within a domain. The paper inspired substantial subsequent research, and the basic techniques have been implemented in widely-deployed software, such as Web browsers.

    • D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the 15th ACM symposium on Operating Systems Principles (SOSP ’95), December 1995, 172-182. Bayou is a replicated storage system that anticipated the world of numerous small mobile devices executing collaborative applications over unreliable networks. The paper describes a client-server storage structure supporting eventual consistency, anti-entropy protocols, disconnected operation, log-based recovery, and an application-centered approach to detecting and resolving update conflicts to arrive at consistent replicas. These concepts were backed up by a prototype implementation, two applications, and a simple performance evaluation. Bayou is still relevant to the problems faced by, and the solutions employed by, a large number of today’s modern applications.

    • J. Liedtke. On micro-kernel construction. In Proceedings of the 15th ACM symposium on Operating Systems Principles (SOSP ’95), December 1995, 237-250. This paper presented the core design ideas behind the L4 microkernel, especially the minimality principle, which states that functionality must only be implemented inside the kernel if moving it outside would prevent the implementation of required system functionality. This principle was at the heart of L4’s design, and supported a ruthless performance focus, which allowed L4 to outperform other microkernels by an order of magnitude. The core ideas of this paper led to a family of L4 microkernels which were commercially deployed on a large scale, and eventually enabled unprecedented assurance through formal verification.

    • Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM ’01), 2001, 149-160. This paper introduced a novel protocol that enables efficient key lookup in a large-scale and dynamic environment; the paper shows how to utilize consistent hashing to achieve provable correctness and performance properties while maintaining a simplicity and elegance of design. The core ideas within this paper have had a tremendous impact both upon subsequent academic work as well as upon industry, where numerous popular key-value storage systems employ similar techniques. The ability to scale while gracefully handling node addition and deletion remains an essential property required by many systems today.

    • Carl A. Waldspurger. Memory resource management in VMware ESX server. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI ’02), December 2002, 181-194. This paper introduced elegant and effective techniques of hypervisor memory management. Memory ballooning allows the hypervisor to reclaim memory from a virtual machine in accordance with the unmodified guest’s operating system policies. Transparent page sharing supports efficient memory use with small overhead. The combination of active memory estimation, idle memory tax, and proportional fair sharing, along with admission-controlled memory reservation,provides the basis for service level agreements and reasoned overcommitment. This paper has been highly influential; many of its techniques have been adopted by widely-used hypervisors.

    • George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI ’02), 2002, 211-224. The paper demonstrated that the execution of an arbitrary program inside a virtual machine can be replayed deterministically and efficiently. Originally intended primarily as a tool for intrusion analysis, record-and-replay has been used subsequently for debugging, fault-tolerance, to audit program executions, and other virtual machine services. The work has directly influenced commercial products and sparked a research area that continues to this day.

    • Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03). 2003, 29-43. This paper presented an effective design for a large-scale distributed file system that provided fault tolerance while running on inexpensive commodity hardware. It provided very large amounts of I/O bandwidth by having all data transfers happen directly between client processes and machines storing the actual data, handled automatic recovery from failed disks and machines, and retained the simplicity of managing file system metadata in a single centralized master. GFS formed the basis for the design for the open-source HDFS system, as well the backbone for the evolution of large-scale distributed file systems at Google and elsewhere.

    • Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation (OSDI’04), 2004, USENIX Association. The paper proposed a simple yet highly effective approach for processing large data sets in a scalable and fault-tolerant manner. An impressive aspect of the design is its simplicity: it elegantly captures a common pattern that solves two critical problems faced by many developers today (scalability and fault tolerance), while still retaining a clean, easy-to-use interface that supports a wide range of applications. The impact of MapReduce has been huge. It is widely used in industry, with virtually every large company running MapReduce. As a sign of great system design, developers have adopted MapReduce in many use cases beyond its original goals and inspired many follow-on systems.

2013

    • Daniel G. Bobrow, Jerry D. Burchfiel, Daniel L. Murphy and Raymond S. Tomlinson. Tenex, A Paged Time Sharing System for the PDP-10 Communications of the ACM 15(3), March 1972.The Tenex system pioneered many ideas that are prominent in modern operating systems. It included one of the first page based memory systems, copy on write sharing, mapping of files into virtual memory, and user/group/other file protection. It also had mnemonic commands with command-line completion and automatic file versioning. As one reviewer said, “Reading it now, I’m pleasantly surprised by how much is familiar — thanks to its successors.”

    • Joel Bartlett. A NonStop Kernel in Proceedings of the Eighth ACM Symposium on Operating Systems Principles (SOSP ’81), Pacific Grove, California, December 1981.Tandem was the first commercial database to achieve fault tolerance. To accomplish this, the Tandem system had to bring together many techniques — including message-passing, mirroring, fast failure detection, and failover — into a practical design and implementation.

    • K. Mani Chandy and Leslie Lamport. Distributed Snapshots: Determining Global States of a Distributed System ACM Transactions on Computer Systems 3(1), February 1985.This paper takes the idea of consistency for distributed predicate evaluation, formalizes it, distinguishes between stable and dynamic predicates, and shows precise conditions for correct detection of stable conditions. The fundamental techniques in the paper are the secret sauce in many distributed algorithms for deadlock detection, termination detection, consistent checkpointing for fault tolerance, global predicate detection for debugging and monitoring, and distributed simulation.

    • Kenneth P. Birman and Thomas A. Joseph. Exploiting Virtual Synchrony in Distributed Systems in Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (SOSP ’87), Austin, Texas, November 1987.This paper describes a methodology for building distributed applications comprised of multiple components, each realized by a group of replicated servers. It defines a number of group communication primitives and then ties fault notification into the fabric of group services by introducing the virtual synchrony principle, which orders communication and fault notifications consistently among group members and across multiple groups.

    • Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti and Frans Kaashoek. The Click Modular Router ACM Transactions on Computer Systems (TOCS), 18(3), August 2000.Click defines a simple, modular, and efficient framework for constructing network routers with different services and properties. Since this paper’s publication, Click has been an essential tool for the networking and systems research communities with dozens and perhaps hundreds of systems and papers built on it, including several commercially successful systems.

2012

    • Brian M. Oki, Barbara H. Liskov. Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing (PODC 1988), Toronto, ON, Canada, Aug 1988, pp 8–17.The paper introduces a replication protocol very similar to what is now known as Paxos. That protocol has become the standard for consistent, fault-tolerant state-machine replication, and is widely used in data centers to keep the state consistent despite failures and reconfiguration.

    • Leslie Lamport. The Part Time Parliament ACM TOCS 16(2), May 1998, 133–169.The work (originally published in 1989) was independent and roughly concurrent with the Viewstamped Replication work also recognized this year. It describes the protocol in a more general setting, adds a correctness argument, and forms the basis for modern Paxos implementations.

    • Kai Li, Paul Hudak. Memory Coherence in Shared Virtual Memory Systems ACM TOCS 7(4), Nov 1989, pp 321–359.The paper shows how to simulate coherent shared memory on a cluster, and also introduces directory-based distributed cache-coherence. It spawned a entire research area, and introduced cache coherence mechanisms that are widely used in industry.

    • Mendel Rosenblum, John K. Ousterhout. The Design and Implementation of a Log-Structured File System ACM TOCS 10(1), Feb 1992, pp 26–52.The paper introduces log-structured file storage, where data is written sequentially to a log and continuously de-fragmented. The underlying ideas have influenced many modern file and storage systems like NetApp’s WAFL file systems, Facebook’s picture store, aspects of Google’s BigTable, and the Flash translation layers found in SSDs.

2011

    • Jim Gray. Why Do Computers Stop And What Can Be Done About It? HP Labs Technical Report TR-85.7.The paper presents the first large scale quantitative study of computer failures in practice, of a system built using best practices at the time to achieve fault-tolerance.

    • Ken Thompson. Reflections on Trusting Trust. Communications of the ACM, Volume 27 Issue 8, Aug 1984.The paper demonstrated that to have trust in a program, one cannot just rely on trust in the person who wrote it, or even on verifying the source code. One must also ensure that the entire tool chain used to produce and execute binaries is trustworthy.

    • David A. Patterson, Garth Gibson, Randy H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data.The paper shows how to achieve efficient, fault tolerant and highly available storage using cheap, unreliable components.

2010

2009

    • Cary G. Gray and David R. Cheriton, Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency, Proceedings of the Twelfth ACM Symposium on Operating Systems Priciples (SOSP), December 1989, Litchfield Park, AZ, USA.The Gray and Cheriton paper pioneered through its analysis of the Leases mechanism, which has become one of the most widely-used mechanisms for managing distributed caches. The paper is particularly striking for its careful analysis of the semantics of leases, its detailed experiments, and its thoughtful discussion of fault-tolerance issues.

    • Butler W. Lampson and David D. Redell, Experience with processes and monitors in Mesa, Proceedings of the Seventh ACM Symposium on Operating Systems Priciples (SOSP), December 1979, Pacific Grove, CA, USA.When this paper was written, monitors had emerged as the synchronization method of choice. in programming languages conferences and operating systems textbooks. This paper was the first to look closely at the practical issues that monitors pose when used in a large production system. These issues remain contemporary, and indeed researchers working on transactional memory mechanisms would do well to reread this wonderful paper.

    • Nancy P. Kronenberg, Henry M. Levy, and William D. Strecker, VAXclusters: A Closely-Coupled Distributed System, Proceedings of the Tenth AMC Symposium on Operating Systems Principles (SOSP), December 1985, Orcas Island, USA, USA.The VAX Clusters system was the first modern clustered system supporting such basic features as a distributed file system and a distributed locking service. The SOSP paper on VAX Clusters remains a classic today. VAXclusters was a huge commercial success, and set the stage for today.s massive data centers.

2008

    • John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West, Scale and Performance in a Distributed File System, Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (SOSP), November 1987, Austin TX, USA.

    • Andrew D. Birrell, Roy Levin, Roger M. Needham, and Michael D. Schroeder, Grapevine: An Exercise in Distributed Computing, Proceedings of the Eighth ACM Symposium on Operating Systems Principles (SOSP), December 1981, Pacific Grove CA, USA.

2007

    • Leslie Lamport, Time, Clocks, and the Ordering of Events in a Distributed System, Communications of the ACM 21(7):558-565, July 1978.Perhaps the first true “distributed systems” paper, it introduced the concept of “causal ordering”, which turned out to be useful in many settings. The paper proposed the mechanism it called “logical clocks”, but everyone now calls these “Lamport clocks.”

    • Andrew D. Birrell and Bruce Jay Nelson, Implementing Remote Procedure Calls, ACM Transactions on Computer Systems 2(1):39-59, February 1984.This is the paper on RPC, which has become the standard for remote communication in distributed systems and the Internet. The paper does an excellent job laying out the basic model for RPC and the implementation options.

    • J. H. Saltzer, D. P. Reed, and D. D. Clark, End-To-End Arguments in System Design, ACM Transactions on Computer Systems 2(4):277-288, November 1984.This paper gave system designers, and especially Internet designers, an elegant framework for making sound decisions. A paper that launched a revolution and, ultimately, a religion.

    • Michael Burrows, Martin Abadi, and Roger Needham, A Logic of Authentication, ACM Transactions on Computer Systems 8(1):18-36, February 1990.This paper introduced to the systems community a logic-based notation for authentication protocols to precisely describe certificates, delegations, etc. With this precise description a designer can easily reason whether a protocol is correct or not, and avoid the security flaws that have plagued protocols. “Speaks-for” and “says” are now standard tools for system designers.

2006

    • George C. Necula and Peter Lee, Safe Kernel Extensions Without Run-Time Checking, Proceedings of the Second USENIX Symposium on Operating Systems Design and Implementation, October 1996, Seattle, WA.This paper introduced the notion of proof carrying code (PCC) and showed how it could be used for ensuring safe execution by kernel extensions without incurring run-time overhead. PCC turns out to be a general approach for relocating trust in a system; trust is gained in a component by trusting a proof checker (and using it to check a proof the component behaves as expected) rather than trusting the component per se. PCC has become one of the cornerstones of language-based security.

2005

    • Edsger W. Dijkstra, The Structure of the THE Multiprogramming System, Proceedings of the First ACM Symposium on Operating Systems Principles, October 1967, Gatlinburg, TN, USA.The first paper to suggest that an operating system be built in a structured way. That structure was a series of layers, each a virtual machine that introduced abstractions built using the functionality of lower layer. The paper stimulated a great deal of subsequent work in building operating systems as structured systems.