Summarized by Chris Diaz (Univ. of Kentucky) and Neal Cardwell (Univ. of Washington)
Barbara Liskov, Jeff Mogul, and Fred Schneider selected the WIP talks from hardcopy abstracts submitted on-site. They ran a typically fast-paced but smoother-than-usual session, carefully pipelining each speaker's talk with the next speaker's visual-aid preparation. Thus, no speaker was "gonged" for exceeding the allotted six minutes.
Marc Shapiro (INRIA Rocquencourt)
Shapiro discussed the PerDiS system, which supports data sharing across a wide-area network. The system supports many kinds of applications, though the focus is on concurrent engineering as practiced in building and construction. Such applications require shared mutable (writable) data so that users can cooperate and exchange information. In PerDiS, a user may obtain and cache a copy of shared data; the cached copy is not affected by another user's writes. The server accepts a client's changes only if no other write to the data has occurred in the meantime; otherwise it rejects the changes, and the application can recover. Such rejections are rare in practice, as engineers typically coordinate such changes among themselves.
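The optimistic scheme just described can be sketched in a few lines of Python; the names here (`Server`, `checkout`, `commit`) are illustrative, not PerDiS's actual API:

```python
class Server:
    """Holds the master copy of a shared datum plus a version counter."""
    def __init__(self, data):
        self.data = data
        self.version = 0

    def checkout(self):
        # A client caches a copy together with the version it saw.
        return self.data, self.version

    def commit(self, new_data, base_version):
        # Accept the changes only if no other write happened in the meantime.
        if base_version != self.version:
            return False          # conflict: the application must recover
        self.data = new_data
        self.version += 1
        return True

server = Server({"wall": "concrete"})
data_a, v_a = server.checkout()
data_b, v_b = server.checkout()
assert server.commit({"wall": "brick"}, v_a)       # first writer wins
assert not server.commit({"wall": "glass"}, v_b)   # concurrent write rejected
```

The point of the design is that the common case (no conflict) needs no coordination at all between the engineers' cached copies.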
The PerDiS platform automatically manages both distribution and persistence, which is a big win for applications. A significant lesson is that coarse granularity is essential to making distributed sharing manageable: failures can occur only at transaction and file boundaries. Performance is better than in fine-grained systems because far fewer messages are exchanged. Also, the DSM abstraction makes the system very easy to understand and to use. The negative lessons are that building a significant large-scale system takes a great deal of effort, and that language integration is important for supporting legacy applications.
PerDiS is available as open source at http://www.perdis.esprit.ec.org/.
William J. Bolosky, John R. Douceur, Marvin Theimer (Microsoft Research)
A symbiotic distributed file system consists of multiple independent file systems that work together without trusting one another. The participants in such an environment might, for example, be stock traders: while each depends on the others for generating prices and trading shares, each distrusts the others and will not disclose information that reveals its strategy. The goal of this work is to design and build a serverless, distributed file system whose nodes do not trust one another.
The authors make three observations. First, system vulnerability is inversely proportional to the number of nodes that store data: a completely centralized system is more vulnerable to attack than a decentralized one, because an attacker need compromise only one component to bring down a centralized system. Second, client resources are plentiful, which makes decentralized data storage practical. Third, "bigger is better": it is easier to "hide" replicas of data from a malicious attacker in a large system with many nodes than in a small system with only a few. Stated differently, it is easier to hide a needle in a large haystack than in a small one.
The authors also observe that the world is becoming more centralized, for example, with large systems for e-mail, auctioning, buying, and trading. While such approaches simplify administration, they are susceptible to exactly the kinds of attack the authors want to avoid.
For more information, see http://research.microsoft.com/.
Jay Lepreau, Chris Alfeld, David Andersen (MIT), Kristin Wright (University of Utah)
Lepreau motivated the project by giving two slides of quotes from papers, reviews, and job talks. His implied message was that current methods of evaluating experimental research in distributed systems and networks are either poor or very hard to use. To remedy this, Lepreau's "Flux" research group at Utah is constructing a "configurable Internet in a room" that will be remotely available to all researchers. The system is large: on the order of 220 nodes and 1000 links, connected by a very large switch. The topology is configurable, the link bandwidth/latency/errors/drops are configurable, and every bit of software on the nodes is replaceable by the users.
Such a system raises many research challenges. These include providing a reasonable user interface; calibration, evaluation, and scaling; artifact detection and control; and the NP-hard problem of mapping the virtual network onto the available physical resources. Solving the mapping problem is crucial to avoiding bottlenecks in the physical network that would inadvertently affect the virtual network.
The system should be limping "soon," and Utah is looking for early users and sponsors. For more information, see http://www.cs.utah.edu/flux/.
Greg Ganger, Steve Schlosser, John Griffin, and David Nagle (Carnegie Mellon University)
MEMS-based storage is an exciting new technology that could provide significant performance gains over current disk-drive technology at costs much lower than EEPROM technology. Based on MEMS (MicroElectroMechanical Systems), this non-volatile storage technology merges magnetic recording material and thousands of recording heads to provide 1-10 GB of storage in under 1 cm² of area, with access times of 1-3 ms and streaming bandwidths of over 50 MB per second. Because MEMS-based storage is built using photolithographic IC processes similar to standard CMOS, it has per-byte costs significantly lower than DRAM and access times an order of magnitude faster than conventional disks. MEMS-based storage can also incorporate storage and processing into the same chip, yielding a single computing "brick" that contains processing and both nonvolatile and volatile storage. Developing these concepts is the central focus of a new research center at CMU called CHIPS.
Although MEMS-based storage devices may still be several years from commercialization, their potential for reducing the memory gap makes them an important technology for systems designers to consider. The authors have therefore begun exploring how MEMS-based storage can improve application performance and how different MEMS device characteristics can fundamentally change the behavior and design of storage systems. Early results indicate that MEMS-based storage can reduce application I/O stall times by 80-99% for a set of five file-system and database workloads. The resulting speedups for these applications range from 10% to 20X, depending mainly on the ratio of computation to I/O. Ongoing work refines the device models, explores how such devices should change system architectures and memory hierarchies, and investigates newly enabled applications.
For more information, see http://lcs.web.cmu.edu/research/MEMS/.
Umakishore Ramachandran (Georgia Tech)
A hardware continuum of Computation, Communication, and Interaction (CCI) devices exists. Today, CCI devices are used explicitly, as with laptops or cell phones. Ramachandran expects that future CCI devices will be embedded in many other places, such as lamp posts and ceiling panels. The hardware continuum will then range from single-chip CCI devices (say, in a wristwatch) to distributed CCI devices (say, in a collection of ceiling panels). The challenge is to integrate this distributed hardware spectrum seamlessly. The goal of this work is to develop and build a software infrastructure for the ubiquitous capture of, and access to, the data generated and stored by such a collection of devices.
Among the challenges in this work is the data-intensive computing of emerging applications: Ramachandran expects future applications to consist primarily of stream-oriented processing, such as interactive video and audio. Another challenge is the computational requirement of analyzing such data streams and extracting content, which implies various uses of parallelism. Last but not least, the distributed environment creates the need to dynamically reconfigure and reallocate resources.
The Stampede project (http://www.cc.gatech.edu/~rama/stampede/stampede.html), started at Compaq CRL, addresses these challenges; the current work on ubiquitous presence builds upon Stampede. The author's homepage is at http://www.cc.gatech.edu/~rama/homepage.html.
John Regehr and John A. Stankovic (University of Virginia)
Regehr observed that users run many types of applications, such as audio, video, process control, and defense systems, on general-purpose operating systems. He also observed that the CPU schedulers in general-purpose operating systems cannot appropriately schedule all of these application types.
To address this problem, Regehr and Stankovic want applications to be able to obtain appropriate schedulers and incorporate them into the operating system when needed. Their approach uses hierarchical schedulers, in which a top-level scheduler arbitrates among lower-level schedulers. This approach raises a second problem: how new schedulers are added and how they interact with the schedulers already in the hierarchy. To address it, the authors note that schedulers both require and provide performance guarantees. For example, one scheduler may guarantee a process or thread a fixed percentage of the total CPU; another may guarantee a proportional share of the available CPU. A scheduler obtains its guarantees from higher-level schedulers and provides guarantees to the processes or threads it schedules. The authors note that some scheduler hierarchies (e.g., a real-time scheduler under a time-sharing scheduler) do not work, because the parent cannot back the child's guarantees.
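The guarantee-passing idea can be illustrated with a toy hierarchy in Python. The class, names, and CPU shares below are invented for illustration; the real system deals in richer guarantee types than a flat CPU fraction:

```python
class Scheduler:
    """A node in the scheduling hierarchy: it receives a CPU-share
    guarantee from its parent and hands out sub-guarantees to children."""
    def __init__(self, name, share):
        self.name = name
        self.share = share      # fraction of the CPU guaranteed to us
        self.granted = 0.0      # fraction already promised to children

    def grant(self, child_name, share):
        # A guarantee is valid only if we can back it with our own.
        if self.granted + share > self.share:
            raise ValueError(f"{self.name} cannot guarantee "
                             f"{share:.0%} to {child_name}")
        self.granted += share
        return Scheduler(child_name, share)

root = Scheduler("root", 1.0)
ts = root.grant("time-sharing", 0.6)
rt = root.grant("real-time", 0.4)
audio = rt.grant("audio thread", 0.2)
assert root.granted == 1.0

# A guarantee that the parent cannot back is rejected at grant time:
try:
    rt.grant("video thread", 0.3)
except ValueError as e:
    print(e)  # prints: real-time cannot guarantee 30% to video thread
```

The invalid-hierarchy observation falls out naturally: a parent that offers no guarantee (share 0) can never host a child that needs one.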
Regehr and Stankovic use a resource manager to bridge resource requests and the scheduler hierarchy. The resource manager enforces policies attached to schedulers, users, and applications. For example, a user may assign priorities so that an activity such as answering a phone line is guaranteed a higher priority than a game-related activity.
Implementation of the work on Windows 2000 has already begun. For more information, see http://www.cs.virginia.edu/~jdr8d/.
Matthew Flatt, Alastair Reid, and Jay Lepreau (University of Utah)
Component-based systems can be difficult to configure properly, and often suffer from poor performance. Flatt presented "C plus Units" (CpU) from the Flux Research Group at Utah, which aims to make the linking of C-based components easier for programmers, and to eliminate much of the overhead of modularity.
CpU components are based on a theoretical model of units [PLDI98], which have well-defined import and export interfaces. Such interfaces help a programmer understand what a component requires and provides. They also give CpU more information for matching components during linking and for reporting mismatches. Compound units group related units into a single linking entity, enabling hierarchical composition.
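The unit model can be caricatured in Python as pairs of import/export interface sets. The unit names and interfaces below are invented for illustration; real units carry typed C interfaces:

```python
# A unit is a pair of interface sets: (imports, exports).
malloc_unit = (set(), {"malloc", "free"})
fs_unit = ({"malloc", "free", "disk_read"}, {"open", "read"})
disk_unit = (set(), {"disk_read"})

def link(*units):
    """Compose units into a compound unit: exports are pooled, and
    imports satisfied within the group disappear.  Residual imports
    become the compound unit's own imports (hierarchical composition),
    so mismatches are visible in the resulting interface."""
    exports = set().union(*(exp for _, exp in units))
    imports = set().union(*(imp for imp, _ in units)) - exports
    return imports, exports

kernel = link(malloc_unit, fs_unit, disk_unit)
assert kernel == (set(), {"malloc", "free", "disk_read", "open", "read"})

# Leaving out the disk driver surfaces the mismatch in the interface:
partial = link(malloc_unit, fs_unit)
assert partial[0] == {"disk_read"}
```

In the real system such unresolved imports are detected at link time rather than showing up as mysterious run-time failures.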
Preliminary results from applying CpU to the OSKit [SOSP97] are promising. Work continues in exploring the optimizations made possible by CpU linking specifications. For more information, see http://www.cs.utah.edu/flux/oskit/.
Michael L. Scott, Sandhya Dwarkadas, Srinivasan Parthasarathy, Rajeev Balasubramonian, DeQing Chen, Grigorios Magklis, Athanasios Papathanasiou, Eduardo Pinheiro, Umit Rencuzogullari, Chunqiang Tang (University of Rochester)
Scott presented Interweave, a distributed system that allows a compute-intensive parallel application executing on a multiprocessor or cluster to interact with other -- possibly distant -- "satellite" machines. Such applications include data mining, scientific visualizations, and distributed intelligent environments.
Interweave's shared-memory API makes applications easier to implement than do application-specific message or RPC protocols. Specifically, Interweave allows a programmer to map shared segments into program components. Interweave uses a single-writer/multiple-reader protocol that associates a version number with each segment. A program-specific predicate lets the system determine whether a cached copy of a segment is "recent enough" to be used by a component. Consistency is maintained with a novel hashing mechanism that stores the history of a segment in bounded space to aid invalidation of inconsistent copies. Twin and diff techniques, like those used in many software DSMs, track changes and update outdated copies.
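The version-number-plus-predicate idea might look like this in Python; the names are illustrative, not Interweave's API:

```python
import time

class CachedSegment:
    """A cached copy of a shared segment, tagged with the version it
    reflects and the time it was fetched (a time-based predicate
    could consult fetched_at instead of versions)."""
    def __init__(self, data, version):
        self.data = data
        self.version = version
        self.fetched_at = time.time()

def usable(cached, latest_version, recent_enough):
    # The application-supplied predicate decides whether the cached
    # copy may be used or a fresh copy must be fetched.
    return recent_enough(cached, latest_version)

# One possible predicate: tolerate being at most two versions behind.
two_behind = lambda c, latest: latest - c.version <= 2

seg = CachedSegment({"temperature": 21.0}, version=5)
assert usable(seg, 6, two_behind)      # one version stale: OK
assert not usable(seg, 9, two_behind)  # four versions stale: refetch
```

Letting the application define "recent enough" is what allows a satellite visualization, say, to tolerate slightly stale data in exchange for far less communication.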
Interweave employs a type system based on CORBA IDL for interactions between heterogeneous machines. When a segment is shared between components, Interweave automatically converts the segment to a standard wire format, remaps pointer data, and even allows programs to reorganize dynamically allocated data within a segment for spatial locality.
Interweave merges and builds upon the Cashmere and InterAct projects. It combines hardware coherence within multiprocessors, Cashmere-style lazy release consistency within tightly coupled clusters, and version-based consistency for distributed shared segments. A preliminary implementation is currently running on an AlphaServer cluster. For more information, see http://www.cs.rochester.edu/u/scott/interweave/.
Chenyang Lu, John A. Stankovic, Tarek Abdelzaher, Sang H. Son, and Gang Tao (University of Virginia)
Chenyang Lu presented this work, which focuses on soft real-time systems in unpredictable environments, such as Web servers that would like to provide response-time guarantees for applications such as business transactions. Lu noted that their approach is grounded in control theory and described an algorithm that uses feedback to keep deadline miss ratios acceptably low.
For more information, see http://www.cs.virginia.edu/~cl7v/fcs.html.
Eyal de Lara, Dan Wallach, and Willy Zwaenepoel (Rice University)
Dan Wallach described a vision of the future in which users will run desktop productivity applications to edit large files over low-bandwidth connections, yet will want to save their edits to a safe, stable "home" file system across those slow links. Puppeteer is a proxy-based system that uses applications' existing hooks to adapt them to these conditions at the component level. Wallach described an analysis of PowerPoint files showing that larger files are largely composed of images, which could be compressed for significant savings in space and time.
For more information, see http://www.cs.rice.edu/~delara/talks/.
Haifeng Yu (Duke University)
Haifeng Yu began his work-in-progress presentation by arguing that replication is important for the Web but that replicating dynamic content is difficult because it is hard to achieve both consistency and availability. He described TACT, a toolkit that defines a consistency metric and provides a knob allowing applications to choose arbitrary points along the availability/consistency spectrum.
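One way to picture such a knob, as a hedged sketch rather than TACT's actual interface: a replica accepts local writes until the accumulated unpropagated update "weight" exceeds an application-chosen bound, then forces a synchronization:

```python
class Replica:
    """A replica with a tunable bound on how much unpropagated update
    'weight' it may accumulate before it must synchronize.  bound=0
    approximates strong consistency (sync on every write); a large
    bound gives high availability at the cost of staleness."""
    def __init__(self, bound):
        self.bound = bound
        self.pending = 0     # weight of writes not yet propagated
        self.syncs = 0

    def write(self, weight=1):
        if self.pending + weight > self.bound:
            self.sync()      # blocks in a real system: lost availability
        self.pending += weight

    def sync(self):
        self.syncs += 1
        self.pending = 0

strong = Replica(bound=0)
relaxed = Replica(bound=10)
for _ in range(20):
    strong.write()
    relaxed.write()
assert strong.syncs == 20   # every write forced a synchronization
assert relaxed.syncs == 1   # mostly local writes, but staler data
```

Turning the single `bound` parameter sweeps the replica between the two extremes, which is the spirit of the "knob" Yu described.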
For more information, see http://www.cs.duke.edu/ari/issg/TACT/.
Yasushi Shinjo (University of Tsukuba) and Calton Pu (Georgia Institute of Technology)
Yasushi Shinjo noted that although thread-specific data (TSD) is appealing -- it requires no synchronization, is easy to use, and is fast -- TSD is expensive in the POSIX model because every access requires a function call. He described an approach in which programmers focus on developing a correct program, and the system employs partial evaluation and runtime specialization to convert TSD function calls into a few instructions.
For more information, see http://www.hlla.is.tsukuba.ac.jp/~yas/papers/.
Tatyana Ryutov and Clifford Neuman (University of Southern California Information Sciences Institute)
Tatyana Ryutov said that the motivation for this work in progress was to integrate security services at the application layer, providing conditional and extensible policies (EACLs) that allow users to integrate many security mechanisms. She described the Generic Authorization and Access-control API (GAA API), which lets applications use this security infrastructure to implement security policies. For more information, see http://www.isi.edu/gost/info/gaa_api.html.
Pål Halvorsen (University of Oslo)
Pål Halvorsen described the design of a multimedia storage server that aimed to reduce communication protocol overhead using techniques such as storing protocol headers on disk along with the data.
David Cheriton (Stanford University)
In a hilarious but fairly compelling introduction to a mile-a-minute talk, David Cheriton explained why "IPv6 is dead." He then presented his nomination for a replacement, "TRIAD": a new Internet addressing and routing architecture. The design philosophy of TRIAD is to build on systems that work well and are widely deployed, namely IPv4 and DNS. In TRIAD, all identification is based on DNS names. Names are temporarily translated into IPv4 addresses for local use in forwarding, but DNS names are the only visible, end-to-end, persistent identifiers. Cheriton described this scheme as having several advantages: there is no state in relays (allowing arbitrary scaling), end-to-end semantics are preserved, and backbone core routers do not need to be changed.
Aaron Brown (University of California, Berkeley)
Aaron Brown started out by noting that benchmarks shape a field. Whereas most benchmarks today measure performance, many of today's and tomorrow's real-world problems are not strictly performance problems but rather problems of maintainability, availability, and the ease of growing and evolving a system. He described a methodology for benchmarking availability that leverages existing performance benchmarks, combining them with fault injection and reporting results both graphically and numerically. He described future plans to add maintainability and growth/evolution benchmarks, noting that these will be harder because they involve more human-factors issues.
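A toy version of the methodology, with an invented simulated system standing in for a real workload generator and fault injector:

```python
def availability_benchmark(capacity, fault_window, demand, duration):
    """Drive a (simulated) system at a fixed demand while a fault is
    injected during fault_window, and record per-second delivered
    performance -- the quality-of-service-over-time curve that the
    methodology reports graphically.  The scalar score is the fraction
    of demanded performance actually delivered."""
    curve = []
    for t in range(duration):
        cap = capacity(t in range(*fault_window))
        curve.append(min(demand, cap))
    delivered = sum(curve) / (demand * duration)
    return curve, delivered

# A hypothetical system that loses half its throughput while degraded
# (e.g. a RAID array rebuilding after an injected disk failure).
system = lambda degraded: 50 if degraded else 100
curve, score = availability_benchmark(system, (10, 20), demand=80, duration=60)
assert score < 1.0   # the injected fault visibly reduced delivered service
```

Real availability benchmarks would of course inject faults into an actual system under an existing performance benchmark rather than a closed-form model, but the shape of the result (a QoS curve plus summary numbers) is the same.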
For more information, see http://iram.cs.berkeley.edu/istore/.