Summarized by Rajeev Balasubramonian (Univ. of Rochester) and Sai Susarla (Univ. of Utah) |
Fay Chang and Garth Gibson (Carnegie Mellon University) |
The work targets the I/O latency bottleneck by speculatively issuing prefetch requests. Processors usually sit idle while waiting for an I/O request to return. The idea is to use this idle time to continue executing the program speculatively in order to discover its future I/O accesses. The data for those accesses can then be prefetched, which reduces the effective latency. The speculative execution is added to applications with a binary modification tool.
An earlier, naive approach (published in OSDI '99) that always attempted speculation yielded significant performance benefits in all but one of the benchmarks used, but it fell well short of smart manual modification of the code. Ongoing work is looking at optimizations that make the speculation more self-aware, i.e., that use history information to speculate only when benefits are likely.
For more information, see http://www.pdl.cs.cmu.edu/TIP/spechint.html.
Emmanuel Cecchet (INPG/INRIA -- SIRAC Laboratory) |
The poster described initial observations from designing a large-scale, high-performance cluster computing server. The cluster consists of a number of PCs interconnected by a Scalable Coherent Interface (SCI) network and uses a distributed shared memory (DSM) system called SciFS. This DSM is built atop the SciOS operating system and exploits the low-latency memory-mapped features of SCI to provide high performance. The DSM system performs optimizations such as swapping to remote memory rather than to disk.
An SCI network is organized as a ring. Initial experiments showed that latency is quite poor for rings of more than 8 nodes. Hence, the network was configured as clusters of small rings joined by a switched interconnect. This configuration scales much better and allows for DSM optimizations that favor locality within a cluster.
For more information, see http://sci-serv.inrialpes.fr/ and http://sirac.inrialpes.fr/.
Han Kiliccote (Carnegie Mellon University) |
PASIS is a framework for demonstrating perpetually available information systems that guarantee the survivability of information under malicious attacks or system component failures. PASIS is based on an architecture that breaks all information into "chunks" and distributes these chunks using information replication and dispersal methods. As a result, PASIS has no single point of failure: it is not possible to destroy the information in PASIS, or to degrade its performance, by eliminating or capturing a few selected components or information chunks. The system thereby achieves a very high degree of security and resiliency against failures and attacks.
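The summary does not say which dispersal method PASIS uses. As a minimal sketch of the general idea, assume a simple 2-of-3 XOR parity scheme: the data is split into two halves plus a parity chunk, and any two chunks suffice to rebuild it. The function names `disperse` and `recover` are hypothetical. The sketch illustrates survivability after the loss of one chunk only; it provides no confidentiality, since each half exposes part of the data.

```python
from typing import List, Optional

def _xor(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))

def disperse(data: bytes) -> List[bytes]:
    """Split data into two halves plus an XOR parity chunk (2-of-3, illustrative only)."""
    if len(data) % 2:
        data += b"\x00"          # pad to an even length; the caller keeps the true length
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, _xor(first, second)]

def recover(chunks: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original bytes when at most one of the three chunks is missing (None)."""
    if sum(c is None for c in chunks) > 1:
        raise ValueError("need at least two of the three chunks")
    first, second, parity = chunks
    if first is None:
        first = _xor(second, parity)
    elif second is None:
        second = _xor(first, parity)
    return (first + second)[:length]
```

Storing the three chunks on three different nodes means that no single node failure destroys the information, which is the property the paragraph above describes.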
Pål Halvorsen, Thomas Plagemann, and Vera Goebel (University of Oslo) |
The project targets multimedia-on-demand servers, where retrieval of data from the storage system is a major bottleneck. The INSTANCE project attempts to identify and eliminate the sources of this bottleneck. The poster described three optimizations toward that end: network level framing, integrated error management, and a zero-copy-one-copy memory architecture.
In network level framing, the server stores the multimedia data already in packet (frame) format and sends the stored packets directly when clients request them. This removes the per-request overhead of encoding the data into frames and makes the server behave much like a network router. The error-encoding overhead is removed in the same way: the parity information is computed once and saved with the data. It is passed along to the client, and error checking is done on the client side.
For more information, see http://confman.unik.no/~paalh/instance/.
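The combination of network level framing and stored parity can be sketched roughly as follows. This is an illustration under stated assumptions, not INSTANCE's actual packet format: the three-byte header, the 4-byte payload size, the single XOR parity packet, and the names `prepare`, `serve`, and `client_repair` are all hypothetical.

```python
import struct

PAYLOAD = 4            # payload bytes per packet; tiny so the example is easy to follow
HEADER = "!HB"         # hypothetical header: sequence number, is-parity flag

def prepare(media: bytes) -> list:
    """Run once at storage time: build ready-to-send packets plus one XOR parity packet."""
    packets, parity = [], bytearray(PAYLOAD)
    for seq, offset in enumerate(range(0, len(media), PAYLOAD)):
        payload = media[offset:offset + PAYLOAD].ljust(PAYLOAD, b"\x00")
        for i, byte in enumerate(payload):
            parity[i] ^= byte
        packets.append(struct.pack(HEADER, seq, 0) + payload)
    # The parity packet's sequence field carries the number of data packets.
    packets.append(struct.pack(HEADER, len(packets), 1) + bytes(parity))
    return packets

def serve(stored: list):
    """Run per request: no framing or parity work, the stored packets go out as they are."""
    yield from stored

def client_repair(received: list) -> bytes:
    """Client side: rebuild at most one lost data packet from the parity packet."""
    data, parity, total = {}, None, 0
    for packet in received:
        seq, is_parity = struct.unpack(HEADER, packet[:3])
        if is_parity:
            parity, total = packet[3:], seq
        else:
            data[seq] = packet[3:]
    missing = [seq for seq in range(total) if seq not in data]
    if parity is None or len(missing) > 1:
        raise ValueError("cannot repair: parity packet lost or more than one packet missing")
    if missing:
        rebuilt = bytearray(parity)
        for payload in data.values():
            for i, byte in enumerate(payload):
                rebuilt[i] ^= byte
        data[missing[0]] = bytes(rebuilt)
    return b"".join(data[seq] for seq in range(total))
```

All per-packet work happens in `prepare`, so the request path in `serve` only forwards stored bytes, and any repair cost is paid by the client.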
Khalil Amiri, David Petrou, Greg Ganger, and Garth Gibson (Carnegie Mellon University) |
Traditional distributed systems statically partition their functions across clients and servers. Dynamic changes in load render such a partition suboptimal. The poster described a prototype, targeting data-intensive applications, that dynamically migrates functions across the cluster.
The system, ABACUS, consists of a programming model and a run-time system. The programming model requires the programmer to split the application into independent objects. Methods are provided to checkpoint and restore objects during migration. The run-time system takes care of resource monitoring and the migration of objects. Preliminary results show a large improvement in response times. For more information, see http://www.cs.cmu.edu/~amiri/abacus.html.
Surendar Chandra, Carla Schlatter Ellis, and Amin Vahdat (Duke University) |
The main aim of this work is to manage bandwidth for web servers. It does so by applying quality-aware image transcoding to multimedia objects, i.e., by providing different QoS for different requests. Transcoding is a transformation that converts a multimedia object from one form to another.
For transcoding to be useful, the underlying tradeoffs among information quality loss, computational overhead, and space savings have to be well understood. Different variations of the same multimedia object are then served to clients based on their dynamic access patterns, in an effort to manage the available bandwidth. For more information, see http://www.cs.duke.edu/~surendar/research/.
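One way to picture serving different variations of the same object is a selection rule over pre-transcoded variants. The sketch below is an assumption for illustration only: the `Variant` fields, the per-request byte budget, and the `pick_variant` policy are hypothetical and are not taken from the poster.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    quality: int       # hypothetical quality factor, higher is better
    size_bytes: int    # space (and hence bandwidth) cost of this variant

def pick_variant(variants, budget_bytes):
    """Return the highest-quality variant that fits the budget, else the smallest one."""
    fitting = [v for v in variants if v.size_bytes <= budget_bytes]
    if fitting:
        return max(fitting, key=lambda v: v.quality)
    return min(variants, key=lambda v: v.size_bytes)

# Three transcodings of the same image, from high to low quality.
variants = [Variant(90, 120_000), Variant(50, 40_000), Variant(10, 8_000)]
choice = pick_variant(variants, budget_bytes=50_000)
```

A real policy would also have to weigh the computational cost of producing each variant, which this sketch leaves out by assuming the variants already exist.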
Joon Suan Ong, Yvonne Coady, and Michael J. Feeley (University of British Columbia) |
The project studies the interaction between virtual memory and high performance network communication. The goal is to provide zero-copy user-level messaging without page pinning at either sender or receiver.
Current systems use explicit page pinning and address translation so that the network interface (NI) can access user pages. The project attempts to allow user-level transfers between unpinned virtual memories by maintaining shadow page tables on the NI. The NI synchronizes with the host during DMA transfers. Because the NI holds the translation information, the approach reduces memory utilization and the number of expensive system calls. For more information, see http://www.cs.ubc.ca/spider/feeley/DSG%20Web/dsg_p_netvm.html.
Anurag Acharya, Maximilian Ibel, Matthias Koelsch, and Michael Schmitt (University of California, Santa Barbara) |
Given that the Internet is transforming into an infrastructure that provides a range of services, there will soon be a need for lightweight network services that can be rapidly developed, easily extended, incrementally deployed, and automatically operated. The poster described RENS, a prototype of such a service.
RENS services consist of distributed components connected by named communication channels. The communication structure is specified explicitly and the component programmer only has to worry about local state. The RENS runtime environment provides a range of services that are common to all clients--automatic instantiation, fault detection and recovery, scaling, etc. Thus, service designers need only focus on service-specific code.
For more information, see http://rens.cs.ucsb.edu/.
Yasushi Negishi (IBM Research, Tokyo Laboratory) |
The main requirements of communication systems for PDAs include supporting disconnected operation to minimize the use of costly links, working with limited resources, and coping with low-quality links. Tuplink is a communication system based on Linda that supports one-to-one communication. There is a pool at each end of the communication path, and Linda operations are used to manipulate the pools. Synchronization operations make the two pools consistent.
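The two-pool arrangement can be sketched with the classic Linda operations `out` (write a tuple), `rd` (read a matching tuple), and `in` (read and remove). This is not Tuplink's API: the `Pool` class, the use of `None` as a wildcard, and the `synchronize` function are hypothetical, and the sketch propagates only newly written tuples, not removals.

```python
def matches(pattern, tup):
    """A pattern matches a tuple of equal length; None acts as a wildcard field."""
    return len(pattern) == len(tup) and all(p is None or p == t for p, t in zip(pattern, tup))

class Pool:
    """One end of the link: a local tuple pool usable while disconnected."""
    def __init__(self):
        self.tuples = []
        self.outbox = []            # tuples written since the last synchronization

    def out(self, tup):
        self.tuples.append(tup)
        self.outbox.append(tup)

    def rd(self, pattern):
        """Return a matching tuple without removing it, or None if there is none."""
        return next((t for t in self.tuples if matches(pattern, t)), None)

    def in_(self, pattern):
        """Return and remove a matching tuple ('in' is a Python keyword, hence in_)."""
        found = self.rd(pattern)
        if found is not None:
            self.tuples.remove(found)
        return found

def synchronize(a, b):
    """Exchange the tuples each side wrote while the link was down."""
    b.tuples.extend(a.outbox)
    a.tuples.extend(b.outbox)
    a.outbox.clear()
    b.outbox.clear()

pda, host = Pool(), Pool()
pda.out(("mail", "hello"))              # written while disconnected
before = host.rd(("mail", None))        # host cannot see it yet
synchronize(pda, host)                  # link comes up briefly
after = host.rd(("mail", None))
```

Because every operation works against the local pool, the costly link is needed only during `synchronize`, which matches the disconnected-operation requirement above.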
Donald Miller and Alan Skousen (Arizona State University) |
The key idea here is to provide a flat, network-wide, peer-level architecture for user and system programs. The operating system provides a large distributed single address space, giving multiple users across the network access to the same addresses while enforcing protection.
The prototype uses Alpha 21164-based NT systems. Providing object-grained and inter-thread protection requires hardware support, for which modifications to the processor architecture have been proposed. For more information, see http://www.eas.asu.edu/~sasos/.
Steven D. Gribble, Eric A. Brewer, David Culler, and Joseph M. Hellerstein (University of California, Berkeley) |
This project explores the use of a library of a few well-known distributed data structures (e.g., hash table, tree, log) as the base on which to build cluster-based Internet applications. Applications are thus isolated from the complex issues involved in developing these data structures, while reaping the availability, fault-tolerance, and performance benefits of a cluster-based implementation. Internally, each data structure (e.g., the hash table) is carefully built from primitive single-machine data structures called "bricks", for example by partitioning a distributed hash table among the nodes of the cluster and replicating the partitions across nodes. The data structures are implemented in Java and employ an event-driven runtime model instead of a thread-per-task model.
Question: How do you partition the data structures among nodes, as a simple-minded page-based partitioning won't work? Answer: We partition them into multiple sub-hash tables, not pages.
For more information, see http://ninja.cs.berkeley.edu/.
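The partitioning into sub-hash tables with replication can be sketched as below. The project's implementation is in Java and event-driven; this synchronous Python version is only an illustration, and the class names, the SHA-256 placement rule, and the `failed` parameter are assumptions made for the example.

```python
import hashlib

class Brick:
    """A single-machine hash table; one brick stands in for one cluster node."""
    def __init__(self):
        self.table = {}

class DistributedHashTable:
    def __init__(self, num_nodes, replicas=2):
        assert 1 <= replicas <= num_nodes
        self.bricks = [Brick() for _ in range(num_nodes)]
        self.replicas = replicas

    def _owners(self, key):
        """Hash the key to a first brick, then replicate on the following bricks."""
        digest = hashlib.sha256(key.encode()).digest()
        first = int.from_bytes(digest[:4], "big") % len(self.bricks)
        return [(first + i) % len(self.bricks) for i in range(self.replicas)]

    def put(self, key, value):
        for node in self._owners(key):
            self.bricks[node].table[key] = value

    def get(self, key, failed=frozenset()):
        """Read from the first replica that is not in the set of failed nodes."""
        for node in self._owners(key):
            if node not in failed:
                return self.bricks[node].table.get(key)
        raise RuntimeError("all replicas for this key are unavailable")

dht = DistributedHashTable(num_nodes=4, replicas=2)
dht.put("user:42", "alice")
owners = dht._owners("user:42")
value_after_failure = dht.get("user:42", failed={owners[0]})
```

Each key lives in a small sub-table on a subset of nodes, so an application sees one hash table while reads continue to succeed when a single replica is down.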
Erez Zadok and Jason Nieh (Columbia University) |
They propose a new language, FiST, for describing stackable file systems. FiST uses operations common to file system interfaces (VFS). From a single description, FiST's compiler produces file system modules for multiple platforms. The generated code handles many kernel details, freeing developers to concentrate on the main issues of their file systems. FiST uses a Yacc-like grammar to describe file system extensions. According to the presenter, FiST makes it possible to develop an Elephant File System-like file system very quickly, with almost the same robustness and performance as the underlying ext2fs file system, instead of reimplementing it from scratch as the Elephant developers did. FiST abstracts out the common aspects of vnodes and other file-system-related data structures across a wide variety of OSes and enables them to be manipulated in a platform-independent manner, while automatically generating code to handle platform dependencies as much as possible. For example, FiST provides pre-call and post-call processing support for all VFS calls.
FiST also lets you apply operations to sets of file system operations, such as all those that change state (e.g., unlink or write) or all those that do not (e.g., readlink or read). You can also refer to sets of functions that apply only to file names or only to file data. This offers a convenient and concise way of effecting change in many functions at once. For more information, see http://www.cs.columbia.edu/~ezk/research/fist-lang/index.html.
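The idea of attaching pre-call and post-call processing to a whole set of operations can be illustrated with a Python analogy. This is not FiST syntax and not generated kernel code: the `LowerFS` and `Stacked` classes, the operation-set names, and the hook signatures are all hypothetical.

```python
STATE_CHANGING = {"write", "unlink"}     # operations that change file system state
READ_ONLY = {"read", "readlink"}         # operations that do not

class LowerFS:
    """Stands in for the underlying file system that the new layer is stacked on."""
    def __init__(self):
        self.files = {}
    def write(self, name, data):
        self.files[name] = data
    def read(self, name):
        return self.files[name]
    def unlink(self, name):
        del self.files[name]

class Stacked:
    """Wraps every operation in the chosen set with optional pre- and post-call hooks."""
    def __init__(self, lower, ops, pre=None, post=None):
        self.lower, self.ops, self.pre, self.post = lower, ops, pre, post

    def __getattr__(self, op):
        # Only called for names not found on Stacked itself, i.e. file system operations.
        target = getattr(self.lower, op)
        if op not in self.ops:
            return target                # pass through untouched
        def wrapped(*args):
            if self.pre:
                self.pre(op, args)
            result = target(*args)
            if self.post:
                self.post(op, args, result)
            return result
        return wrapped

log = []
fs = Stacked(LowerFS(), STATE_CHANGING, pre=lambda op, args: log.append(op))
fs.write("a", b"x")
data = fs.read("a")                      # not in the set, so no hook runs
fs.unlink("a")
```

One hook declaration covers every state-changing call, which is the kind of conciseness the paragraph above attributes to operating on sets of functions.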
Eric Jul, Povl Koch, Jørgen S. Hansen, Michael Svendsen, Kim Henriksen, Kenn Nielsen, and Mads Dydensborg (University of Copenhagen) |
The goal of this project is to combine the theory of DSM systems with database technology to build a distributed database solution for search engines and e-commerce systems, where access latency and availability are important and where updates to data should be reflected in the system as soon as possible. The project focuses on centralised cluster-based Internet servers with very large data sets, normally terabytes of data. The system uses SCI technology as its communication backbone; SCI allows nodes to map memory regions of other nodes into their local address space. The SCI network handles consistency of the memory regions on the nodes.
Jon Howell and David Kotz (Dartmouth College) |
Restricted delegation enables flexible administrative boundaries. Conventional systems assume a hierarchy of administrative control and thus cannot express non-hierarchical trust relationships. Restricted delegation, on the other hand, can model both hierarchies and arbitrary trust graphs. The system has operators for conjunction (e.g., of the capabilities of both A and B) and for superimposing restrictions (e.g., applying restriction R to further restrict capability A). It can also defer access control decisions to the ultimate resource server. For more information, see http://www.cs.dartmouth.edu/~jonh/research/delegation/.
Ian McDonald (University of Glasgow) |
The virtual memory management policies provided by traditional OSes are too rigid and do not support application-specific quality-of-service requirements well. However, writing a complete user-level virtual memory management system is very hard, even if the OS provides support for application-specific policies.
A flexible, extensible, distributed memory management hierarchy has been built that provides application developers with the ability to specify sub-components that would best meet the application's needs. This new hierarchy utilises compressed caching, pre-fetching from the local disk, and memory on remote hosts. Each component of the hierarchy has a strongly typed interface allowing developers to replace individual components with their own implementation without the need for recompilation.
This is implemented in the Nemesis operating system, which provides user-level VMM support. Current work is extending the hierarchy to provide a high-level QoS interface that lets the application developer specify the required level of page-fault-handling performance, leaving the construction of the relevant individual sub-components to the VM system. For more information, see http://www.dcs.gla.ac.uk/people/personal/ian/research/.
Rolf Neugebauer (University of Glasgow) |
They developed a Unix-like run-time environment for the Nemesis operating system. Challenges addressed: i) providing Unix-style linkage in a Single Address-Space Operating System (SASOS), and ii) designing the personality so that no unwanted interactions between different processes occur.
Nemesis is a library-based OS with support for QoS for all shared resources in the system. Hence the Unix personality is implemented entirely as a set of shared-library-based components (as in Exokernel), avoiding any shared state. Unix-style linkage is simulated using small veneer libraries, which redirect standard Unix function calls to a closure-based implementation. This approach allows them to tailor the run-time environments of processes to their needs--an important feature for embedded systems.
For more information, see http://www.dcs.gla.ac.uk/~neugebar/.
Prashant Pradhan and Anindya Neogi (State University of New York at Stony Brook) |
"Suez" is a high-performance IP router built from off-the-shelf Intel hardware and Gbit/sec system-area network technology from Myrinet. Configuration: eight 300-MHz Pentium II nodes interconnected by a Myrinet switch, exposing 16 100-Mbps Fast Ethernet ports.
Highlights include: i) separate processing pipelines for realtime and non-realtime flows; ii) a mapping of network addresses to virtual addresses, so that the CPU cache hierarchy can be used for fast routing table lookup; and iii) flow-by-flow routing, which reuses the routing table lookup effort for long connections.
For more information, see http://www.ecsl.cs.sunysb.edu/suez.html.
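The flow-by-flow reuse of lookup effort can be sketched as a flow cache in front of a longest-prefix-match routing table. This shows the general technique rather than Suez's implementation: the `Router` class, the linear-scan lookup, the four-field flow key, and the port numbers are assumptions made for the example, and the address-to-virtual-address mapping is not modelled.

```python
import ipaddress

class Router:
    def __init__(self, routes):
        # routes: list of (prefix string, output port) pairs
        self.routes = [(ipaddress.ip_network(prefix), port) for prefix, port in routes]
        self.flow_cache = {}        # flow key -> output port
        self.full_lookups = 0       # counts the expensive routing table lookups

    def _longest_prefix_match(self, dst):
        self.full_lookups += 1
        addr = ipaddress.ip_address(dst)
        candidates = [(net.prefixlen, port) for net, port in self.routes if addr in net]
        if not candidates:
            raise LookupError("no route to " + dst)
        return max(candidates, key=lambda c: c[0])[1]

    def route(self, src, dst, sport, dport):
        """Look the route up once per flow; later packets of the flow hit the cache."""
        flow = (src, dst, sport, dport)
        if flow not in self.flow_cache:
            self.flow_cache[flow] = self._longest_prefix_match(dst)
        return self.flow_cache[flow]

router = Router([("10.0.0.0/8", 1), ("10.1.0.0/16", 2), ("0.0.0.0/0", 0)])
flow = ("192.168.0.5", "10.1.2.3", 4000, 80)
first_port = router.route(*flow)     # full lookup
second_port = router.route(*flow)    # served from the flow cache
```

For a long connection, only the first packet pays for the table lookup, which is where the benefit for long-lived flows comes from.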