Real Time
Summarized by John Regehr, Univ. of Virginia


Progress-Based Regulation of Low-Importance Processes
John R. Douceur and William J. Bolosky (Microsoft Research)

John presented this paper describing "MS Manners," which is designed to prevent a low-importance task (like a disk defragmenter or content indexer) from interfering with the performance of other applications. The idea is to establish a baseline for the rate of progress of the low-importance process and then to watch for its performance to degrade. Degradation is assumed to be caused by resource contention; when it occurs, the low-importance process is suspended. It awakens after some timeout--if contention is still present, it suspends for a longer time using exponential backoff. MS Manners assumes that the rate of progress of low-importance processes can be measured, and that the impact of resource contention is symmetric; that is, it slows down both contending processes equally. Advantages of MS Manners are that it is resource independent, it involves no kernel modifications, and it requires no human intervention or other knowledge about processes' use of resources.

David Steere, OGI, mentioned that there has been a lot of feedback work recently (for example, the Quasar Project at OGI http://www.cse.ogi.edu/DISC/projects/quasar/ and Feedback Control Real-Time Scheduling at UVA http://www.cs.virginia.edu/~cl7v/fcs.html). He asked about support for applications with timing requirements, and if John and Bill had thought about methods of degrading the progress of the low-importance task other than suspending it? John answered that they had not considered time-dependent applications or other methods of degrading progress, but that the amount that the low-importance process is allowed to interfere with the progress of other applications is tunable.

The second questioner asked if there were interference or instability effects resulting from the exponential backoff, and if the authors had considered adding a random jitter to the backoff times. John said that instability has not been a problem--since MS Manners does not allow concurrency between low-importance process, there is no way for backoffs to collide or otherwise interfere with each other.

Jason Nieh, Columbia, asked why John and Bill didn't also measure the progress of the high-importance process in order to relax the assumption of symmetric degradation. John responded that their assumption is that they don't know what the high-importance processes are. This makes sense in an open system like Windows.

Ed Bugnion, VMware, said that he found the project interesting, but that he is more interested in retaining interactive performance under NT while running the compiler. John said that MS Manners would be suitable for this application if the compiler were considered to be the low-importance application. Scribe's note: the Windows NT scheduler is not tuned to provide good interactive response when there are compute-bound tasks at the same or higher priority. An easy workaround is to reduce the priority of the compute-bound task.

Geoffrey Kuenning, UCLA observed that OS resource meters often stall, jump forward, and are otherwise unreliable, and asked if this is a problem for MS Manners. John responded that these meters often aggregate a lot of information that makes them misleading, but that this had not been a problem for MS Manners since it uses an application-specific progress metric.


Borrowed-Virtual-Time (BVT) Scheduling: Supporting Latency-Sensitive Threads in a General-Purpose Scheduler
Kenneth J. Duda and David R. Cheriton (Stanford)

Ken presented this paper, which argues that the BVT algorithm can be used to effectively schedule a wide variety of real-time and non-real-time applications. BVT is a proportional-share algorithm similar to start-time fair queuing (http://www.usenix.org/publications/library/proceedings/osdi96/vin.html), with the additional feature that threads have a "warp" value that prevents them from losing their share while they are not running; a high warp value increases the likelihood that a thread will be scheduled soon after it awakens, but does not increase its share. Ken next addressed the question of how to choose the share and warp values for tasks, and then discussed using BVT for real-time tasks. He proposed a two-level scheduling hierarchy: a top-level BVT scheduler with admission control that schedules real-time tasks and a lower-level BVT scheduler that admits any task. This allows the CPU needs of real-time tasks to be isolated from other tasks. Implementing BVT involved adding less than 500 lines of code to the Linux kernel. Ken next compared BVT to reservation-based scheduling; BVT was shown to out-perform reservations when the error in the estimate of the CPU time needed by the reservation exceeded about 15%.

Jochen Liedtke, Universität Karlsruhe, said that he likes this scheduler, and asked the first question: had they proven any real-time properties about BVT? Ken said they had not.

Mike Jones, Microsoft Research, thought that BVT was the most practical of the weighted fair-queuing schedulers, and agreed that estimating the amount of CPU time needed for a reservation is difficult. However, based on his experience with the Rialto system, he thought that the scheduler needs to know about the period of real-time applications. Ken said that tasks with tight requirements on their period should be put into the root scheduler and assigned a warp value high enough that they are likely to run whenever they wake up. Mike responded that this makes warp seem analogous to priority, and that this scheme would be vulnerable to the priority-inflation phenomenon that happens when different people assign priorities to the applications that they write: everyone assumes that their application is the most important. Ken responded that he didn't see this as a problem. Scribe's note: the central question here is who assigns warp values. If warp is computed by an admission controller, or if there is coordination among application writers, then good warp values can be chosen. Mike's point was that in absence of global coordination, it is better if application writers specify the resource needs of their applications directly, rather than specifying their relative importances.

Stefan Savage, UW, had experience with one of the early reservation-based schedulers, and said that he now prefers the BVT style of scheduling. However, he was concerned with the difficulty of manipulating dimensionless parameters like share and warp, as compared to estimate and period in a reservation system which correspond to actual physical quantities. He wondered if there might be a way to specify the parameters more intuitively, but retain the benefits of BVT. Ken said that this would be great, if we can figure out how to do it.

Mohit Aron, Rice, asked if they might consider using short quanta instead of warp, since this would be a different way to reduce scheduling latency? Ken answered that by default BVT has very long quanta (200ms) while still offering low scheduling latency for applications that need it. This is the right thing to do, given the detrimental effects of context switches on cache locality.

Greg Minshall, Siara, asked Ken to compare BVT with start-time fair queuing (SFQ). Ken said that although SFQ has bounded worst-case response-time, the bound is long since it's proportional to the number of tasks in the system. Scribe's note: as noted above, the same criticism might be applied to BVT if application writers are each allowed to choose the warp value for their application--low latency is achievable only if a person or admission control system with knowledge of all tasks assigns warp values.

David Steere, OGI, asked if there might be some relation between weight and warp in BVT to period and amount in a reservation based system? Ken thought there might be, and that this mapping would be implemented in the admission controller.

Erik Cota-Robles, Intel, wondered if BVT would do well when scheduling multiple applications with similar scheduling needs? Ken answered that multiple MPEG players worked fine, as long as there is enough processor time to run them all. Erik commented that some kinds of signal processing are highly predictable (arguing for a reservation-based scheduler).

Jason Nieh, Columbia, liked the comparison of BVT with reservations, and asked if it has been compared to fair-queuing without warp? Ken responded that taking away warp would be bad for response time, and interactive applications would not do well.


EMERALDS: A Small-Memory Real-Time Microkernel
Khawar M. Zuberi, Padmanabhan Pillai, and Kang G. Shin (Univ. of Michigan)

The last talk in the last session of the conference was given by Padmanabhan. The EMERALDS system is based on a re-engineering of OS services for small-memory systems. For example, dynamic overhead can be avoided since the location of resources is statically known, and many object instantiations are static also. To reduce task scheduling overhead, EMERALDS uses a new approach called combined static/dynamic scheduling. This avoids some of the dynamic overhead of earliest-deadline first scheduling without incurring the high schedulability analysis and lower utilization bound of rate-monotonic scheduling. EMERALDS also contains a number of optimizations that provide extra information to blocking calls in order to avoid context switches, to optimize priority inheritance to take constant time in the common case, and to speed up state message and mailbox-based communication.

Jason Nieh, Columbia, asked the first question; he wondered if system overhead was really an issue, given Moore's law. Padmanabhan responded that speed is an issue since things change more slowly for embedded chips, most of which are still eight or sixteen bits.

Werner Vogels, Cornell, said that the state messages in EMERALDS are quite similar to the ones in the MARS system. Padmanabhan said that while MARS is related work, EMERALDS has improved upon its state messages in several ways.

Mohit Aron, Rice, ask for clarification: does EMERALDS implement multiple address spaces? Padmanabhan said that there are versions with and without multiple address spaces. Mohit asked why we don't trust applications to remain in their address spaces if we trust them to provide the correct timing constraints. Padmanabhan answered that protecting against bugs in code is the issue, rather than trusting the author.

Jack Stankovic, Univ. of Virginia, said that for the small number of tasks in a typical embedded system there's not much difference between the overheads of rate-monotonic and earliest-deadline scheduling, and that the combined algorithm will probably take up more memory. Padmanabhan answered that in principle the system generation tool could include only the scheduler that was needed, avoiding the memory cost when more general scheduling is not required.