Perspect — Exploiting essential characteristics of performance issues for automatic performance diagnosis 

Editor’s Note: Perspect is an innovative performance debugging tool developed by Jenny Ren and the team. It introduces a novel concept known as Relational Debugging. The key idea of relational debugging is to analyze the “relation” between runtime events and use relations to explain performance issues. In her OSDI’23 paper, Jenny makes the analogy between performance problems and relative motion in physics.

"Just like the speed of an object is a relative measure depending on the reference frame, so is performance when viewed from different runtime events during program execution. Root causes of performance problems can be revealed by identifying the appropriate runtime event as the reference frame, and analyzing changes of measures relative to these events (i.e. relations) between a good run and a bad run (with performance anomalies)."

We invite Jenny to write about her story of developing Perspect and to explain the simple and elegant idea of relational debugging.

Motivation:  A lack of tooling for diagnosing performance issues 

While studying the performance evolution of Linux core functionalities, I manually diagnosed many code changes that caused performance issues. The typical workflow is as follows: (1) profiling the execution and finding the code regions that slowed things down, and (2) understanding what caused the slowdown. For changes that are easy to diagnose, only Step (1) is mostly required. For harder cases, I spent lots of manual effort on Step (2) in order to link the slowdown to the root cause in code. I found no readily available tools that could help me automate the diagnosis process.

Traditional diagnosis techniques are ineffective for performance issues

Before I decided to develop a new tool to diagnose performance issues, my first thought was to reuse or extend techniques for diagnosing functional failures, for example, statistical debugging or Kairux (SOSP’19). Can we apply these techniques to performance diagnosis? Unfortunately, the answer is not positive — they work poorly for difficult performance issues.

The Story of Go-909

Let’s look at Go-909 as an example. Go-909 caused severe memory leaks and impacted many workloads. In the issue ticket of Go-909, a developer reported that they found the same workload executed normally with the 64-bit go runtime, but runs out of memory with the 32-bit go runtime. Strange! Go-909 was considered one of the hardest performance issues in terms of diagnosis; its root cause was not confirmed until after a year when it was diagnosed through a trial-and-error process.

The root cause is that the version of the garbage collector (GC) in Go-909 is “imprecise” – it does not differentiate between pointers and constants. Consequently, a constant value can be mistaken as a pointer, and if that value happens to be a valid address pointing to an object, the object would never be reclaimed. In the 32-bit go runtime, due to the memory layout, many constants happen to “point” to valid objects, causing severe memory leaks. The 64-bit Go runtime is in fact not bug free, but the memory leak happens less frequently.

So, why was Go-909 hard to diagnose? The root cause got triggered in both executions, but only happened significantly more often in the 32-bit Go runtime—there are no runtime events that occurred in the bad run that didn’t occur in the good run, and vice versa. This fact impairs existing diagnosis techniques which assume the root cause exists only in the bad run but not in the good run, and thus represent the root cause using invariants or absolute predicates.

We observe this pattern – the root cause can only be expressed in relative terms — in many other hard performance issues. So, the design of Perspect uses a relative representation of root causes, called a relation. A relation, denoted as `R(A | B)`, is the distribution where each element represents given each event B that occurred in the execution, the number of causally related events As that occurred.

How does “relation” help performance diagnosis? Back to Go-909, Perspect captures its root cause using a simple relation after it observes the 64-bit and 32-bit executions. The GC algorithm in Go-909 uses a mark and sweep algorithm: the mark phase scans from stacks and data segments for pointers, and marks objects pointed to by these pointers. The sweep phase reclaims objects that have not been marked. The root cause of Go-909 was because the GC algorithm was marking objects unreachable by any pointers—because their addresses were stored in constants. Programmatically, these constants can be differentiated from real pointers because real pointers are returned by malloc but constants are not. When Perspect analyzes the execution, it finds constants do not have data flow dependencies on malloc’s return value, whereas real pointers do. As a result, Perspect returns the following root cause:

R(malloc_return | mark_object) = 0.99% in 64bit run
R(malloc_return | mark_object) = 0.01%  in 32bit run

Which states that in the 64bit run, 99% of the marked objects are reachable from real pointers, and in the 32bit run, only 1% of the marked objects are reachable from real pointers! So most of the marked objects in the 32bit run are unreachable objects that should have been reclaimed.

No wonder there is a severe memory leak! 

Scaling Perspect to complex real-world programs

The goal of Perspect algorithm is to determine the appropriate reference events, and causally relevant symptom events, where the change in relation between these two events in the good and bad run captures the root cause of the performance issue. A straightforward algorithm for Perspect would simply enumerate all possible relations between events of any two instructions. But this basic algorithm has a combinatorial complexity! Perspect smartly uses a search algorithm to address this combinatorial problem. It only traverses through the program once — starting from the program entry points, ending in symptoms of the performance problem. At each step, Perspect builds a relation between the current instruction and the symptom instruction. If a relation has not changed, it is discarded, otherwise, Perspect checks if the change in relation R1 is caused by another relation R2, and if so, refines R1 to R2. Let’s look at an example. Below is a simplified code snippet of a GC algorithm.

gc() {
    ...
    mark();
    sweep(); 
}

mark() {
    …
    if (!obj.marked)
        obj.marked == true;
}

In the case of Go-909, more objects got marked in the 32-bit run causing the memory leak. In the 32-bit run, R(obj.marked == true | gc()) increased, which means given each GC cycle, more objects got marked. Perspect refines this relation to R(obj.marked == true | mark()) because R(mark()|gc()) == 1 — each GC cycle always calls mark() once. In other words, it means that the reason more objects got marked has nothing to do with the logic executed between gc() and mark(); the root cause lies within mark(). While this might seem trivial in this example, Perspect uses this rule to eliminate lots of irrelevant root-cause candidates in complex programs.

On top of the above efficient algorithm, Perspect’s implementation is also heavily optimized, which enables Perspect to identify the root cause within 10 minutes for most real-world performance issues we evaluated. One optimization is that Perspect’s static analysis phase tries to minimize the static dependency graph of performance symptoms: it skips causal predecessors that cannot influence the performance symptoms, but are expensive to analyze, such as loops. Perspect also analyzes pointers dynamically, because static pointer analysis is expensive and imprecise. Perspect is also parallelizable across multiple servers.

Evaluating Perspect on real-world performance bugs

As a highlight, we used Perspect to diagnose two open bugs in MongoDB where the root causes were unknown. Perspect successfully pinpoints the root causes which have been confirmed by the MongoDB developers.

[Perspect’s result] ties all the pieces together into a nice explanation.” -- MongoDB developers

In total, we applied Perspect on 12 real-world performance bugs, and Perspect was able to diagnose the root causes of 10 of them. For one of the bugs that Perspect was not able to diagnose, the root cause was in the Linux kernel; in this case, Perspect successfully excluded the root cause from the application code. For the other bug, Perspect was ineffective because the source code changed too much between the good and bad run. Note that Perspect is able to tolerate moderate code changes and match instructions between different source code versions. For example, Perspect successfully pinpointed the root cause of Mongodb-44991, where the performance issue is a regression that occurred in a newer version. 

As future work, we plan to apply Perspect’s relational debugging algorithm to more diverse use cases. Currently, Perspect is designed as an offline tool that works in a single machine setting. We are interested in applying relational debugging to diagnose performance issues in distributed systems. Further, the overhead of relational debugging can be reduced by running the algorithm incrementally, so it can be deployed to diagnose performance root causes online.

The source code of Perspect can be found at: https://gitlab.dsrg.utoronto.ca/dsrg/perspect

About the authorXiang (Jenny) Ren is a PhD student in the Department of Electrical and Computer Engineering (ECE) at the University of Toronto, advised by Ding Yuan. She is interested in improving the performance and reliability of software systems, including operating systems, database systems, and distributed systems. She is on the academic job market.

Disclaimer: Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGOPS.

Editor: Tianyin Xu (University of Illinois at Urbana-Champaign) and Dong Du (Shanghai Jiao Tong University)