The journey of real-life industry work behind an OSDI paper — global capacity management for millions of servers

As a new hire at Meta (formerly Facebook) in 2020, I was unbelievably fortunate to be entrusted with the question of how we should manage the company’s global server capacity at the scale of millions of servers. I was entrusted with this work despite having zero background in capacity management, and against all odds, not …

Revisiting Distributed Memory in the CXL Era

Message Passing vs. Distributed Shared Memory: As Moore’s Law slows down, horizontal scaling has become the predominant strategy for enhancing system performance. Nonetheless, the inherent complexities of distributed programming present substantial challenges in creating efficient, correct, and resilient systems. Streamlining this process remains a fundamental objective of distributed programming frameworks. In the realm of distributed …

Perspect — Exploiting essential characteristics of performance issues for automatic performance diagnosis 

Editor’s Note: Perspect is an innovative performance debugging tool developed by Jenny Ren and the team. It introduces a novel concept known as Relational Debugging. The key idea of relational debugging is to analyze the “relation” between runtime events and use relations to explain performance issues. In her OSDI’23 paper, Jenny makes the analogy between …