?:abstract
|
-
We describe the design and implementation of Protean, the Microsoft Azure service responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A single instance of Protean serves an entire availability zone (10-100k machines), facilitating seamless failover and scale-out to customers. The design has proven robust, enabling a substantial expansion of VM offerings and features with minimal changes to the core infrastructure. In particular, Protean preserves a clear separation between policy and mechanisms. From a policy perspective, a flexible rule-based Allocation Agent (AA) allows Protean to efficiently address multiple constraints and performance criteria, and adapt to different conditions. On the system side, a multi-layer caching mechanism expedites the allocation process, achieving turnaround times of a few milliseconds. A slight compromise on allocation quality enables multiple AAs to run concurrently on the same inventory, resulting in increased throughput with a negligible conflict rate. Our results from both simulations and production demonstrate that Protean achieves high throughput and utilization (85-90% on a key utilization metric), while satisfying user-specific requirements. We also demonstrate how Protean is adapted to handle capacity crunch conditions, by zooming in on spikes caused by COVID-19. © 2020 Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020. All rights reserved.
|