Distributed Systems Project Proposal

Fri, 10 Apr 2026 12:00:00 +0000

A high-performance inference gateway in C++ that routes client requests to a cluster of LLM serving replicas. The gateway provides KV-cache-aware routing via consistent hashing, weighted load balancing, fault tolerance through mid-stream failover and request hedging, a circuit breaker for detecting degraded replicas, streaming token delivery, backpressure management, and zero-downtime rolling updates. Replicas participate in a SWIM gossip protocol for decentralized membership and failure detection.
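
To make the consistent-hashing idea concrete, here is a minimal sketch of a hash ring with virtual nodes. It is not the proposal's actual implementation: the class name `HashRing`, the replica names, the virtual-node count of 64, the choice of FNV-1a as the hash, and the use of a session or prompt-prefix string as the routing key are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical consistent-hash ring (illustrative names, not the project's API).
// Each replica is placed on the ring at `vnodes` points so load spreads more evenly.
class HashRing {
public:
    explicit HashRing(int vnodes = 64) : vnodes_(vnodes) {}

    // Insert one ring point per virtual node of the replica.
    void add(const std::string& replica) {
        for (int i = 0; i < vnodes_; ++i)
            ring_[hash(replica + "#" + std::to_string(i))] = replica;
    }

    // Erase the replica's ring points; points owned by other replicas are untouched.
    void remove(const std::string& replica) {
        for (int i = 0; i < vnodes_; ++i) {
            auto it = ring_.find(hash(replica + "#" + std::to_string(i)));
            if (it != ring_.end() && it->second == replica) ring_.erase(it);
        }
    }

    // Route a key (assumed here: a session or prompt-prefix identifier) to the
    // first replica clockwise from the key's hash, wrapping around at the end.
    std::optional<std::string> route(const std::string& key) const {
        if (ring_.empty()) return std::nullopt;
        auto it = ring_.lower_bound(hash(key));
        if (it == ring_.end()) it = ring_.begin();
        return it->second;
    }

private:
    // 64-bit FNV-1a, chosen only because it is short and deterministic.
    static std::uint64_t hash(const std::string& s) {
        std::uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return h;
    }

    int vnodes_;
    std::map<std::uint64_t, std::string> ring_;
};
```

Under these assumptions, requests that share a key keep landing on the same replica, which is what lets that replica reuse its KV cache. Removing a replica only remaps the keys it owned; every other key stays where it was.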

Distributed-Systems on Li Cao's Blog
