November 28: In June 2024, Moonshot AI and Tsinghua University’s MADSys Lab jointly released the design of Mooncake, the inference system at the core of Kimi. The system is built on a KVCache-centric prefill/decode (PD) disaggregation design and an architecture that trades storage for computation, and it has improved inference throughput. To further accelerate the application and promotion of this […]