In June 2024, Moonshot AI and Tsinghua University's MADSys Lab jointly released the design of Mooncake, the inference system at the core of Kimi. Built around a KVCache-centric architecture with prefill/decode (PD) disaggregation and storage-compute separation, the system substantially improves inference throughput.
On November 28, to accelerate the adoption and spread of this framework, Moonshot AI and Tsinghua University's MADSys Lab joined forces with 9#AISoft, Alibaba Cloud, Huawei Storage, Wall-facing Intelligence, and Qujing Technology to launch Mooncake as an open-source project, aiming to jointly build a large-model inference architecture centered on KVCache.
The Mooncake open-source project is built around a large-scale KVCache pool. By trading storage for computation, it significantly reduces redundant computation and thereby raises inference throughput. The project will be open-sourced in phases: the implementation of Mooncake Store, the high-performance multi-level KVCache cache, will be released gradually, while compatibility with a variety of inference engines and underlying storage and transport resources is maintained. The Transfer Engine, the project's transport component, is already open-source on GitHub.
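To make the storage-for-computation idea concrete, here is a minimal toy sketch of a prefix-keyed KV cache pool: when a new request shares a prefix with an earlier one, the cached KV for that prefix can be reused and prefill computation skipped for those tokens. The class and method names (`KVCachePool`, `put`, `longest_prefix`) are hypothetical illustrations, not Mooncake's actual API, and real systems cache at block granularity rather than per-token.

```python
# Illustrative sketch only; names here are hypothetical, not Mooncake's API.
from typing import Dict, List, Tuple

class KVCachePool:
    """Maps token-prefix keys to (mock) KV data so that a new request can
    reuse the longest cached prefix instead of recomputing its prefill."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], str] = {}

    def put(self, tokens: List[int], kv_blob: str) -> None:
        # Cache KV for every prefix boundary (real systems use block granularity).
        for end in range(1, len(tokens) + 1):
            self._store[tuple(tokens[:end])] = kv_blob[:end]  # mock KV slice

    def longest_prefix(self, tokens: List[int]) -> int:
        # Return the length of the longest cached prefix; prefill skips it.
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._store:
                return end
        return 0

pool = KVCachePool()
pool.put([1, 2, 3, 4], "abcd")           # first request populates the pool
hit = pool.longest_prefix([1, 2, 3, 9])  # second request shares a 3-token prefix
print(hit)  # → 3, so only the final token needs prefill computation
```

The larger and longer-lived the shared pool, the higher the chance that an incoming request hits a cached prefix, which is why Mooncake centers the architecture on a large-scale KVCache pool rather than per-instance caches.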
The ultimate goal of the Mooncake open-source project is to establish a new high-performance, memory-semantic storage standard interface for the era of large models, together with a reference implementation.