Moonshot AI and Tsinghua Researchers Propose PrfaaS: A Cross-Datacenter KVCache Architecture that Rethinks How LLMs are Served at Scale
For years, the way large language models handle inference has been stuck inside a box, literally. The high-bandwidth RDMA networks that make modern LLM serving work have confined both prefill and decode to the same datacenter, often even the same rack. A team of researchers at Moonshot AI and Tsinghua University…
