XConn and MemVerge Unveil Scalable CXL Memory for AI Inference

Demonstration at SC25 showcases a 100 TiB-scale CXL memory pool overcoming the "memory wall" for next-generation AI models and inference-first workloads

As enterprises accelerate adoption of large language models (LLMs), generative AI, and real-time inference applications, a new bottleneck has emerged: memory scale, bandwidth, and latency. XConn Technologies (XConn), a leader in next-generation interconnect solutions for high-performance computing and AI infrastructure, and MemVerge®, the leader in Big Memory software, today announced a joint demonstration of a Compute Express Link® (CXL®) memory pool designed to break through the AI memory wall. The live demo will take place at Supercomputing 2025 (SC25) in St. Louis, November 16–21, 2025, in booth #817, stations 2 and 8.

Academic and industry analysts agree that memory bandwidth growth has lagged far behind compute performance. While server FLOPS have surged, DRAM and interconnect bandwidth have scaled much more slowly, making memory the dominant bottleneck for many AI inference workloads. Experts warn that AI progress is already hitting a memory wall, creating an urgent need for memory and interconnect architectures to evolve. The memory-intensive nature of retrieval-augmented generation, vector search, agentic AI, and large language model inference is pushing traditional DDR- and HBM-based server architectures to their limits, creating both performance and TCO challenges.

“As AI workloads and model sizes explode, the limiting factor is no longer just GPU count; it’s how much memory can be shared, how fast it can be accessed, and how cost-efficiently it can scale,” said Gerry Fan, CEO of XConn Technologies. “Our collaboration with MemVerge demonstrates that CXL memory pooling at 100 TiB and beyond is production-ready, not theoretical. This is the architecture that makes large-scale AI inference truly feasible.”

To address these challenges, XConn and MemVerge are demonstrating a rack-scale CXL memory pooling solution built around XConn’s Apollo hybrid CXL/PCIe switch and MemVerge’s GISMO technology, optimized for NVIDIA’s Dynamo architecture and NIXL software stack. The demo showcases how AI inference workloads can dynamically offload and share massive KV cache resources across GPUs and CPUs, achieving greater than 5× performance improvements compared with SSD-based caching or RDMA-based KV cache offloading, while reducing total cost of ownership. In particular, the demo shows a scalable memory architecture for AI inference workloads in which the prefill and decode phases are disaggregated.
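The KV cache offload pattern described above can be pictured as a simple two-tier cache: a small, fast local tier (modeling GPU HBM or host DRAM) that evicts least-recently-used entries into a large shared pool tier (modeling the CXL memory pool) instead of discarding them, so the expensive prefill computation need not be redone. The sketch below is purely illustrative; the class name, capacities, and tiering policy are assumptions for demonstration and do not represent XConn's or MemVerge's actual software.

```python
from collections import OrderedDict

class TieredKVCache:
    """Illustrative two-tier KV cache (hypothetical, not vendor code):
    a small fast local tier spills LRU entries into a large shared pool."""

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity
        self.local = OrderedDict()  # fast tier: models HBM/DRAM, LRU-ordered
        self.pool = {}              # large tier: models a shared CXL memory pool

    def put(self, seq_id, kv_block):
        # Insert (or refresh) a KV block in the fast tier.
        self.local[seq_id] = kv_block
        self.local.move_to_end(seq_id)
        # Offload least-recently-used blocks to the pool instead of
        # discarding them, avoiding a costly prefill recomputation later.
        while len(self.local) > self.local_capacity:
            victim, block = self.local.popitem(last=False)
            self.pool[victim] = block

    def get(self, seq_id):
        if seq_id in self.local:
            self.local.move_to_end(seq_id)
            return self.local[seq_id]
        if seq_id in self.pool:
            # Promote the block back into the fast tier on access.
            block = self.pool.pop(seq_id)
            self.put(seq_id, block)
            return block
        return None  # true miss: the caller must rerun prefill
```

In a disaggregated prefill/decode setup, prefill workers would `put` freshly computed KV blocks while decode workers `get` them, with the shared pool tier serving as the handoff point between the two phases.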

“Memory has become the new frontier of AI infrastructure innovation,” said Charles Fan, CEO and co-founder of MemVerge. “By using MemVerge GISMO with XConn’s Apollo switch, we’re showcasing software-defined, elastic CXL memory that delivers the performance and flexibility needed to power the next wave of agentic AI and hyperscale inference. Together, we’re redefining how memory is provisioned and used in AI data centers.”

As AI becomes increasingly data-centric and memory-bound, rather than compute-bound, traditional server architectures can no longer keep up. CXL memory pooling addresses these limitations by enabling dynamic, low-latency memory sharing across CPUs, GPUs, and accelerators. It scales up to hundreds of terabytes of shared memory, reduces TCO through better utilization, reduces over-provisioning, and enhances throughput for inference-first workloads, generative AI, real-time analytics, and in-memory databases.

SC25 attendees can experience the joint demo featuring a CXL memory pool dynamically shared across CPUs and GPUs, with inferencing benchmarks illustrating significant performance and efficiency gains for KV cache offload and AI model execution. For more details about SC25 and to register, visit https://sc25.supercomputing.org.

The post XConn and MemVerge Unveil Scalable CXL Memory for AI Inference first appeared on AI-Tech Park.