Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting
The article examines the data movement overhead that arises when serving large-scale Mixture of Experts (MoE) Large Language Models (LLMs), and presents insights from profiling four state-of-the-art models. It highlights how understanding these data movement patterns can inform architectural improvements to wafer-scale GPUs for better serving performance.
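To make "data movement patterns" more concrete, below is a minimal Python sketch, not taken from the article and not its methodology: it simulates top-k expert routing for a batch of tokens and tallies per-expert token counts as a rough proxy for the activation traffic that crosses the interconnect when experts are sharded across devices. All layer counts, expert counts, and sizes are assumptions chosen for illustration.

```python
# Illustrative sketch only (assumed parameters, random router logits):
# estimate per-layer token dispatch volume under top-k MoE routing.
import numpy as np

NUM_LAYERS = 4        # assumed number of MoE layers
NUM_EXPERTS = 8       # assumed experts per MoE layer
TOP_K = 2             # assumed experts activated per token
NUM_TOKENS = 1024     # assumed tokens in one serving batch
HIDDEN_DIM = 4096     # assumed hidden size
BYTES_PER_ELEM = 2    # assumed fp16 activations

rng = np.random.default_rng(0)

def route_tokens(logits: np.ndarray, top_k: int) -> np.ndarray:
    """Return indices of the top_k highest-scoring experts per token."""
    return np.argsort(-logits, axis=-1)[:, :top_k]

total_bytes = 0
for layer in range(NUM_LAYERS):
    # Stand-in for the gating network's output: random router logits.
    logits = rng.standard_normal((NUM_TOKENS, NUM_EXPERTS))
    chosen = route_tokens(logits, TOP_K)

    # Number of tokens dispatched to each expert in this layer.
    counts = np.bincount(chosen.ravel(), minlength=NUM_EXPERTS)

    # Bytes of activations that would cross the interconnect if every
    # expert lived on a remote device (an upper bound, for illustration).
    layer_bytes = counts.sum() * HIDDEN_DIM * BYTES_PER_ELEM
    total_bytes += layer_bytes
    print(f"layer {layer}: tokens per expert = {counts.tolist()}, "
          f"~{layer_bytes / 2**20:.1f} MiB dispatched")

print(f"total dispatched across layers: ~{total_bytes / 2**30:.2f} GiB")
```

Profiling real models replaces the random logits with the routers' actual decisions; the resulting per-expert histograms are the kind of pattern the article argues can be forecast and exploited by hardware design.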