Mixture of Experts (MoE). A model architecture in which only a fraction of the parameters activate for each token: a small router network sends each token to a few "expert" sub-networks instead of running the whole model. This lets you build very large models with the inference cost of much smaller ones. Mixtral and Gemini 1.5 use it, and GPT-4 is widely reported to as well.
"Their new MoE runs as fast as a 7B but has 400B total params."