MiniMax M3 Sparse Attention: 15.6x Decoding Speedup at 1M Tokens
MiniMax M3's sparse attention achieves a 9.7x prefilling speedup and a 15.6x decoding speedup at 1M tokens, reversing M2's full-attention stance.


