Technique · inference
GPTQ Quantization
Post-training quantization to 3-4 bits using second-order (Hessian) information, enabling 175B-parameter LLMs to run inference on a single GPU.
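The second-order procedure the description refers to can be sketched as follows. This is a simplified, hypothetical illustration, not the production algorithm: real GPTQ quantizes weights in blocks and uses a Cholesky factorization of the inverse Hessian for numerical stability, whereas this sketch quantizes one row column-by-column with a plain matrix inverse and a uniform grid.

```python
import numpy as np

def quantize_rtn(x, scale):
    """Round-to-nearest onto a uniform grid (the per-step quantizer)."""
    return np.round(x / scale) * scale

def gptq_quantize_row(w, H_inv, scale):
    """Quantize one weight row left-to-right, folding each rounding
    error back into the not-yet-quantized weights via the inverse
    Hessian -- the second-order update at the heart of GPTQ."""
    w = w.astype(np.float64).copy()
    q = np.zeros_like(w)
    for i in range(w.size):
        q[i] = quantize_rtn(w[i], scale)
        err = (w[i] - q[i]) / H_inv[i, i]
        w[i + 1:] -= err * H_inv[i, i + 1:]   # error compensation
    return q

# Toy calibration set (hypothetical): H = 2 X X^T plus damping.
rng = np.random.default_rng(0)
d, n = 16, 128
X = rng.normal(size=(d, n))               # calibration activations
w = rng.normal(size=d)                    # one row of a linear layer
H = 2.0 * (X @ X.T) / n + 0.01 * np.eye(d)
H_inv = np.linalg.inv(H)
scale = 0.25                              # grid step for the toy example

q = gptq_quantize_row(w, H_inv, scale)
```

The error-compensation step is what distinguishes this from plain round-to-nearest: each weight's rounding error is redistributed onto later weights in proportion to how correlated their inputs are, which is why the method tolerates 3-4 bit grids that naive rounding cannot.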
Products deploying: 0
Avg research → prod: —
First commercial deploy: —
Deployment timeline
No verified deployments yet in our tracked product set.