Subgraph Atlas · centered on entity
GPTQ Quantization
technique · velocity: stable
Post-training quantization to 3-4 bits using second-order information, enabling 175B-scale LLMs to run on single-GPU inference.
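The "second-order information" in the description refers to GPTQ's use of a layer-wise Hessian to compensate rounding error as weights are quantized column by column. A minimal sketch of that idea follows; it is heavily simplified from the actual GPTQ algorithm (no blocking, no Cholesky reordering), and the function name and calibration setup are illustrative assumptions, not the paper's code.

```python
import numpy as np

def gptq_quantize(W, X, bits=4, damp=0.01):
    """Simplified GPTQ-style quantization (illustrative sketch only).

    W: (rows, cols) layer weights; X: (cols, n_samples) calibration inputs.
    Quantize columns left to right; use the inverse Hessian H = X X^T to
    spread each column's rounding error onto the not-yet-quantized columns.
    """
    rows, cols = W.shape
    H = X @ X.T                                   # layer-wise Hessian proxy
    H += damp * np.mean(np.diag(H)) * np.eye(cols)  # damping for stability
    Hinv = np.linalg.inv(H)

    W = W.copy()
    Q = np.zeros_like(W)
    maxq = 2 ** (bits - 1) - 1                    # symmetric grid, e.g. ±7 at 4 bits
    scale = np.abs(W).max(axis=1, keepdims=True) / maxq

    for j in range(cols):
        w = W[:, j]
        q = np.clip(np.round(w / scale[:, 0]), -maxq, maxq) * scale[:, 0]
        Q[:, j] = q
        err = (w - q) / Hinv[j, j]
        # push this column's quantization error onto the remaining columns
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q
```

The error-propagation step is what distinguishes this from naive round-to-nearest: later columns absorb earlier mistakes, which is why GPTQ holds up at 3-4 bits where plain rounding degrades badly.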
Two-hop subgraph: this entity, every entity it directly relates to, and every entity those neighbors relate to.
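The two-hop neighborhood described above can be sketched as a small breadth-first expansion; the function name and edge representation are assumptions for illustration, not this page's implementation.

```python
def two_hop_subgraph(edges, center):
    """Return the center entity, its direct neighbors, and their neighbors.

    edges: iterable of (src, dst) pairs, treated as undirected here.
    Returns (nodes, sub_edges): the two-hop node set and the induced edges.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    nodes = {center}
    frontier = {center}
    for _ in range(2):  # hop 1: direct relations; hop 2: their relations
        frontier = {n for f in frontier for n in adj.get(f, ())} - nodes
        nodes |= frontier

    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub_edges
```

Re-centering on a neighbor is then just calling the same function with a different `center`, which matches how the atlas lets you pivot to any node.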
Node types: company · person · ai_model · product · research_lab · benchmark · framework
Top connections
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers · paper
ISTA · company
INT8 Weight Quantization for LLMs · technique
AWQ (Activation-Aware Weight Quantization) · technique