A new arXiv study of 2.4 million inferences across three LLMs finds that activation-aware pruning amplifies bias by 83.7% at 70% sparsity. Perplexity barely budges, masking the damage.
Key facts
- 2,368,860 inference records across 3 models, 3 pruning methods.
- Stereotype Reliance Score increased 83.7% at 70% sparsity with Wanda.
- 47-59% of previously unbiased items became biased at 70% sparsity.
- 78.3% of 180 comparisons were significant (p < 0.05).
- Unstructured pruning yields zero storage or latency savings on edge hardware.
A controlled empirical study published May 2 on arXiv, "Weight Pruning Amplifies Bias," reveals a troubling paradox for edge AI: the pruning methods that best preserve language-modeling perplexity also produce the worst fairness outcomes. The authors, Plawan Kumar Rath and Rahul Maliakkal, evaluated three instruction-tuned models (Gemma-2-9b-it, Mistral-7B-Instruct-v0.3, Phi-3.5-mini-instruct) across three pruning methods (Random, Magnitude, Wanda) at sparsity levels from 10% to 70% on the BBQ bias benchmark, with 5 random seeds per configuration, for a total of 2,368,860 inference records.
The Smart Pruning Paradox
Activation-aware pruning (Wanda) preserves perplexity nearly perfectly (just a 3.5% increase at 50% sparsity for Mistral-7B) yet produces the highest bias amplification. At 70% sparsity, the Stereotype Reliance Score (SRS) increased by 83.7%, and 47-59% of previously unbiased items developed new stereotypical behavior. Random pruning, by contrast, destroys language capability entirely, with perplexity exceeding 10^4 and climbing as high as 10^8, and its bias scores collapse to random chance. In other words, perplexity-based evaluation provides false assurance of behavioral equivalence.
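For readers unfamiliar with Wanda, a minimal sketch of its scoring rule follows, based on the original Wanda paper rather than this study's code; the function name `wanda_mask` and the tensor shapes are illustrative assumptions.

```python
import torch

def wanda_mask(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep-mask for a linear layer's weight matrix (out_features x in_features).

    Wanda scores each weight by |W_ij| * ||X_j||_2, where act_norm[j] is the
    L2 norm of input feature j's activations over a small calibration set,
    then drops the lowest-scoring weights within each output row.
    """
    scores = weight.abs() * act_norm.unsqueeze(0)   # (out, in) importance scores
    n_prune = int(weight.shape[1] * sparsity)       # weights dropped per row
    _, prune_idx = torch.topk(scores, n_prune, dim=1, largest=False)
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)                # zero out the lowest scorers
    return mask

# Magnitude pruning is the same routine with act_norm set to all-ones
# (importance collapses to |W_ij|); random pruning ignores scores entirely.
w = torch.randn(512, 512)
calib_norms = torch.rand(512)   # stand-in for real calibration statistics
w_70 = w * wanda_mask(w, calib_norms, sparsity=0.7)
```

The activation term is exactly what makes Wanda "smart": it protects the weights that matter most for next-token prediction, which keeps perplexity flat even as the pruned network's behavior drifts elsewhere.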
No Hardware Gains, Real Alignment Risk
The study further shows that unstructured pruning delivers zero storage savings and zero inference-latency reduction on real edge hardware, since zeroed weights are still stored and multiplied in dense format on hardware without sparse-kernel support; this undermines the primary motivation for its use in IoT deployment. Of 180 dense-vs-pruned comparisons, 141 (78.3%) are statistically significant (p < 0.05), with a mean effect size of |h| = 0.305. Published quantization studies report up to 21% of responses flipping between biased and unbiased states; the pruning results show transition rates nearly three times higher (47-59%), suggesting pruning poses a categorically greater alignment risk than quantization.
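The |h| here is Cohen's h, a standard effect size for comparing two proportions; by convention 0.2 is "small" and 0.5 "medium", so a mean of 0.305 indicates a small-to-medium but consistent shift. A minimal sketch, with proportions invented purely for illustration:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions:
    h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical illustration: a biased-response rate rising from 20% (dense)
# to 34% (pruned) yields |h| of roughly 0.32, in the range of the study's
# reported mean of 0.305.
print(abs(cohens_h(0.34, 0.20)))   # ~0.318
```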

Implications for Edge Deployment
These findings directly challenge the assumption that compression techniques that preserve perplexity are safe to deploy. The paper calls for bias-aware validation before deploying pruned models at the edge, a requirement currently absent from most IoT pipelines. For engineers running Mistral or Gemma models on resource-constrained devices, the takeaway is stark: perplexity is a misleading proxy for alignment quality, and pruning may introduce latent biases that perplexity-based evaluation cannot detect.
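What such a check could look like in practice: a hypothetical gate that refuses a pruned checkpoint unless both perplexity and a bias metric stay within tolerance of the dense baseline. The metric keys and thresholds below are illustrative assumptions, not from the paper.

```python
def validate_pruned_model(dense_metrics: dict, pruned_metrics: dict,
                          max_ppl_ratio: float = 1.10,
                          max_srs_ratio: float = 1.10) -> bool:
    """Gate a pruned checkpoint on BOTH perplexity and a bias metric.

    Both dicts are assumed to hold 'perplexity' and 'srs' (Stereotype
    Reliance Score) measured on held-out data; thresholds are illustrative.
    """
    ppl_ok = pruned_metrics["perplexity"] <= max_ppl_ratio * dense_metrics["perplexity"]
    srs_ok = pruned_metrics["srs"] <= max_srs_ratio * dense_metrics["srs"]
    return ppl_ok and srs_ok

# The study's headline case would fail this gate: perplexity within a few
# percent (passes) but SRS up 83.7%, i.e. 1.837x the dense score (fails).
assert not validate_pruned_model(
    {"perplexity": 6.0, "srs": 0.10},
    {"perplexity": 6.2, "srs": 0.1837},
)
```

A perplexity-only gate, by contrast, would wave this checkpoint through, which is precisely the failure mode the paper warns about.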

What to watch
Watch for follow-up studies extending this analysis to structured pruning methods (e.g., 2:4 sparsity) and quantization-aware training, which may offer different trade-offs. Also monitor whether edge AI frameworks like TensorFlow Lite and ONNX Runtime adopt bias-aware validation hooks in their pruning pipelines.
