A new arXiv paper from April 2026 finds that Prithvi-EO and ViT-Base embeddings yield uniformly negative R² in cross-country maize yield prediction. The study evaluates 6,404 field observations across five African countries using a leave-one-country-out scheme.
Key Facts
- 6,404 maize field observations from five African countries.
- Prithvi-EO / Ridge achieves least-negative LOCO R² of −0.027.
- All nine feature–regressor combinations yield negative cross-country R².
- Within-country random CV yields moderate R²; cross-country collapses.
- Paper argues yield distribution shift, not representation, is the limit.
Geospatial foundation models are marketed as universal feature extractors for Earth observation tasks, but a rigorous generalization test in sub-Saharan Africa shows they fail to transfer across national boundaries. The paper, Do Foundation Model Embeddings Improve Cross-Country Crop Yield Generalisation? A Leave-One-Country-Out Evaluation in Sub-Saharan Africa [arXiv], tests Prithvi-EO-1.0-100M (a NASA-developed Vision Transformer pretrained on satellite imagery) and ViT-Base against traditional Sentinel-2 spectral indices.
The core finding: every feature-regressor combination achieves negative R² under leave-one-country-out (LOCO) cross-validation. Within-country random splits yield moderate R², but the moment the model must predict on an unseen country, performance collapses. The best result comes from Prithvi-EO with Ridge regression, scoring −0.027 R². That means the models are worse than simply predicting the mean yield of the target country.
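The divergence between random splits and country-held-out splits is easy to reproduce on synthetic data. The sketch below is illustrative only — the dimensions, the per-country yield offset, and the Ridge settings are assumptions, not the paper's data or code — but it uses the same scikit-learn machinery (LeaveOneGroupOut, Ridge, r2_score) such an evaluation would typically rest on:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, LeaveOneGroupOut

# Illustrative stand-ins: 500 fields with 32-dim "embeddings", a country
# id, and a yield that depends on the features *plus* a per-country
# offset the features do not encode (the distribution shift).
rng = np.random.default_rng(0)
n, d = 500, 32
X = rng.normal(size=(n, d))
country = rng.integers(0, 5, size=n)
w = rng.normal(size=d) * 0.25
y = X @ w + 1.0 * country + rng.normal(scale=0.3, size=n)

def cv_r2(splits):
    scores = []
    for tr, te in splits:
        model = Ridge(alpha=1.0).fit(X[tr], y[tr])
        scores.append(r2_score(y[te], model.predict(X[te])))
    return float(np.mean(scores))

# Random splits mix all countries into train and test; LOCO holds one out.
within = cv_r2(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
loco = cv_r2(LeaveOneGroupOut().split(X, y, groups=country))
print(f"random-split R^2: {within:.2f}   LOCO R^2: {loco:.2f}")
```

Because r2_score normalizes by the held-out country's own yield variance, any model that cannot anticipate that country's mean level scores below zero — the random-split number stays moderately positive while the LOCO number goes negative, mirroring the paper's pattern.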
Why Foundation Models Don't Help
The paper's unique take: the bottleneck is not representation quality but a shift in yield distribution between countries. Even frozen Prithvi-EO embeddings, which encode rich spatial-spectral features, cannot compensate for the fact that maize yields in Kenya follow a different distribution than those in Tanzania. The authors argue that most published benchmarks overstate generalization by reporting only within-country performance.
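The arithmetic of a pure mean shift makes the argument concrete: if a model effectively predicts the training countries' mean yield while the target country's mean sits Δ away, then R² ≈ 1 − (Δ² + σ²)/σ² = −(Δ/σ)², no matter how good the features are. A toy check (the yield values here are illustrative, not from the paper):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
sigma, delta = 0.5, 1.0  # within-country spread and cross-country mean shift (t/ha)

# Held-out country: mean shifted by delta relative to the training countries.
y_target = rng.normal(loc=3.0 + delta, scale=sigma, size=10_000)
pred = np.full_like(y_target, 3.0)  # model stuck at the training-country mean

print(r2_score(y_target, pred))  # approx. -(delta/sigma)**2 = -4
```

A shift of just two within-country standard deviations is enough to drive R² to roughly −4, which is why even a small residual mean offset leaves every LOCO score below zero.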
This echoes a broader pattern in applied ML: foundation models excel when the test distribution closely matches the training distribution, but their value diminishes under severe covariate shift. The paper releases a reproducible negative benchmark — a rare and valuable contribution for a field that tends to publish only positive results.
Implications for Food Security AI
Accurate cross-country yield forecasting is critical for food security planning in sub-Saharan Africa, where smallholder maize farming dominates. The negative result suggests that purely satellite-based models, even with foundation model embeddings, cannot replace ground-truth yield surveys or country-specific calibration. Future work must either collect more representative training data or develop methods to handle distribution shift explicitly.
The study joins a growing body of work showing that foundation models for Earth observation are not silver bullets. A prior paper from April 2026 [arXiv] evaluating nine pretrained audio models for music recommendation similarly found that pretraining does not guarantee cross-domain transfer.
What to Watch
Watch for follow-up work that attempts to close the LOCO generalization gap — either through domain adaptation techniques, multi-task learning across countries, or integration of non-satellite data sources like soil surveys and market prices. The authors' released benchmark provides a standardized evaluation protocol for future methods to beat. Also worth monitoring is whether NASA or IBM adjust Prithvi-EO training to include more geographically diverse yield data.