Few-Shot Learning (FSL) addresses the challenge of training models with very limited labeled data. Instead of requiring thousands of examples per class, FSL aims to generalize from just a handful (e.g., 1-shot, 5-shot). The core idea is to learn a similarity metric or a meta-learning procedure that can adapt quickly to new tasks.
How it works:
FSL approaches fall into three main families:
1. Metric-based: The model learns an embedding space in which examples of the same class cluster together. At inference, a query is compared to the support set (the few labeled examples available for the new classes) using a distance metric (e.g., cosine similarity or Euclidean distance). Prototypical Networks (Snell et al., 2017) compute class prototypes as the mean of support embeddings; Matching Networks (Vinyals et al., 2016) use attention over the support set.
2. Optimization-based (Meta-Learning): Model-Agnostic Meta-Learning (MAML; Finn et al., 2017) learns an initial parameter set that can be rapidly fine-tuned on a new task with a few gradient steps. Reptile (Nichol et al., 2018) simplifies this by averaging updates across tasks.
3. Hallucination / Data Augmentation: Generative models (e.g., VAEs, GANs) create synthetic examples from limited data to augment the support set. For instance, MetaGAN (Zhang et al., 2018) combines GANs with meta-learning.
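To make the metric-based family concrete, here is a minimal NumPy sketch of prototype-based classification using squared Euclidean distance, as in Prototypical Networks. The function name and the toy 2-D "embeddings" are illustrative, not taken from any specific codebase:

```python
import numpy as np

def prototypical_classify(support, support_labels, query, n_classes):
    """Classify query embeddings by nearest class prototype.

    support: (n_support, d) embeddings of the labeled support set
    support_labels: (n_support,) integer class labels
    query: (n_query, d) embeddings to classify
    Returns the predicted class index for each query embedding.
    """
    # Class prototype = mean of that class's support embeddings
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )
    # Squared Euclidean distance from each query to each prototype
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy 2-way 2-shot episode with 2-D "embeddings"
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
query = np.array([[0.1, 0.1], [4.9, 5.1]])
print(prototypical_classify(support, labels, query, n_classes=2))  # [0 1]
```

In practice the embeddings would come from a network trained episodically on base classes; only the nearest-prototype rule is shown here.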
Why it matters:
FSL is critical in domains where labeled data is scarce, expensive, or impossible to collect in large quantities. It reduces the reliance on massive datasets, enables personalization (e.g., adapting a model to a user with a few photos), and allows models to handle rare classes in long-tail distributions.
When it's used vs. alternatives:
- Zero-Shot Learning: No labeled examples for new classes; relies on semantic descriptions (e.g., attributes, word embeddings). FSL requires a few examples.
- Fine-Tuning: Requires more data (typically hundreds to thousands) and updates all parameters; FSL often uses meta-learning or frozen embeddings with a small classifier.
- One-Shot Learning: A special case of FSL with exactly one example per class.
- Traditional supervised learning: Requires large labeled datasets; FSL is used when that is infeasible.
Common pitfalls:
- Overfitting: With very few examples, models can memorize noise rather than generalize. Regularization, data augmentation, and careful task design are essential.
- Base class bias: If the base training classes are very different from novel classes, the learned similarity metric may fail. Cross-domain FSL (e.g., Chen et al., 2019) addresses this.
- Evaluation inconsistency: Many papers report accuracy on standardized benchmarks (e.g., miniImageNet, CIFAR-FS, tieredImageNet) with varying splits, making comparison difficult. The use of transductive vs inductive settings also affects results.
- Computational cost: Meta-learning methods like MAML require second-order gradients, which are memory-intensive. First-order approximations (FOMAML, Reptile) reduce overhead but may sacrifice performance.
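The first-order alternatives mentioned above can be illustrated with a toy Reptile loop: adapt a copy of the parameters on a sampled task with a few SGD steps, then nudge the initialization toward the adapted parameters. The task family (one-parameter linear regression), learning rates, and step counts below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_sgd(w, slope, steps=5, lr=0.05):
    """A few SGD steps on one task: fit y = slope * x with model y = w * x."""
    for _ in range(steps):
        x = rng.uniform(-1, 1, size=10)
        grad = np.mean(2 * (w * x - slope * x) * x)  # d/dw of mean squared error
        w = w - lr * grad
    return w

# Reptile outer loop: move the initialization toward each task's adapted weights
w_init, meta_lr = 0.0, 0.1
for _ in range(2000):
    slope = rng.choice([1.0, 3.0])            # sample a task
    w_adapted = inner_sgd(w_init, slope)      # adapt with a few gradient steps
    w_init += meta_lr * (w_adapted - w_init)  # first-order meta-update

print(round(w_init, 2))  # settles between the two task optima (1.0 and 3.0)
```

Note that no second-order gradients are computed anywhere, which is exactly what makes Reptile and FOMAML cheaper than full MAML.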
Current state of the art (2026):
- Foundation models: Large pre-trained vision-language models (e.g., CLIP, ALIGN) achieve strong few-shot performance by simply using the support set to compute prototypes in the shared embedding space (Radford et al., 2021). With just 16 examples per class, CLIP-based classifiers approach ~80% top-1 on ImageNet, rivaling supervised ResNets trained on the full 1.28M-image training set.
- Transformer-based meta-learners: Models like Meta-Transformer (Xu et al., 2023) unify multiple modalities; few-shot adaptation is done via cross-attention between support and query tokens.
- Parameter-efficient fine-tuning: Adapters, LoRA, and prefix tuning are combined with FSL to adapt large models without full fine-tuning. For example, CLIP-Adapter (Gao et al., 2024) uses a lightweight adapter network trained on few-shot examples.
- Benchmarks: miniImageNet (100 classes, 600 images each) and tieredImageNet (608 classes) remain standard; newer benchmarks like Meta-Dataset (Triantafillou et al., 2020) test cross-domain generalization. State-of-the-art accuracy on 5-way 5-shot miniImageNet exceeds 80% (e.g., RFS-simple, Tian et al., 2020).
- Practical deployments: Few-shot learning is used in medical imaging (e.g., diagnosing rare diseases from a few scans), drug discovery (predicting molecular properties with limited assays), and robotics (learning new grasps from a few demonstrations).
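The parameter-efficient adapter idea above can be sketched as a small bottleneck MLP whose output is residually blended with the frozen backbone feature. This is a schematic inspired by the CLIP-Adapter design, not the actual implementation; the dimensions and the blend ratio alpha are hypothetical:

```python
import numpy as np

class ResidualAdapter:
    """Schematic CLIP-Adapter-style module: a small bottleneck MLP whose
    output is blended with the frozen backbone feature via a residual ratio.
    Only the adapter weights (w1, w2) would be trained on the few-shot
    support set; the backbone producing the features stays frozen.
    """
    def __init__(self, dim, bottleneck, alpha=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.w2 = rng.normal(scale=0.02, size=(bottleneck, dim))
        self.alpha = alpha  # how strongly the adapter overrides the frozen feature

    def forward(self, feat):
        adapted = np.maximum(feat @ self.w1, 0.0) @ self.w2  # ReLU bottleneck
        return self.alpha * adapted + (1.0 - self.alpha) * feat

# Frozen "backbone" features for a batch of 4 images, feature dim 16
feats = np.random.default_rng(1).normal(size=(4, 16))
adapter = ResidualAdapter(dim=16, bottleneck=4)
out = adapter.forward(feats)
print(out.shape)  # (4, 16)
```

Because only the two small weight matrices are trainable, the adapter adds a tiny fraction of the backbone's parameters, which is what makes this approach practical in the few-shot regime.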