CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring
Out-of-distribution (OOD) detection remains a critical reliability challenge for deployed deep learning models. Current methods exhibit frustrating inconsistency: a scorer that excels on one architecture-dataset combination often fails on another. A new paper proposes CORE (COnfidence + REsidual), a method that addresses this inconsistency by fundamentally rethinking how to extract detection signals from a model's internal representations.
The Structural Limitation of Current Methods
The paper identifies a shared structural flaw in existing OOD detection approaches. Methods fall into two broad categories:
- Logit-based methods (e.g., Maximum Softmax Probability, ODIN): These operate solely on the classifier's final output layer, measuring only the model's confidence in its prediction.
- Feature-based methods (e.g., Mahalanobis distance, Gram matrices): These attempt to measure whether a sample belongs to the training distribution by analyzing activations in the full feature space.
The problem, according to the authors, is that confidence and distribution membership are entangled in the penultimate feature space. Feature-based methods attempt to measure membership but do so in a space where the confidence signal dominates and introduces noise. This entanglement causes architecture-sensitive failure modes—what works for a ResNet might fail for a Vision Transformer.
What CORE Does Differently: Orthogonal Decomposition
The key insight is that the penultimate feature vector (the layer before the final classification layer) naturally decomposes into two orthogonal components:
- Classifier-aligned component: The projection of the feature vector onto the classifier's weight vectors. This encodes the confidence signal—how strongly the features align with class directions.
- Orthogonal residual: The component of the feature vector that is orthogonal to all classifier weight vectors. The classifier explicitly discards this information when making predictions.

The researchers discovered that this residual carries a class-specific directional signature for in-distribution (ID) data. While the classifier ignores it for prediction, it contains valuable information about whether a sample belongs to the training distribution—a membership signal that logit-based methods cannot see.
How CORE Works: Disentangling and Combining Signals
CORE operates in three steps:
1. Orthogonal Decomposition
For a given input sample with penultimate feature vector (h \in \mathbb{R}^d), CORE computes:
[h_{\text{align}} = P h, \qquad P = W(W^\top W)^{+}W^\top]
[h_{\text{res}} = h - h_{\text{align}}]
where (W \in \mathbb{R}^{d \times C}) contains the classifier weight vectors for (C) classes and (P) is the orthogonal projector onto their span ((^{+}) denotes the pseudoinverse; (P) reduces to (WW^\top) when the columns of (W) are orthonormal). (h_{\text{align}}) lies in the subspace spanned by the classifier weights, while (h_{\text{res}}) is orthogonal to this subspace.
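The decomposition above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the paper's code; the function name and shapes are hypothetical, and it uses the general orthogonal projector (W(W^\top W)^{+}W^\top), which reduces to (WW^\top) when the classifier weight columns are orthonormal.

```python
import numpy as np

def orthogonal_decompose(h, W):
    """Split a penultimate feature vector h (shape (d,)) into the component
    lying in the span of the classifier weight columns W (shape (d, C)) and
    the residual orthogonal to that span.

    Uses the orthogonal projector P = W (W^T W)^+ W^T; the pseudoinverse
    handles non-orthonormal or rank-deficient weight columns."""
    P = W @ np.linalg.pinv(W.T @ W) @ W.T   # (d, d) projector onto span(W)
    h_align = P @ h
    h_res = h - h_align
    return h_align, h_res

# Tiny sanity check: the residual is orthogonal to every classifier weight
# vector, and the two components reassemble the original feature vector.
rng = np.random.default_rng(0)
d, C = 8, 3
W = rng.normal(size=(d, C))
h = rng.normal(size=d)
h_align, h_res = orthogonal_decompose(h, W)
assert np.allclose(W.T @ h_res, 0.0, atol=1e-8)
assert np.allclose(h_align + h_res, h)
```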
2. Independent Scoring
CORE computes two separate scores:
- Confidence score: (s_{\text{conf}}(x) = \max_c \text{softmax}_c(f(x))) where (f(x)) are the logits
- Residual score: (s_{\text{res}}(x) = \|h_{\text{res}}\|_2^2) (the squared L2 norm of the residual)
The residual score measures how much of the feature vector doesn't align with any class direction. For ID data, residuals tend to be smaller and follow class-specific patterns.
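Both raw signals for a single sample can be computed together. A minimal sketch, assuming the logits are linear in the penultimate features ((f(x) = W^\top h + b)) and using the projector-based decomposition; `core_scores` is a hypothetical name, not from the paper.

```python
import numpy as np

def core_scores(h, W, b=None):
    """Compute CORE's two raw signals for one sample (sketch).

    h: penultimate features, shape (d,); W: classifier weights, shape (d, C);
    b: optional bias, shape (C,). Assumes logits f(x) = W^T h + b."""
    logits = W.T @ h + (0.0 if b is None else b)
    # Confidence score: maximum softmax probability (numerically stable).
    z = np.exp(logits - logits.max())
    s_conf = float(z.max() / z.sum())
    # Residual score: squared L2 norm of the classifier-orthogonal residual.
    P = W @ np.linalg.pinv(W.T @ W) @ W.T
    h_res = h - P @ h
    s_res = float(h_res @ h_res)
    return s_conf, s_res
```

For example, with `W` spanning only the first two feature dimensions, any mass in the remaining dimensions goes entirely into the residual score.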
3. Normalized Combination
CORE combines the scores via:
[s_{\text{CORE}}(x) = \tilde{s}_{\text{conf}}(x) + \tilde{s}_{\text{res}}(x)]
where (\tilde{s}) indicates min-max normalization over a reference set of ID samples. Since ID residuals tend to be small while ID confidence is high, the residual term must be oriented so that larger values indicate ID membership (e.g., by negating the normalized residual) before the two terms are summed.
Because the two signals come from orthogonal subspaces, their failure modes are approximately independent. When confidence-based detection fails (e.g., on confident OOD samples), the residual signal often succeeds, and vice versa.
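The combination step can be sketched as follows. This is an assumed reading, not the paper's code: the normalized residual is negated so that both terms increase for ID samples, and `ref_conf` / `ref_res` stand for score arrays precomputed on the ID reference set (hypothetical names).

```python
import numpy as np

def minmax(value, lo, hi):
    """Min-max normalize using range statistics (lo, hi) from an ID
    reference set; the small epsilon guards against a degenerate range."""
    return (value - lo) / (hi - lo + 1e-12)

def core_score(s_conf, s_res, ref_conf, ref_res):
    """Combined CORE score (sketch). The normalized residual enters
    negated -- an assumed orientation, since ID residuals tend to be
    smaller than OOD ones -- so higher scores mean more in-distribution."""
    conf_n = minmax(s_conf, ref_conf.min(), ref_conf.max())
    res_n = minmax(s_res, ref_res.min(), ref_res.max())
    return float(conf_n + (1.0 - res_n))
```

In practice `ref_conf` and `ref_res` would each hold one score per sample in the ID reference set, computed once ahead of time.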
Key Results
The paper evaluates CORE across five architectures (ResNet-18, ResNet-50, DenseNet-101, WideResNet-28-10, ViT-B/16) and five benchmark configurations using CIFAR-10, CIFAR-100, and ImageNet as in-distribution datasets.

CORE achieves state-of-the-art performance in three of the five benchmark settings and obtains the highest grand average AUROC (93.6%). Notably, it maintains this performance consistently across all tested architectures, addressing the inconsistency problem that plagues other methods.
Computational Efficiency
CORE adds negligible computational overhead—just the cost of computing (h_{\text{res}}), which requires a single matrix multiplication and subtraction. The paper reports that CORE runs within 1% of the baseline inference time, making it practical for real-time applications.
Why the Orthogonal Residual Works
The effectiveness of the residual signal stems from how neural networks learn. During training, the classifier weights adapt to capture discriminative features for the training classes. The residual contains information that's irrelevant for classification but still characteristic of the training distribution—background patterns, texture statistics, or other non-discriminative but distribution-specific features.

For OOD samples, the residual tends to be larger and less structured because the features don't decompose cleanly into class-aligned and class-orthogonal components as they do for ID data.
Implementation Considerations
CORE requires access to the penultimate features and classifier weights, which are available in standard neural network architectures. The method doesn't require retraining or modifying the model architecture—it works with pretrained models as-is.
The normalization step uses a reference set of ID samples (e.g., the training set or a held-out validation split) to calibrate the score ranges. This makes the method sensitive to the choice of reference data, though calibrating against held-out ID samples is standard practice in OOD detection.
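The paper does not prescribe a decision rule, but a common way to turn a score like CORE's into a detector is to threshold it at a target false-positive rate measured on the ID reference set. A sketch under that assumption (`fit_threshold` and `target_fpr` are hypothetical names; higher score is taken to mean more in-distribution):

```python
import numpy as np

def fit_threshold(id_scores, target_fpr=0.05):
    """Pick a deployment threshold from CORE scores on the ID reference set
    (a common convention, not from the paper): samples scoring below the
    threshold are flagged OOD, and target_fpr bounds the fraction of ID
    reference samples that would be (wrongly) flagged."""
    return float(np.quantile(id_scores, target_fpr))

id_scores = np.array([0.9, 1.2, 1.5, 1.8, 2.0])  # toy reference scores
tau = fit_threshold(id_scores, target_fpr=0.2)
is_ood = lambda s: s < tau   # flag low-scoring samples as OOD
```

Lowering `target_fpr` makes the detector more conservative: fewer ID samples are rejected, at the cost of missing more OOD inputs.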
Limitations and Future Work
The paper notes that CORE, like all OOD detection methods, isn't perfect. It still struggles with certain challenging OOD datasets, particularly those semantically similar to the ID data. The authors suggest exploring more sophisticated ways to combine the two signals beyond simple summation, potentially learning the combination weights adaptively.
Additionally, while tested on image classification, the core idea of orthogonal decomposition should apply to other modalities where classifiers operate on penultimate features, suggesting promising directions for NLP and other domains.
