CORE OOD Detection Method Uses Orthogonal Feature Decomposition to Achieve 95.6% AUROC on CIFAR-100

Researchers propose CORE, a new OOD detection method that disentangles confidence and membership signals in penultimate features. It achieves state-of-the-art performance across five architectures with negligible computational overhead.


CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring

Out-of-distribution (OOD) detection remains a critical reliability challenge for deployed deep learning models. Current methods exhibit frustrating inconsistency: a scorer that excels on one architecture-dataset combination often fails on another. A new paper proposes CORE (COnfidence + REsidual), a method that addresses this inconsistency by fundamentally rethinking how to extract detection signals from a model's internal representations.

The Structural Limitation of Current Methods

The paper identifies a shared structural flaw in existing OOD detection approaches. Methods fall into two broad categories:

  • Logit-based methods (e.g., Maximum Softmax Probability, ODIN): These operate solely on the classifier's final output layer, measuring only the model's confidence in its prediction.
  • Feature-based methods (e.g., Mahalanobis distance, Gram matrices): These attempt to measure whether a sample belongs to the training distribution by analyzing activations in the full feature space.

The problem, according to the authors, is that confidence and distribution membership are entangled in the penultimate feature space. Feature-based methods attempt to measure membership but do so in a space where the confidence signal dominates and introduces noise. This entanglement causes architecture-sensitive failure modes—what works for a ResNet might fail for a Vision Transformer.

What CORE Does Differently: Orthogonal Decomposition

The key insight is that the penultimate feature vector (the layer before the final classification layer) naturally decomposes into two orthogonal components:

  1. Classifier-aligned component: The projection of the feature vector onto the classifier's weight vectors. This encodes the confidence signal—how strongly the features align with class directions.
  2. Orthogonal residual: The component of the feature vector that is orthogonal to all classifier weight vectors. The classifier explicitly discards this information when making predictions.

[Figure: (a) Near-OOD and far-OOD AUROC across the five model×ID settings; shaded bands span the best-to-worst scorer within each setting.]

The researchers discovered that this residual carries a class-specific directional signature for in-distribution (ID) data. While the classifier ignores it for prediction, it contains valuable information about whether a sample belongs to the training distribution—a membership signal that logit-based methods cannot see.

How CORE Works: Disentangling and Combining Signals

CORE operates in three steps:

1. Orthogonal Decomposition

For a given input sample with penultimate feature vector (h \in \mathbb{R}^d), CORE computes:
[h_{\text{align}} = WW^\top h]
[h_{\text{res}} = h - h_{\text{align}}]
where (W \in \mathbb{R}^{d \times C}) contains the classifier weight vectors for C classes. (h_{\text{align}}) lies in the subspace spanned by the classifier weights, while (h_{\text{res}}) is orthogonal to this subspace.
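The decomposition above can be sketched in a few lines of numpy. Note one caveat: the article writes the aligned component as (WW^\top h), which is an exact orthogonal projection only when the columns of (W) are orthonormal. The sketch below uses the general least-squares projector onto the span of (W), so the residual is orthogonal to every classifier weight vector even for arbitrary weights; this is an implementation choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def orthogonal_decompose(h, W):
    """Split a penultimate feature h (shape (d,)) into a classifier-aligned
    component and a residual orthogonal to the classifier-weight subspace.

    W has shape (d, C): one weight column per class. We project h onto
    span(W) by least squares, which reduces to W @ W.T @ h when W's
    columns are orthonormal.
    """
    # Coefficients a minimizing ||W a - h||_2; W @ a is the projection.
    a, *_ = np.linalg.lstsq(W, h, rcond=None)
    h_align = W @ a          # component inside the classifier subspace
    h_res = h - h_align      # component the classifier discards
    return h_align, h_res
```

By construction `h_align + h_res == h`, and `W.T @ h_res` is (numerically) zero, which is exactly the orthogonality property CORE relies on.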

2. Independent Scoring

CORE computes two separate scores:

  • Confidence score: (s_{\text{conf}}(x) = \max_c \text{softmax}_c(f(x))) where (f(x)) are the logits
  • Residual score: (s_{\text{res}}(x) = \|h_{\text{res}}\|_2^2) (the squared L2 norm of the residual)

The residual score measures how much of the feature vector doesn't align with any class direction. For ID data, residuals tend to be smaller and follow class-specific patterns.
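The two scores are straightforward to compute from the logits and the residual; a minimal numpy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_score(logits):
    """s_conf: maximum softmax probability over the C classes."""
    return softmax(logits).max(axis=-1)

def residual_score(h_res):
    """s_res: squared L2 norm of the orthogonal residual."""
    return np.sum(h_res ** 2, axis=-1)
```

Note the opposite polarities: high confidence suggests an ID sample, while a high residual norm suggests an OOD sample.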

3. Normalized Combination

CORE combines the scores via:
[s_{\text{CORE}}(x) = \tilde{s}_{\text{conf}}(x) + \tilde{s}_{\text{res}}(x)]
where (\tilde{s}) indicates min-max normalization over a reference set of ID samples.
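A sketch of the normalized combination, assuming the residual term is oriented so that larger (s_{\text{CORE}}) means "more in-distribution" (since a large raw residual indicates OOD while large confidence indicates ID, we flip the normalized residual; the article does not spell out this sign convention, so treat it as an assumption):

```python
import numpy as np

def minmax_normalize(scores, ref_scores):
    """Min-max normalize scores using an ID reference set's statistics."""
    lo, hi = ref_scores.min(), ref_scores.max()
    return (scores - lo) / (hi - lo + 1e-12)

def core_score(s_conf, s_res, ref_conf, ref_res):
    """Combine normalized confidence and residual signals.

    ref_conf / ref_res are the two raw scores computed on a held-out
    set of ID samples, used only for calibration.
    """
    s_conf_n = minmax_normalize(s_conf, ref_conf)
    s_res_n = minmax_normalize(s_res, ref_res)
    # Flip the residual so both terms point in the "ID-ness" direction.
    return s_conf_n + (1.0 - s_res_n)
```

A sample is then flagged as OOD when its `core_score` falls below a threshold chosen on the reference set (e.g., at a 95% ID true-positive rate).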

Because the two signals come from orthogonal subspaces, their failure modes are approximately independent. When confidence-based detection fails (e.g., on confident OOD samples), the residual signal often succeeds, and vice versa.

Key Results

The paper evaluates CORE across five architectures (ResNet-18, ResNet-50, DenseNet-101, WideResNet-28-10, ViT-B/16) and five benchmark configurations using CIFAR-10, CIFAR-100, and ImageNet as in-distribution datasets.

AUROC (%) by in-distribution dataset:

| Method | CIFAR-10 | CIFAR-100 | ImageNet | Average |
|---|---|---|---|---|
| CORE | 96.2 | 95.6 | 89.1 | 93.6 |
| MSP (baseline) | 90.3 | 82.1 | 74.5 | 82.3 |
| ODIN | 93.8 | 88.7 | 76.2 | 86.2 |
| Mahalanobis | 95.1 | 88.9 | 81.3 | 88.4 |
| Gram | 94.7 | 91.2 | 85.4 | 90.4 |

CORE achieves state-of-the-art performance in three of the five benchmark settings and obtains the highest grand average AUROC (93.6%). Notably, it maintains this performance consistently across all tested architectures, addressing the inconsistency problem that plagues other methods.

Computational Efficiency

CORE adds negligible computational overhead—just the cost of computing (h_{\text{res}}), which requires a single matrix multiplication and subtraction. The paper reports that CORE runs within 1% of the baseline inference time, making it practical for real-time applications.

Why the Orthogonal Residual Works

The effectiveness of the residual signal stems from how neural networks learn. During training, the classifier weights adapt to capture discriminative features for the training classes. The residual contains information that's irrelevant for classification but still characteristic of the training distribution—background patterns, texture statistics, or other non-discriminative but distribution-specific features.

[Figure 2: Score distributions by scorer category; panel (a) shows a logit-based energy score, where far-OOD separates from ID but near-OOD overlaps.]

For OOD samples, the residual tends to be larger and less structured because the features don't decompose cleanly into class-aligned and class-orthogonal components as they do for ID data.

Implementation Considerations

CORE requires access to the penultimate features and classifier weights, which are available in standard neural network architectures. The method doesn't require retraining or modifying the model architecture—it works with pretrained models as-is.

The normalization step uses a reference set of ID samples (e.g., the training set or a held-out validation set) to calibrate the score ranges. This makes the method sensitive to the choice of reference data but follows standard practice in OOD detection.

Limitations and Future Work

The paper notes that CORE, like all OOD detection methods, isn't perfect. It still struggles with certain challenging OOD datasets, particularly those semantically similar to the ID data. The authors suggest exploring more sophisticated ways to combine the two signals beyond simple summation, potentially learning the combination weights adaptively.

Additionally, while tested on image classification, the core idea of orthogonal decomposition should apply to other modalities where classifiers operate on penultimate features, suggesting promising directions for NLP and other domains.

AI Analysis

CORE represents a conceptually clean advance in OOD detection by explicitly disentangling two complementary signals that were previously entangled. The orthogonal decomposition insight is mathematically elegant and explains why previous methods showed inconsistent performance across architectures: they were trying to measure membership in a space contaminated by confidence signals.

Practitioners should note that CORE's effectiveness depends on the quality of the orthogonal decomposition. In architectures with non-linear final layers or complex head structures, the assumption of a simple linear classifier on top of penultimate features might not hold perfectly. The paper shows it works well on standard architectures, but custom architectures may require adaptation.

The residual score's success suggests that modern neural networks encode more distributional information than what's strictly needed for classification. This has implications beyond OOD detection: similar orthogonal decompositions could be useful for domain adaptation, anomaly detection, or understanding what models learn versus what they use for predictions.

From an implementation perspective, CORE's minimal computational overhead makes it immediately practical. Teams deploying models in safety-critical applications should benchmark it against their current OOD detection methods, as it provides robustness across architectures without significant performance cost. The method's simplicity also means it can be easily integrated into existing inference pipelines without architectural changes.
Original source: arxiv.org
