CORE OOD Detection Method Achieves SOTA on 3 of 5 Benchmarks by Disentangling Confidence and Residual Signals


Researchers propose CORE, a new OOD detection method that scores classifier confidence and orthogonal residual features separately. It achieves the highest grand average AUROC across five architectures with negligible computational overhead.


CORE: A Robust OOD Detection Method Using Orthogonal Feature Decomposition

Out-of-distribution (OOD) detection—identifying when a model encounters data unlike its training distribution—is a critical safety component for real-world deep learning deployment. Current methods suffer from inconsistent performance: a technique that excels on one model architecture or dataset often fails on another. A new paper, "CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring," proposes a solution by fundamentally rethinking how to extract detection signals from a neural network.

The Core Problem: Entangled Signals

The paper identifies a structural limitation shared by existing OOD detection approaches. These methods generally fall into two categories:

  1. Logit-based methods (e.g., Maximum Softmax Probability, ODIN): These use the classifier's final output (logits or softmax probabilities) as a confidence score. They only see the model's prediction certainty.
  2. Feature-based methods (e.g., Mahalanobis distance, Gram matrices): These analyze the model's internal activations (features) to measure how "close" a sample is to the training distribution.

The authors argue that logit-based methods miss a crucial signal: whether the input's features actually belong to the learned distribution. Feature-based methods attempt to capture this "membership" signal but do so in the full, high-dimensional feature space where it is entangled with the classifier's confidence signal and architectural noise. This entanglement makes their performance highly sensitive to model architecture.

What CORE Does: Disentangling Orthogonal Subspaces

The key insight of CORE is that the penultimate layer features (the layer before the final classification head) naturally decompose into two orthogonal components relative to the classifier.

Figure 1: Near-OOD and far-OOD AUROC across five model×ID settings; shaded bands span the best-to-worst scorer within each setting.

  1. The Classifier-Aligned Component: This is the projection of the feature vector onto the subspace spanned by the classifier's weight vectors. This component directly determines the logits and thus encodes the model's confidence signal.
  2. The Orthogonal Residual: This is the remaining part of the feature vector that is orthogonal to the classifier's subspace. The classifier explicitly discards this information. The authors discovered that for in-distribution (ID) data, this residual carries a consistent, class-specific directional signature—a pure membership signal.
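This decomposition can be sketched in a few lines of NumPy (shapes and data here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 512, 10                       # feature dim and class count (illustrative)
W = rng.standard_normal((d, C))      # final-layer weight matrix
h = rng.standard_normal(d)           # a penultimate feature vector

Q, _ = np.linalg.qr(W)               # orthonormal basis of the classifier subspace
h_align = Q @ (Q.T @ h)              # classifier-aligned component
h_res = h - h_align                  # orthogonal residual

# The two components are orthogonal, reconstruct h exactly, and the
# residual contributes nothing to the logits (W^T h_res ≈ 0).
assert abs(h_align @ h_res) < 1e-6
assert np.allclose(h_align + h_res, h)
assert np.allclose(W.T @ h_res, 0.0, atol=1e-6)
```

The last assertion is the key point: because the columns of W lie entirely in the span of Q, the residual is invisible to the classifier's logits, which is why it can carry an independent membership signal.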

CORE's innovation is to score these two orthogonal subspaces independently:

  • Confidence Score (S_conf): Derived from the classifier's output (e.g., softmax maximum).
  • Residual Score (S_res): A measure computed on the orthogonal residual features. The paper suggests using a simple Mahalanobis distance computed solely in this residual subspace.

These two scores are normalized and combined via a weighted sum to produce the final OOD score: S_CORE = α * S_conf + (1-α) * S_res.

Because the confidence and residual signals are orthogonal by construction, their failure modes are, in theory, approximately independent. When one signal is unreliable (e.g., an OOD sample that yields high softmax confidence), the other can compensate, yielding robust detection.
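A toy numeric illustration of this complementarity (the numbers are invented for illustration, not drawn from the paper):

```python
# Suppose both scores are min-max normalized to [0, 1] with higher = more
# in-distribution, and alpha = 0.5. A confidently misclassified OOD sample
# is still caught by its anomalous residual, and an ambiguous-but-genuine
# ID sample is rescued by its normal residual.
alpha = 0.5

def s_core(s_conf, s_res):
    return alpha * s_conf + (1 - alpha) * s_res

typical_id    = s_core(0.90, 0.85)   # both signals agree: high score
confident_ood = s_core(0.95, 0.05)   # fooled confidence, anomalous residual
low_conf_id   = s_core(0.20, 0.90)   # ambiguous ID sample, normal residual

assert typical_id > low_conf_id > confident_ood
```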

Key Results

The authors evaluated CORE across a rigorous benchmark suite: five model architectures (DenseNet, ResNet, WideResNet, ViT, MLP-Mixer) and five OOD detection configurations (CIFAR-10 vs SVHN/TinyImageNet, etc.). Performance was measured using Area Under the Receiver Operating Characteristic curve (AUROC) and False Positive Rate at 95% True Positive Rate (FPR95).
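For readers unfamiliar with these metrics, both reduce to a few lines of NumPy. This is not the paper's evaluation code, just the standard definitions, assuming higher score means more in-distribution:

```python
import numpy as np

def auroc(scores_id, scores_ood):
    """P(ID score > OOD score), counting ties as half."""
    diff = scores_id[:, None] - scores_ood[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def fpr95(scores_id, scores_ood):
    """Fraction of OOD samples accepted at the threshold keeping 95% of ID."""
    thresh = np.percentile(scores_id, 5)   # 95% of ID scores lie above this
    return (scores_ood >= thresh).mean()
```

For perfectly separated scores, auroc returns 1.0 and fpr95 returns 0.0; random scoring gives AUROC near 0.5.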


CORE achieved competitive or state-of-the-art performance, ranking first in 3 out of 5 benchmark settings and attaining the highest grand average AUROC across all architectures and datasets.

| Benchmark | Best baseline (method) | CORE | Δ AUROC |
|---|---|---|---|
| CIFAR-10 vs SVHN | 99.3% (ReAct) | 99.6% | +0.3 pp |
| CIFAR-10 vs TinyImageNet | 95.1% (Mahalanobis) | 96.7% | +1.6 pp |
| CIFAR-100 vs SVHN | 98.5% (ReAct) | 99.2% | +0.7 pp |
| CIFAR-100 vs TinyImageNet | 87.9% (KNN) | 88.1% | +0.2 pp |
| ImageNet-1K vs iNaturalist | 91.5% (Mahalanobis) | 92.8% | +1.3 pp |

Table: Selected AUROC results showing CORE's performance against strong baselines like ReAct, Mahalanobis distance, and K-Nearest Neighbors (KNN).

Crucially, CORE adds negligible computational overhead—primarily the cost of projecting features onto the orthogonal residual subspace and computing a distance—making it practical for real-time systems.

How It Works: Technical Implementation

For a model with penultimate feature h ∈ R^d and a linear classifier with weight matrix W ∈ R^(d×C) (for C classes) and bias b, the logits are z = W^T h + b.

Figure 2: Each column represents one scorer category: (a) logit-based energy score (far-OOD separates, near-OOD overlaps).

The orthogonal decomposition is achieved as follows:

  1. Compute the Classifier Subspace Basis: Perform a QR decomposition on W to get an orthonormal basis Q ∈ R^(d×r) for the column space of W, where r is the rank of W (typically r = C or r = C-1).
  2. Project Features: The classifier-aligned component is h_align = Q Q^T h. The orthogonal residual is h_res = h - h_align = (I - Q Q^T) h.
  3. Score Calculation:
    • S_conf = maximum softmax probability (or energy score) from z.
    • S_res = Mahalanobis distance of h_res, computed using the mean and covariance estimated from the training set's residual features. The computation simplifies because h_res lies in a (d-r)-dimensional subspace.
  4. Normalize & Combine: Min-max normalize each score over a held-out validation set, then combine with a fixed weight α (tuned on validation data).
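The steps above can be sketched end-to-end in NumPy. Variable names, the regularization term, and the sign convention (negating the normalized distance so that larger means more in-distribution for both terms) are our choices; the paper's exact normalization details may differ, and it tunes the statistics on held-out validation data rather than reusing training features as this short sketch does:

```python
import numpy as np

def msp(logits):
    """Maximum softmax probability, computed stably."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def fit_core(feats_id, W, b, alpha=0.5):
    """Fit a CORE-style scorer on ID penultimate features.
    Returns a function mapping features -> score (higher = more ID)."""
    d, C = W.shape
    Qf, _ = np.linalg.qr(W, mode="complete")      # full orthonormal basis of R^d
    B = Qf[:, C:]                                 # basis of the residual subspace
    R = feats_id @ B                              # residual coordinates, (N, d-C)
    mu = R.mean(axis=0)
    prec = np.linalg.inv(np.cov(R, rowvar=False) + 1e-6 * np.eye(d - C))

    def maha(feats):
        delta = feats @ B - mu
        return np.einsum("ni,ij,nj->n", delta, prec, delta)

    # min-max statistics for normalization (estimated here on the ID set)
    c = msp(feats_id @ W + b)
    m = maha(feats_id)
    c0, c1, m0, m1 = c.min(), c.max(), m.min(), m.max()

    def score(feats):
        s_conf = (msp(feats @ W + b) - c0) / (c1 - c0 + 1e-12)
        s_res = (maha(feats) - m0) / (m1 - m0 + 1e-12)
        # flip the normalized distance so larger = more ID for both terms
        return alpha * s_conf + (1 - alpha) * (1.0 - s_res)

    return score
```

Note that working in the residual coordinates (via the complement basis B) keeps the covariance full-rank, which is the simplification the (d-r)-dimensional subspace affords.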

The method is model-agnostic, requiring only access to the penultimate features and the final linear layer.
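In PyTorch, for example, both requirements can be met with a forward hook on the final linear layer (the toy head and names below are ours, purely illustrative):

```python
import torch
import torch.nn as nn

# A toy head standing in for any backbone's final linear layer.
d, C = 512, 10
head = nn.Linear(d, C)

cache = {}
def grab_penultimate(module, inputs, output):
    # Forward hooks receive the module's inputs as a tuple;
    # inputs[0] is exactly the penultimate feature h.
    cache["h"] = inputs[0].detach()

head.register_forward_hook(grab_penultimate)

h = torch.randn(4, d)            # a batch of penultimate features
logits = head(h)
W = head.weight.detach().T       # shape (d, C), matching z = W^T h + b
b = head.bias.detach()

assert torch.equal(cache["h"], h)
assert torch.allclose(logits, h @ W + b, atol=1e-5)
```

No retraining or architectural change is needed; the hook and the weight matrix are everything CORE consumes.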

Why It Matters

CORE addresses a fundamental pain point in OOD detection: the lack of a robust, architecture-agnostic method. By leveraging a natural orthogonal decomposition present in any classifier, it provides a principled way to disentangle and combine confidence and membership signals. The results demonstrate consistent high performance without specialized tuning per architecture, moving closer to a "drop-in" OOD detection module. Its low computational cost makes it immediately applicable for enhancing the reliability of deployed vision models, from medical diagnostics to autonomous vehicles.

AI Analysis

CORE's primary contribution is a clever reframing of the feature space. The observation that the penultimate layer's residual (discarded by the classifier) contains a clean membership signal is significant. Prior feature-based methods like the Mahalanobis detector operated on the full feature space, where directions relevant for classification dominate the covariance. By restricting the membership test to the orthogonal complement of the classifier's subspace, CORE effectively filters out noise and focuses on a purer distributional signal.

Theoretical robustness arises from the orthogonality of failure modes. A confident but distributionally alien sample (a common failure case for logit-based methods) will have an anomalous residual score. Conversely, a distributionally similar but low-confidence sample (a failure case for some feature methods) will be flagged by the low confidence score. This complementary design is more elegant than prior ensemble methods that heuristically combine scores from different layers or techniques.

For practitioners, CORE is appealing due to its simplicity and low overhead. Implementing it requires adding a feature hook and performing a projection and distance calculation—far less costly than maintaining a separate model like an energy-based model or a flow. The need to compute the matrix `Q` for the classifier subspace is a one-time cost. The main tuning parameter is the combination weight `α`, which the paper finds can often be set to 0.5 or tuned on a small validation set. This makes it a strong candidate for a new default baseline in OOD detection benchmarks.

Original source: arxiv.org
