Defect Detection

Zero-Shot Surface Defect Detection with PatchCore

Deploying VisionLab's DINOv2 + PatchCore anomaly detection on a metal casting line: trained on 8 normal images, no defect images required.

April 10, 2026
PatchCore · DINOv2 · Anomaly Detection · Deep Learning

The Problem with Traditional Defect Detection

Collecting labelled defect images is the biggest bottleneck in industrial vision. Defects are rare by design; a good process produces very few. Building a supervised classifier requires months of sample collection before you can even start training.

VisionLab's PatchCore module inverts this requirement: train only on normal (good) images. The system memorises what "normal" looks like, then flags any test image that deviates from that memory.

How PatchCore Works

Training (one-time, ~2 min for 10 images):
  Normal images → DINOv2 patch features → Coreset sampling → Memory Bank (.pt)

Inference (< 50 ms per image):
  Test image → DINOv2 features → Nearest-neighbour distance to Memory Bank
                                         ↓
                               Per-patch anomaly score → Heatmap overlay

DINOv2 as Backbone

DINOv2-S is a Vision Transformer pre-trained via self-supervised learning on 142 M images. It extracts spatially aware, semantically rich patch features without any task-specific fine-tuning.

For a 224×224 input, the model outputs a [256, 384] feature matrix: 256 patch positions, each described by a 384-dimensional vector.

Coreset Subsampling

Raw feature matrices from 10 training images contain ~2,560 vectors. Storing all of them would make nearest-neighbour search slow. Greedy coreset sampling retains only 10% of the vectors while approximately preserving coverage of the feature space:

Sโˆ—=argโกminโกSโІF,โ€‰โˆฃSโˆฃ=KmaxโกfโˆˆFminโกsโˆˆSโˆฅfโˆ’sโˆฅ2S^* = \arg\min_{S \subseteq \mathcal{F},\, |S|=K} \max_{f \in \mathcal{F}} \min_{s \in S} \|f - s\|_2

Anomaly Score

For each patch $q$ in the test image, the anomaly score is the cosine distance to its nearest neighbour in the Memory Bank $\mathcal{M}$:

score(q)=minโกmโˆˆM(1โˆ’qโ‹…mโˆฅqโˆฅโ€‰โˆฅmโˆฅ)\text{score}(q) = \min_{m \in \mathcal{M}} \left(1 - \frac{q \cdot m}{\|q\|\,\|m\|}\right)

Scores are spatially upsampled and Gaussian-smoothed to produce a human-readable heatmap.

Deployment Results

Metric                         Value
Training images (normal only)  8
Training time                  ~2 min (CPU)
AUROC on test set              97.4%
Inference time                 < 50 ms / image
Defect image requirement       Zero

Why This Matters

The factory started inspection on the same day VisionLab was installed. No defect sample collection, no annotation, no model training from scratch. The Memory Bank was built from 8 images of known-good castings taken during a single production run.

This is the key advantage: you can inspect a brand-new product on day one.

Tech Stack

C++ · LibTorch · DINOv2 · FAISS · OpenCV · VisionLab Plugin SDK