
Training YOLOv8 for Industrial Defect Detection: Parameters, VRAM, and Pitfalls

A practical guide to YOLOv8 training parameters in VisionLab: VRAM selection table, small-dataset tips, and a deep dive into the close_mosaic loss-explosion bug and its fix.

March 5, 2026 · 4 min read

Getting a YOLOv8 model to converge cleanly on an industrial defect dataset requires tuning a handful of parameters correctly. This post covers the training tab in VisionLab, what each parameter does, and two non-obvious bugs you'll hit on small datasets.

VRAM Selection Table

The single most important decision is which model variant to use given your GPU. More VRAM lets you use a bigger model and larger input resolution:

VRAM          Model   Input size   Batch size   Typical GPU
CPU / < 4 GB    n        640           4        Dev/debug
4–6 GB          n        640           8        GTX 1060
6–8 GB          n        640          16        RTX 2060
8–12 GB         s        640          16        RTX 3060
12–16 GB        s       1280           8        RTX 3080
16–24 GB        m       1280          16        RTX 3090
≥ 24 GB         l       1280          32        A100

VisionLab detects your GPU at startup and fills in the recommended defaults automatically.
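The recommendations above reduce to a simple threshold lookup. Here is a minimal sketch of that mapping; `recommend_config` is an illustrative name, not VisionLab's actual API:

```python
def recommend_config(vram_gb: float) -> dict:
    """Map available VRAM (GB) to the recommended model variant,
    input size, and batch size from the table above."""
    # Each entry: (upper VRAM bound exclusive, recommended settings)
    table = [
        (4,  dict(model="n", imgsz=640,  batch=4)),   # CPU / < 4 GB
        (6,  dict(model="n", imgsz=640,  batch=8)),   # GTX 1060 class
        (8,  dict(model="n", imgsz=640,  batch=16)),  # RTX 2060 class
        (12, dict(model="s", imgsz=640,  batch=16)),  # RTX 3060 class
        (16, dict(model="s", imgsz=1280, batch=8)),   # RTX 3080 class
        (24, dict(model="m", imgsz=1280, batch=16)),  # RTX 3090 class
    ]
    for upper_bound, cfg in table:
        if vram_gb < upper_bound:
            return cfg
    return dict(model="l", imgsz=1280, batch=32)      # >= 24 GB (A100 class)

print(recommend_config(8))   # RTX 3060 class -> small model, 640 px, batch 16
```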

Key Parameters

Parameter      Default   Notes
Epochs         200       For < 200 training images use 200–300; for > 1000, 100 is often enough.
Batch size     Auto      Must not exceed the training-set size. If batch > dataset, YOLOv8 silently truncates.
Optimizer      Adam      Adam converges faster. SGD + momentum gives slightly better generalisation on large datasets.
LR             0.001     Adam: 1e-3. SGD: 0.01.
LR final       0.0001    Cosine-decay endpoint. Do not set to 0: the last few epochs would train with near-zero gradient.
Weight decay   0.0005    Regularisation. Increase to 0.001 for < 100 training images to reduce overfitting.
Val split      0.20      Keep at 0.15–0.20. Fewer than 10 validation images makes mAP metrics unreliable.
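The learning rate follows a cosine decay from LR down to LR final. A minimal sketch of that schedule (not VisionLab code, and assuming LR final is an absolute endpoint; Ultralytics itself expresses the endpoint as a fraction lrf of lr0) shows why a final LR of exactly 0 starves the last epochs of gradient:

```python
import math

def cosine_lr(epoch: int, epochs: int, lr0: float, lr_final: float) -> float:
    """Cosine-anneal from lr0 at epoch 0 down to lr_final at the last epoch."""
    t = epoch / max(epochs - 1, 1)                 # training progress in [0, 1]
    return lr_final + 0.5 * (lr0 - lr_final) * (1 + math.cos(math.pi * t))

# With the defaults, the final epoch still trains at 1e-4:
print(cosine_lr(199, 200, 1e-3, 1e-4))   # -> 0.0001
# With lr_final = 0, the tail of the schedule carries almost no update:
print(cosine_lr(199, 200, 1e-3, 0.0))    # -> 0.0
```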

Small Dataset Checklist (< 100 images)

  • Tile first: Use tile_dataset.py to convert 4K frames into 640-px crops. One 4K image → 30–50 training samples.
  • Batch size = min(training_count × 0.8, 8); never exceed the training-set count.
  • Epochs = 300–500; small datasets need more passes.
  • Model = nano or small only; medium/large will overfit immediately.
  • Watch val_loss: if it rises steadily after epoch 50, stop early. best.pt is saved at the lowest val_loss automatically.
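The batch-size rule from the checklist fits in a one-line helper; the function name is illustrative, not part of VisionLab:

```python
def small_dataset_batch(train_count: int) -> int:
    """Batch = min(0.8 * train_count, 8), clamped so it never drops below 1
    and never exceeds the training-set size (which YOLOv8 would silently
    truncate to anyway)."""
    return max(1, min(int(train_count * 0.8), 8, train_count))

print(small_dataset_batch(60))  # capped at 8
print(small_dataset_batch(6))   # 0.8 * 6 = 4.8 -> batch 4
```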

Training Output Files

output_dir/
  train.log          ← full log with timestamps
  best.pt            ← saved at lowest val_loss (use this for deployment)
  epoch_10.pt        ← periodic checkpoint every save_period epochs
  epoch_20.pt
  ...

Always deploy best.pt, not the last epoch: the final epoch is often slightly overfit relative to the val-loss minimum.
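The best.pt logic amounts to tracking the running val-loss minimum. A sketch of that selection (not VisionLab's actual trainer code):

```python
def best_epoch(val_losses: list[float]) -> int:
    """Return the 1-based epoch with the lowest val loss -- the epoch
    whose weights best.pt would hold at the end of training."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Val loss dips at epoch 3 then rises (overfitting): deploy epoch 3's weights,
# not the final epoch's.
print(best_epoch([4.1, 3.6, 3.2, 3.4, 3.9]))  # -> 3
```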

The close_mosaic Loss Explosion Bug

YOLOv8 uses mosaic augmentation (4 images stitched together) for most of training, then disables it for the last close_mosaic epochs to let the model adapt to single images before evaluation.

On small datasets this causes a catastrophic loss spike:

Epoch 190/200  train=3.48   val=5.0   mAP50=0.826  ← normal
Epoch 191/200  train=7.17   val=5.2                ← mosaic disabled
Epoch 192/200  train=13.99  val=5.6   mAP50=0.64   ← explosion
Epoch 200/200  train=14.42  val=8.46  mAP50=0.39   ← model destroyed

Root cause: The model was trained exclusively on 4-image mosaic batches. When close_mosaic switches to single images, the per-batch foreground count drops from ~2560 to ~480. The target_scores_sum normalisation denominator collapses from ~1792 to ~1.0, amplifying the classification loss by ×1792 in a single step. The resulting gradient destroys the classifier head before it can adapt.
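The amplification is plain arithmetic on the loss normaliser. The numbers below only illustrate the denominator collapse described above; the real YOLOv8 classification loss has more terms:

```python
# Classification loss is normalised by target_scores_sum:
#   cls_loss = raw_cls_loss_sum / target_scores_sum
raw_cls_loss_sum = 1792.0   # illustrative constant: same raw loss both steps

denom_mosaic = 1792.0       # target_scores_sum with 4-image mosaic batches
denom_single = 1.0          # collapses once mosaic is disabled

loss_before = raw_cls_loss_sum / denom_mosaic   # 1.0
loss_after = raw_cls_loss_sum / denom_single    # 1792.0
print(loss_after / loss_before)                 # the x1792 one-step spike
```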

Fix in VisionLab: close_mosaic is disabled by default (close_mosaic = 0). best.pt is always saved before any close_mosaic phase would trigger; the val-loss minimum on small datasets reliably falls in epochs 50–100, well before epoch 190.

For datasets with > 500 images you can re-enable it (close_mosaic = 10), which mirrors the Ultralytics default behaviour.

Reading the Training Log

Epoch [050/200]  train=2.14  val=3.82  mAP50=0.741  mAP50-95=0.412
  • train loss should trend steadily downward through roughly epoch 100
  • val loss decreasing = generalising; rising = overfitting (stop here)
  • mAP50 = mean Average Precision at IoU 0.5. Production target: ≥ 0.85 for single-class defect detection
  • mAP50-95 = stricter metric (average over IoU 0.5 to 0.95). Less important for defect detection, where any overlap counts
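Log lines in this format are easy to parse for an automated overfitting check. A sketch assuming the exact bracketed line layout shown above:

```python
import re

# Matches lines like: Epoch [050/200]  train=2.14  val=3.82  mAP50=0.741  mAP50-95=0.412
LINE = re.compile(
    r"Epoch \[(\d+)/(\d+)\]\s+train=([\d.]+)\s+val=([\d.]+)"
    r"\s+mAP50=([\d.]+)\s+mAP50-95=([\d.]+)"
)

def parse_log_line(line: str) -> dict:
    """Pull the epoch number and metrics out of one training-log line."""
    m = LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    epoch, _total, train, val, map50, map50_95 = m.groups()
    return dict(epoch=int(epoch), train=float(train), val=float(val),
                map50=float(map50), map50_95=float(map50_95))

row = parse_log_line("Epoch [050/200]  train=2.14  val=3.82  mAP50=0.741  mAP50-95=0.412")
print(row["epoch"], row["val"], row["map50"])  # 50 3.82 0.741
```

Feeding successive `val` values into a running-minimum check gives the "rising after epoch 50, stop early" rule from the checklist.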

Once best.pt is saved, export to ONNX for deployment or use it for AI-assisted auto-labelling of new images.