Bias-Variance Tradeoff & Design Choices

Exam Importance

MUST | The single most tested topic: 4 questions across all exams, ~20 marks total


Feynman Draft

Imagine you're learning to throw darts at a bullseye.

  • High bias(高偏差) = you consistently miss in the same direction. Your aim is systematically off. You're too rigid — like using only your wrist instead of your whole arm. This is underfitting(欠拟合) — your model is too simple to capture the real pattern.

  • High variance(高方差) = your throws are scattered all over the board. Sometimes you hit the bullseye, sometimes the wall. You're too sensitive to tiny movements. This is overfitting(过拟合): your model memorises the noise in the training data instead of learning the real pattern.

How do you diagnose this from training curves?

What You See | Diagnosis | Name
Training accuracy HIGH, validation accuracy LOW | High variance | Overfitting
Training accuracy LOW, validation accuracy LOW | High bias | Underfitting
Training accuracy HIGH, validation accuracy HIGH | Good fit! | Keep it
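The table above can be turned into a small rule-of-thumb function. This is a minimal sketch. `diagnose`, `good_threshold` (what counts as "high" training accuracy) and `gap_threshold` (what counts as a "clear gap") are hypothetical names, and the 0.85 / 0.10 values are illustrative assumptions. In practice, "high" should be judged against the accuracy that is achievable for the task.

```python
def diagnose(train_acc, val_acc, good_threshold=0.85, gap_threshold=0.10):
    """Map train/validation accuracy to a bias/variance diagnosis.

    Both thresholds are illustrative assumptions, not fixed rules.
    """
    underfit = train_acc < good_threshold             # cannot even fit the training set
    overfit = (train_acc - val_acc) > gap_threshold   # fits training data, fails to generalise
    if underfit and overfit:
        return "high bias and high variance"
    if underfit:
        return "high bias (underfitting)"
    if overfit:
        return "high variance (overfitting)"
    return "good fit"

# The three toy scenarios from these notes
for name, train_acc, val_acc in [("A", 0.95, 0.60), ("B", 0.50, 0.50), ("C", 0.92, 0.88)]:
    print(name, diagnose(train_acc, val_acc))
# A high variance (overfitting)
# B high bias (underfitting)
# C good fit
```

A model can show both problems at once (low training accuracy and a large gap). That is why the function checks the two conditions independently.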

Toy Example with Numbers:

Scenario | Train Acc | Val Acc | Diagnosis | What to Do
A | 95% | 60% | Overfitting(过拟合) | Regularisation, more data
B | 50% | 50% | Underfitting(欠拟合) | Bigger model, remove regularisation
C | 92% | 88% | Good fit | Ship it

Common Misconception: "If validation accuracy is low, always add regularisation." WRONG! Regularisation helps with overfitting (A), but it makes underfitting (B) even WORSE because it further constrains a model that is already too simple.

Core Intuition: Bias(偏差) = the model is too simple for the problem. Variance(方差) = the model is too complex for the amount of training data.
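The core intuition can be demonstrated with k-nearest-neighbour regression. It is used here as a stand-in model because a single parameter, k, controls its complexity. With k=1 the model memorises every training point (high variance). With k equal to the training-set size it predicts the same mean everywhere (high bias). The sine target, noise level, sample sizes and seed below are all illustrative assumptions.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Noisy samples of sin(2*pi*x) on [0, 1] (a made-up target for illustration)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def knn_predict(x, train_x, train_y, k):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    return sum((knn_predict(x, train_x, train_y, k) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(30, rng)
val_x, val_y = make_data(200, rng)

errors = {}
for k in (1, 5, 30):
    # k=1: memorises each training point; k=30: predicts the global mean everywhere
    errors[k] = (mse(train_x, train_y, train_x, train_y, k),
                 mse(val_x, val_y, train_x, train_y, k))
    print(f"k={k:2d}  train MSE={errors[k][0]:.3f}  val MSE={errors[k][1]:.3f}")
```

Expect k=1 to reach exactly zero training error while its validation error stays well above zero. Expect k=30 to have high training and validation error. A moderate k such as 5 typically gives the lowest validation error. Exact numbers depend on the random seed.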


The Design Choices Decision Tree (EXAM ESSENTIAL)

This is the teacher's favourite question format. Memorise this:

Step 1: DIAGNOSE
  Train >> Val? → Overfitting (high variance)
  Train ≈ Val ≈ low? → Underfitting (high bias)

Step 2: PRESCRIBE
  If OVERFITTING(过拟合):
    ✅ Regularisation(正则化) (L1, L2, Dropout) → constrains model complexity
    ✅ More/diverse training data → helps generalise(泛化)
    ✅ Data augmentation(数据增强) → more variety without new data
    ✅ Batch normalisation(批量归一化) → mild regularising effect (a side effect, not its main purpose)
    ✅ Early stopping(提前停止) → stop training when validation loss stops improving
    ✅ Reduce model size → less capacity to memorise
    ❌ More epochs → makes it WORSE
    ❌ Bigger model → makes it WORSE

  If UNDERFITTING(欠拟合):
    ✅ Increase model size (more layers/neurons) → more capacity(容量)
    ✅ Train longer → give it time to learn
    ✅ Remove/reduce regularisation → stop constraining
    ✅ Better features → more signal (more data alone rarely fixes high bias)
    ✅ Transfer learning(迁移学习) → start from pretrained model
    ❌ Regularisation → makes it WORSE
    ❌ Dropout → makes it WORSE
    ❌ Smaller model → makes it WORSE
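Early stopping, listed among the overfitting remedies above, amounts to watching the validation loss and stopping once it has not improved for a few epochs. Below is a minimal sketch that works on a precomputed loss history. `patience` and `min_delta` mirror the parameter names of common framework callbacks such as Keras' `EarlyStopping`, but this function is hypothetical and the loss values are made up.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Scan per-epoch validation losses and stop once `patience` epochs pass
    without an improvement larger than `min_delta`.

    Returns (stop_epoch, best_epoch); the weights from best_epoch are the ones to keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Genuine improvement: remember it and reset the counter
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Validation loss falls, then rises as the model starts to overfit (made-up numbers)
history = [1.00, 0.80, 0.60, 0.55, 0.57, 0.60, 0.66, 0.70, 0.80]
print(early_stopping(history, patience=3))  # (6, 3)
```

In a real training loop, the same check would run after each epoch, and the weights saved at `best_epoch` would be restored.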

Past Exam Questions with Answer Logic

2024 Q2 [6 marks] — Overfitting Scenario

Setup: 5 hidden layers, ReLU, 20 neurons/layer, 1000 epochs. Train=95%, Val=60%.

Suggestion | Answer | Reasoning
Train for 2000 epochs | NO | Already overfitting → more training = memorise more noise
Larger dataset | YES | More diverse data helps learn general patterns, not noise
L2 regularisation | YES | Penalises large weights → simpler, more generalisable model
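To see why L2 regularisation gives a "simpler" model, here is a sketch of one gradient step with the penalty λ·Σw² added to the loss. The λ·Σw² convention is an assumption (some texts use λ/2·Σw², which removes the factor 2 below), and the function names are hypothetical. The extra gradient term 2λw pulls every weight toward zero on every step. This is why, with plain SGD, L2 regularisation is also called weight decay.

```python
def l2_penalty(weights, lam):
    """L2 term added to the data loss: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)

def sgd_step_with_l2(weights, data_grads, lr, lam):
    """One gradient step on data_loss + lam * sum(w^2).

    d/dw (lam * w^2) = 2 * lam * w, so each step also shrinks every weight toward 0.
    """
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, data_grads)]

weights = [2.0, -1.0]
print(l2_penalty(weights, lam=0.5))  # 2.5
# With a zero data gradient, each step multiplies the weights by (1 - 2*lr*lam) = 0.9
print(sgd_step_with_l2(weights, [0.0, 0.0], lr=0.1, lam=0.5))  # approximately [1.8, -0.9]
```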

Practice Q3 [6 marks] — Underfitting Scenario

Setup: 2 hidden layers, ReLU, 5 neurons/layer, 2000 epochs, L1 regularisation. Train=50%, Val=50% (achievable=95%).

Suggestion | Answer | Reasoning
Increase network size | YES | Underfitting = model too small → needs more capacity
Initialise weights to 0 | NO | Creates symmetry → all neurons learn identical things → can't differentiate features
Use dropout | NO | Dropout is regularisation → fights overfitting, not underfitting
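The zero-initialisation answer can be checked on a tiny network. This sketch trains a one-hidden-layer ReLU network with hand-written backpropagation. The three data points, layer size, learning rate and epoch count are arbitrary illustrative choices. It shows two things. First, with all weights at zero, the hidden weights receive zero gradient and never move; only the output bias learns. Second, with any other constant, every hidden unit receives identical updates and stays an exact copy of the others, so the layer behaves like a single neuron. Random initialisation breaks this symmetry.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def train_step(W1, b1, w2, b2, x, y, lr):
    """One SGD step on 0.5 * (out - y)**2 for a one-hidden-layer ReLU network."""
    # Forward pass
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [relu(zj) for zj in z]
    out = sum(w * hj for w, hj in zip(w2, h)) + b2
    # Backward pass (ReLU derivative taken as 0 at z == 0)
    d_out = out - y
    grad_w2 = [d_out * hj for hj in h]
    d_z = [d_out * w * (1.0 if zj > 0.0 else 0.0) for w, zj in zip(w2, z)]
    grad_W1 = [[dz * xi for xi in x] for dz in d_z]
    # Gradient descent update
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, grad_W1)]
    b1 = [b - lr * dz for b, dz in zip(b1, d_z)]
    w2 = [w - lr * g for w, g in zip(w2, grad_w2)]
    b2 = b2 - lr * d_out
    return W1, b1, w2, b2

def train(init, data, hidden=3, lr=0.05, epochs=50):
    """Train from weights drawn by init(); biases start at 0."""
    n_in = len(data[0][0])
    W1 = [[init() for _ in range(n_in)] for _ in range(hidden)]
    w2 = [init() for _ in range(hidden)]
    b1, b2 = [0.0] * hidden, 0.0
    for _ in range(epochs):
        for x, y in data:
            W1, b1, w2, b2 = train_step(W1, b1, w2, b2, x, y, lr)
    return W1, w2

DATA = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.5)]

W1_zero, w2_zero = train(lambda: 0.0, DATA)
W1_const, _ = train(lambda: 0.5, DATA)
rng = random.Random(0)
W1_rand, _ = train(lambda: rng.uniform(-1.0, 1.0), DATA)

print("zero init:    ", W1_zero)   # every weight is still 0.0
print("constant init:", W1_const)  # all rows identical: hidden units are clones
print("random init:  ", W1_rand)   # rows differ: units can learn different features
```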

2025 Q2 [3 marks] — Curve Interpretation

Setup: Training curves after 20 epochs showing gap between train/val accuracy and diverging loss curves.

(a) Diagnose: High variance (overfitting) — clear gap between training and validation. Possibly also high bias if training loss is still high.

(b) Two changes (the question asks for two; ideally each targets a different aspect). Candidate answers:

  • Regularisation (e.g., L2, dropout) → reduces overfitting
  • Data augmentation → more varied training data → better generalisation
  • Batch normalisation → has regularising effect
  • Increase model size (if bias is high) → more capacity to fit
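Dropout, suggested above as a regulariser, can be sketched as inverted dropout, which is the common formulation (the function name and signature here are hypothetical). Each training forward pass keeps only a random subset of units, which reduces the network's effective capacity. That is why it helps an overfitting model and hurts an underfitting one.

```python
import random

def inverted_dropout(activations, p, training, rng):
    """Zero each activation with probability p during training and scale the
    survivors by 1 / (1 - p), so the expected activation is unchanged.
    At evaluation time (training=False) the input passes through untouched."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
print(inverted_dropout([1.0] * 8, p=0.5, training=True, rng=rng))   # a random mix of 0.0 and 2.0
print(inverted_dropout([1.0] * 8, p=0.5, training=False, rng=rng))  # unchanged: all 1.0
```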

How to Read Training Curves

Loss Curve Diagnosis — all exam scenarios

Quick reference table:

What you see on the plot | Diagnosis | Fix
Train loss ↓, val loss ↑ after a point | Overfitting(过拟合) (high variance) | Dropout, L2, more data, early stopping
Both losses stay HIGH | Underfitting(欠拟合) (high bias) | Bigger model, more epochs, less regularisation
Loss oscillates / explodes | LR too high | Reduce LR (e.g. by 10×), use a scheduler
Both losses barely move | LR too low | Increase LR (e.g. by 10×)
Both losses ↓ and converge | Good fit | Keep going or stop early
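The two learning-rate rows of the table can be reproduced on the toy loss L(w) = w², used here as an assumed stand-in for a real network's loss surface. Each gradient step multiplies w by (1 - 2·lr). As a result, lr = 1.1 makes w flip sign and grow (oscillating, exploding loss), lr = 0.1 converges, and lr = 1e-4 barely moves.

```python
def gd_on_quadratic(lr, w0=1.0, steps=20):
    """Plain gradient descent on the toy loss L(w) = w**2 (gradient 2w).

    Each update is w <- w - lr * 2w = (1 - 2*lr) * w, so the loss shrinks only
    when |1 - 2*lr| < 1, i.e. 0 < lr < 1.
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(w * w)
        w -= lr * 2.0 * w
    return losses

for lr in (1.1, 0.1, 1e-4):
    losses = gd_on_quadratic(lr)
    print(f"lr={lr:g}: loss {losses[0]:.4f} -> {losses[-1]:.4g}")
```

Real loss surfaces are not quadratic, but near a minimum they behave in a similar way. This is why changing the learning rate by about 10× is a reasonable first adjustment in either direction.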

English Expression Templates

Diagnosing:

  • "The model displays high variance as there is a clear gap between training and validation accuracy."
  • "This indicates overfitting, where the model fits the training data too closely but fails to generalise."

Prescribing:

  • "Applying regularisation can help reduce overfitting by limiting model complexity."
  • "Training on a larger dataset might help the model learn more general patterns."
  • "This will not help because the model is already underfitting — adding regularisation would constrain it further."

Chinese Thinking → English Output (中文思维 → 英文输出)

Your thought in Chinese (English gloss) | What to write in the exam
过拟合了,训练高验证低 (overfitting: train high, validation low) | "The model is overfitting: the training accuracy (X%) is significantly higher than the validation accuracy (Y%)."
欠拟合,两个都很低 (underfitting: both are low) | "The model is underfitting, as both training and validation accuracies are low, indicating insufficient model capacity."
加正则化能改善 (adding regularisation will improve it) | "Applying regularisation is likely to improve validation accuracy by constraining model complexity."
不能再多训练了,会更差 (don't train longer, it will get worse) | "Training for more epochs will not help; it is likely to worsen overfitting as the model continues to memorise training noise."
dropout不能解决欠拟合 (dropout can't fix underfitting) | "Dropout will not help because the model is underfitting. Dropout reduces effective capacity, which would worsen the problem."
模型太简单了,学不到东西 (the model is too simple to learn anything) | "The model lacks sufficient capacity to capture the underlying patterns in the data."
需要更多数据来泛化 (more data is needed to generalise) | "Increasing the dataset size is likely to help the model generalise better by providing more diverse examples."
权重初始化为0不行 (initialising weights to 0 doesn't work) | "Initialising all weights to zero creates symmetry: all neurons in a layer receive identical updates, so they cannot learn different features."

Chinglish Corrections for This Chapter (本章 Chinglish 纠正)

Chinglish (avoid) | Correct English
"The model is overfit" | "The model is overfitting" (use the progressive form for the state)
"It should add regularisation" | "Applying regularisation would help"
"The gap is too big" | "There is a significant discrepancy between training and validation performance"
"More data can solve" | "Increasing the dataset size is likely to help the model generalise better"
"The model is not enough complex" | "The model has insufficient capacity"
"Train more epoch will be worse" | "Training for more epochs is likely to worsen overfitting"

Whiteboard Self-Test

  • Can you draw the bias-variance diagnosis table from memory?
  • Given train=95%/val=55%, what's the diagnosis? What 3 things help?
  • Given train=50%/val=50%, what's the diagnosis? Why does dropout NOT help?
  • Can you explain why zero weight initialisation is bad?
  • Can you explain why training for more epochs worsens overfitting?