Professional Vocabulary Quick Reference


Frequently Confused Terms

Term A | Term B | Key Difference
Parameter | Hyperparameter | Parameters are learned during training (weights, biases). Hyperparameters are set BEFORE training (learning rate, batch size, number of layers).
Overfitting | Underfitting | Overfitting = model too complex (memorises noise). Underfitting = model too simple (can't capture patterns).
Bias (statistical) | Bias (in neurons) | Statistical bias = systematic error from simplifying assumptions. Neuron bias = a constant term added before activation.
Multi-class | Multi-label | Multi-class = exactly ONE class per input (softmax). Multi-label = MULTIPLE classes per input possible (sigmoid).
Validation set | Test set | Validation = used during training to tune hyperparameters. Test = used ONCE at the end to evaluate final performance.
Epoch | Batch | Epoch = one complete pass through ALL training data. Batch = a subset of data processed before one weight update.
Regularisation | Normalisation | Regularisation = techniques to prevent overfitting (L1, L2, dropout). Normalisation = scaling data or activations (batch norm, standardisation).
Feature map | Filter/Kernel | Filter = the small weight matrix that slides across the input. Feature map = the OUTPUT produced after applying a filter.
Stride | Padding | Stride = how many pixels the filter moves each step. Padding = adding zeros around the input border.
Valid padding | Same padding | Valid = no padding (output shrinks). Same = pad so output spatial dimensions equal the input's.
Encoder | Decoder | Encoder = processes input into a representation. Decoder = generates output from that representation.
Self-attention | Cross-attention | Self-attention = input attends to itself. Cross-attention = one sequence attends to another (e.g., decoder attends to encoder).
Precision | Recall | Precision = of predicted positives, how many are correct. Recall = of actual positives, how many did we find.
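The multi-class vs multi-label distinction above can be made concrete with a minimal sketch (the logit values are hypothetical, chosen for illustration):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes

# Multi-class: softmax gives ONE distribution; probabilities sum to 1
probs = np.exp(logits) / np.exp(logits).sum()
predicted_class = probs.argmax()      # exactly one class is chosen

# Multi-label: sigmoid scores each class independently
scores = 1 / (1 + np.exp(-logits))
predicted_labels = scores > 0.5       # multiple classes can be "on" at once
```

Note that softmax forces the classes to compete (raising one probability lowers the others), while the sigmoid scores are independent.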

Key Terms by Topic

Data Preprocessing

Term | Chinese | Definition
Imputation | 填补/插补 | Replacing missing values with estimated values
Standardisation | 标准化 | Transform to mean=0, std=1: (x-μ)/σ
Normalisation | 归一化 | Scale to range [0,1]: (x-min)/(max-min)
One-hot encoding | 独热编码 | Binary vector representation for categories
Outlier | 异常值/离群值 | Data point far from the rest of the distribution
Feature engineering | 特征工程 | Creating new features from raw data
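The standardisation, normalisation, and one-hot formulas above can be sketched directly in NumPy (toy values, for illustration only):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardisation: (x - mean) / std  ->  mean 0, std 1
z = (x - x.mean()) / x.std()

# Normalisation: (x - min) / (max - min)  ->  range [0, 1]
n = (x - x.min()) / (x.max() - x.min())

# One-hot encoding: category index -> binary vector
labels = np.array([0, 2, 1])
one_hot = np.eye(3)[labels]
```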

Neural Networks

Term | Chinese | Definition
Activation function | 激活函数 | Non-linear function applied after linear transformation
Backpropagation | 反向传播 | Algorithm to compute gradients by the chain rule
Gradient descent | 梯度下降 | Iterative optimisation by following the negative gradient
Learning rate | 学习率 | Step size for gradient descent updates
Loss function | 损失函数 | Measures how wrong the model's predictions are
Weight initialisation | 权重初始化 | Setting initial values for model parameters
Vanishing gradient | 梯度消失 | Gradients become extremely small in deep networks
Exploding gradient | 梯度爆炸 | Gradients become extremely large in deep networks
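Gradient descent and the learning rate can be illustrated on a toy one-dimensional loss (this example is not from the source; f(w) = (w - 3)² is chosen so the minimum is obvious):

```python
# One-dimensional gradient descent on f(w) = (w - 3)^2.
def grad(w):
    return 2 * (w - 3)       # df/dw

w = 0.0                      # initial weight
lr = 0.1                     # learning rate (a hyperparameter, set before training)
for _ in range(100):
    w -= lr * grad(w)        # step along the negative gradient
# w has converged very close to the minimum at w = 3
```

Try lr = 1.1 to see divergence: too large a step overshoots the minimum and the loss explodes, which is why the learning rate matters so much.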

CNN

Term | Chinese | Definition
Convolution | 卷积 | Sliding a filter across the input to produce a feature map
Pooling | 池化 | Downsampling feature maps (max or average)
Kernel/Filter | 卷积核/滤波器 | Small weight matrix that detects patterns
Stride | 步幅 | Number of pixels the filter moves each step
Padding | 填充 | Adding zeros around input borders
Feature map | 特征图 | Output of applying a filter to input
Receptive field | 感受野 | Region of input that affects a particular output neuron
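Stride and padding determine the output size via the standard formula floor((W - K + 2P) / S) + 1, which can be checked with a small helper (the input size 32 and kernel size 3 are arbitrary examples):

```python
# Spatial output size of a convolution: floor((W - K + 2P) / S) + 1
def conv_output_size(w, k, s=1, p=0):
    return (w - k + 2 * p) // s + 1

valid = conv_output_size(32, 3)                # "valid": no padding, shrinks to 30
same = conv_output_size(32, 3, p=1)            # "same": padded so output stays 32
strided = conv_output_size(32, 3, s=2, p=1)    # stride 2 roughly halves it: 16
```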

Transformer

Term | Chinese | Definition
Self-attention | 自注意力 | Each position attends to all other positions in the sequence
Multi-head attention | 多头注意力 | Multiple parallel attention functions with different projections
Positional encoding | 位置编码 | Signal added to embeddings to encode sequence order
Masked attention | 掩码注意力 | Prevents attending to future positions in the decoder
Query (Q) | 查询 | "What am I looking for?"
Key (K) | 键 | "What do I contain?"
Value (V) | 值 | "What information do I provide?"
[CLS] token | 分类标记 | Special token in ViT that aggregates information for classification
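The Q/K/V mechanics can be sketched as scaled dot-product self-attention in NumPy (a minimal single-head sketch with random weights; shapes and the projection setup are illustrative assumptions, not the full Transformer):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot product
    weights = softmax(scores)                  # each row sums to 1
    return weights @ V                         # weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, model dim 8
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(X, Wq, Wk, Wv)            # shape (4, 8)
```

In cross-attention the only change is that Q comes from one sequence and K, V from another (e.g., decoder queries attending to encoder outputs).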

Regularisation & Training

Term | Chinese | Definition
L1 regularisation (Lasso) | L1正则化 | Adds |weight| penalty → drives some weights exactly to 0 (sparsity)
L2 regularisation (Ridge) | L2正则化 | Adds weight² penalty → shrinks all weights toward 0
Dropout | 随机失活 | Randomly deactivates neurons during training to prevent co-adaptation
Early stopping | 提前停止 | Stop training when validation loss stops improving
Batch normalisation | 批量归一化 | Normalises activations per mini-batch (zero mean, unit variance)
Weight decay | 权重衰减 | Equivalent to L2 regularisation in most optimisers
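The L1 and L2 penalty terms are easy to compute directly (the weight vector and regularisation strength here are hypothetical):

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # hypothetical weight vector
lam = 0.01                        # regularisation strength (hyperparameter)

l1 = lam * np.abs(w).sum()        # L1: |w| penalty, encourages exact zeros
l2 = lam * (w ** 2).sum()         # L2: w^2 penalty, shrinks all weights
# total loss = data loss + penalty term
```

Because the L2 penalty's gradient is proportional to w, it shrinks large weights more, while L1's constant-magnitude gradient pushes small weights all the way to zero.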

Optimisation

Term | Chinese | Definition
SGD | 随机梯度下降 | Updates weights using the gradient of a random mini-batch
Momentum | 动量 | Accumulates past gradients to smooth and accelerate updates
Adam | 自适应矩估计 | Adaptive per-parameter learning rate using 1st and 2nd moment estimates
Learning rate schedule | 学习率调度 | Changing the learning rate during training (e.g., exponential decay)
Convergence | 收敛 | When the loss reaches a stable minimum value
Gradient clipping | 梯度裁剪 | Caps gradient magnitude to prevent exploding gradients
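Momentum's "accumulate past gradients" idea fits in a few lines (a sketch on the toy loss f(w) = w², not any particular library's optimiser; lr and beta values are illustrative):

```python
# SGD with momentum on f(w) = w^2.
def grad(w):
    return 2 * w

w, v = 5.0, 0.0          # weight and velocity (accumulated gradient)
lr, beta = 0.1, 0.9      # learning rate and momentum coefficient
for _ in range(200):
    v = beta * v + grad(w)   # momentum: blend past gradients with the new one
    w -= lr * v              # update using the smoothed direction
# w has decayed close to the minimum at 0
```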

RNN / Sequence Models

Term | Chinese | Definition
Hidden state | 隐藏状态 | Internal memory vector passed between time steps in an RNN
LSTM | 长短时记忆网络 | RNN variant with gates (forget, input, output) to control information flow
GRU | 门控循环单元 | Simplified LSTM with 2 gates (reset, update) instead of 3
Forget gate | 遗忘门 | Decides what information to discard from the cell state
Sequential processing | 顺序处理 | Processing tokens one at a time (advantage: captures order; drawback: can't parallelise)
Teacher forcing | 教师强迫 | Using ground truth as decoder input during training instead of previous predictions
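The hidden state passing between time steps can be sketched as a vanilla RNN step, h_t = tanh(W_x·x_t + W_h·h_{t-1} + b) (random toy weights; the 0.1 scaling is an illustrative choice to keep activations small):

```python
import numpy as np

# Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), processed sequentially.
rng = np.random.default_rng(0)
W_x = 0.1 * rng.normal(size=(3, 4))   # input -> hidden
W_h = 0.1 * rng.normal(size=(3, 3))   # hidden -> hidden (carries the memory)
b = np.zeros(3)

h = np.zeros(3)                        # initial hidden state
for x_t in rng.normal(size=(5, 4)):    # 5 time steps, one at a time
    h = np.tanh(W_x @ x_t + W_h @ h + b)
```

The loop is exactly the "sequential processing" drawback from the table: step t cannot start until step t-1 has produced h.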

Evaluation

Term | Chinese | Definition
Confusion matrix | 混淆矩阵 | Table showing TP, TN, FP, FN counts
True Positive (TP) | 真阳性 | Correctly predicted as positive
False Positive (FP) | 假阳性 | Incorrectly predicted as positive (Type I error)
False Negative (FN) | 假阴性 | Incorrectly predicted as negative (Type II error)
True Negative (TN) | 真阴性 | Correctly predicted as negative
Class imbalance | 类别不平衡 | Unequal distribution of classes in dataset
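The standard metrics follow directly from the four confusion-matrix counts (the counts below are hypothetical):

```python
# Metrics from a binary confusion matrix (hypothetical counts).
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)                    # of predicted positives, how many correct
recall = tp / (tp + fn)                       # of actual positives, how many found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
```

Under class imbalance, accuracy can look high while recall on the minority class is poor, which is why precision, recall, and F1 are reported separately.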

Commonly Misspelled Words

Wrong | Correct
regularization | regularisation (NZ/UK spelling used in exam)
optimzation | optimisation
occured | occurred
seperately | separately
convultion | convolution
parallelise | correct as-is (NZ spelling)
acheive | achieve
independant | independent
artifical | artificial

Note: This is a New Zealand university — British/NZ spelling is expected (regularisation, normalisation, optimisation), not American spelling.


High-Frequency Collocations for Exams

English is not written one word at a time; it is written in set combinations. Memorising collocations is more effective than memorising individual words.

Verb + Noun Collocations

Chinese | Correct collocation | Weaker/incorrect alternative
应用正则化 | apply regularisation | use regularisation (acceptable, but less academic)
计算梯度 | compute the gradient | calculate the gradient (also correct, but compute is more common)
训练模型 | train the model | learn the model
调整超参数 | tune hyperparameters | adjust hyperparameters (also correct, but tune is more idiomatic)
提取特征 | extract features | get features
缓解过拟合 | mitigate overfitting | reduce the overfit
收敛到最优值 | converge to the optimum | reach to the optimum
惩罚大权重 | penalise large weights | punish big weights
丢弃信息 | discard information | throw away the information
执行特征选择 | perform feature selection | do feature selection

Adjective + Noun Collocations

Chinese | Correct collocation | Weaker phrasing
类别不平衡 | class imbalance | unbalanced classes
过拟合的模型 | model that overfits | overfitted model (also correct, but the verb form is more common)
自适应学习率 | adaptive learning rate | automatic learning rate
稀疏表示 | sparse representation | few-value representation
鲁棒的 | robust to outliers | strong against outliers
可泛化的 | generalisable | can be generalised (the adjective is more concise)

Common Preposition Collocations

Chinese | Correct collocation | Common error
在验证集上表现好 | perform well on the validation set | in the validation set
对异常值鲁棒 | robust to outliers | robust for outliers
收敛到一个值 | converge to a value | converge at a value
在...方面优于 | outperform [X] in terms of | outperform [X] at
防止过拟合 | prevent overfitting (gerund) | prevent to overfit
有助于泛化 | help with generalisation | help to generalise (both are correct)