This exam is written and answered in English. These templates help you write clear, precise, high-scoring answers.
[YES/NO], [suggestion] is [likely/unlikely] to improve validation accuracy.
This is because the current model is [overfitting/underfitting], as evidenced by
[training accuracy being much higher/lower than validation accuracy].
[Suggestion] works by [mechanism], which [helps/does not help] with
[overfitting/underfitting] because [specific reason].
Example:
"Yes, L2 regularisation is likely to improve validation accuracy. The current model is overfitting, as the training accuracy (95%) is much higher than the validation accuracy (60%). L2 regularisation penalises large weights, encouraging the model to learn a simpler, more generalisable representation, which helps reduce overfitting."
[Concept] is [one-sentence definition].
It works by [mechanism in 1-2 sentences].
This is [beneficial/important] because [why it matters].
The [metric] is [value], which indicates [what this means].
However, [other metric] reveals that [deeper insight].
This is because [explanation of model behavior, e.g., class imbalance,
predicting everything as one class].
While [A] [feature of A], [B] [feature of B].
The key advantage of [A] is [advantage], whereas [B] excels at [advantage].
However, [A]'s main drawback is [drawback], which [B] addresses by [solution].
"The use of [median/most frequent] imputation suggests the data is [numerical/categorical] with missing values."
"The standardisation step indicates that features are on different scales."
"The log transformation suggests the data has a heavy-tailed distribution."
"Removing the attribute is justified because [X]% of values are missing, and imputation would create misleading information."
"Outlier removal is appropriate because the maximum value ([X]) is significantly larger than expected given the mean ([Y]) and standard deviation ([Z])."
"The model displays high variance, as there is a clear gap between the training and validation [accuracy/loss]."
"This indicates overfitting, where the model fits the training data too closely but fails to generalise."
"The model appears to have high bias, as both training and validation accuracies are low."
"This suggests underfitting — the model is not complex enough to capture the underlying patterns."
"Applying regularisation can help reduce overfitting by limiting the complexity of the model."
"Increasing the model size may help address underfitting by giving the model more capacity to learn."
"The output dimensions of the convolutional layer are calculated as: floor((n + 2p - f) / s) + 1."
"With valid padding (p=0), the spatial dimensions of the output will be smaller than the input."
"With same padding, the output spatial dimensions match the input spatial dimensions when stride is 1."
"The depth of the output equals the number of filters applied."
"Pooling reduces the spatial dimensions while preserving the depth."
"Max pooling and average pooling produce output with the same dimensions; only the values differ."
"The masking in the decoder prevents each position from attending to future tokens."
"This preserves the autoregressive property, ensuring that predictions depend only on previously generated tokens."
"Multi-head attention runs several attention functions in parallel, each with its own learned weight matrices."
"This allows the model to focus on different aspects of the input simultaneously."
"The [CLS] token in ViT aggregates information from all image patches for the final classification."
"Positional encoding is necessary because the Transformer processes all positions in parallel, losing inherent ordering."
"A diverging loss curve indicates a learning rate that is too high, causing the optimisation to overshoot."
"A very slowly decreasing loss suggests the learning rate is too small."
"Learning rate scheduling, such as exponential decay, allows faster initial convergence while avoiding overshooting near the optimum."
"The momentum mechanism maintains an exponentially decaying average of past gradients, smoothing the optimisation trajectory."
"The accuracy is calculated as (TP + TN) / (TP + TN + FP + FN) = ..."
"Despite the seemingly acceptable accuracy, the model performs poorly at identifying positive instances."
"The high recall but low precision indicates the model predicts most instances as positive."
"This discrepancy highlights the importance of examining metrics beyond accuracy, particularly with imbalanced datasets."
"The dying ReLU problem occurs when neurons consistently receive negative inputs, causing them to output zero and stop learning."
"LeakyReLU mitigates this by introducing a small positive slope for negative inputs."
"For multi-label classification, sigmoid is the appropriate output activation because each output is treated independently."
"Softmax is unsuitable for multi-label problems because it forces all outputs to sum to 1."
"Batch normalisation speeds up training by normalising activations within each mini-batch."
"It reduces the risk of vanishing and exploding gradients by keeping activations in a healthy range."
"The normalisation over mini-batches introduces noise, which has a regularising effect."
"It also reduces sensitivity to weight initialisation by automatically adjusting activation distributions."
"L2 regularisation penalises large weights, encouraging the model to learn a simpler, more generalisable representation."
"L1 regularisation drives some weights to exactly zero, performing automatic feature selection."
"Dropout randomly deactivates neurons during training, forcing the network to learn redundant, distributed representations."
"This prevents co-adaptation, where specific neurons become overly reliant on each other."
"Early stopping halts training when validation loss stops improving, preventing the model from memorising training noise."
"Regularisation constrains model complexity, which helps when the model is overfitting but worsens underfitting."
"RNNs process tokens sequentially, naturally capturing temporal order without additional mechanisms."
"However, sequential processing prevents parallelisation, making training slow for long sequences."
"Vanilla RNNs suffer from vanishing gradients because gradients are multiplied through many time steps."
"LSTM mitigates this by introducing a cell state and gating mechanisms (forget, input, output gates) that control information flow."
"The forget gate decides what information to discard, while the input gate controls what new information to store."
"GRU simplifies LSTM by combining the forget and input gates into a single update gate, reducing the number of parameters."
The most common logic problem in Chinese students' answers is not incorrect content but missing logical connections between sentences. The connectives below will make your answers read as though a native speaker wrote them.

Cause and effect (Chinese logic: English expression, with usage example):
- 因为...所以... ("because..., so..."): This is because... / As a result, ... Example: "This is because the learning rate is too high. As a result, the loss diverges."
- 由于... ("due to..."): Due to... / Owing to... Example: "Due to class imbalance, accuracy is misleading."
- 导致了... ("this caused..."): This leads to... / This causes... Example: "This leads to vanishing gradients in early layers."
- ...的原因是... ("the reason for... is..."): The reason [X] is that... Example: "The reason overfitting occurs is that the model has too many parameters relative to the data."
Contrast and concession (Chinese logic: English expression, with usage example):
- 但是/然而 ("but / however"): However, ... / Nevertheless, ... Example: "However, this approach fails when the model is underfitting."
- 虽然...但是... ("although..., still..."): Although/While [X], [Y]. Example: "While accuracy appears high at 70%, the recall of only 33% reveals poor performance."
- 相反 ("on the contrary"): In contrast, ... / Conversely, ... Example: "In contrast, the Transformer processes all positions in parallel."
- 尽管如此 ("despite this"): Despite this, ... Example: "Despite the high accuracy, the model performs poorly on the minority class."
Addition and elaboration (Chinese logic: English expression, with usage example):
- 而且/此外 ("moreover / in addition"): Furthermore, ... / Moreover, ... / Additionally, ... Example: "Furthermore, batch normalisation has a regularising effect."
- 具体来说 ("specifically"): Specifically, ... / In particular, ... Example: "Specifically, dropout randomly deactivates neurons during training."
- 换句话说 ("in other words"): In other words, ... / That is, ... Example: "In other words, the model has memorised the training noise."
Conclusion (Chinese logic: English expression, with usage example):
- 因此/所以 ("therefore"): Therefore, ... / Thus, ... / Hence, ... Example: "Therefore, L2 regularisation is likely to improve validation accuracy."
- 总之 ("in summary"): In summary, ... / Overall, ... Example: "Overall, the model is overfitting and would benefit from regularisation."
- 综上所述 ("to sum up the above"): Based on the above analysis, ... Example: "Based on the above analysis, the suggestion is unlikely to help."
Wrong vs. right (and why):
- "The model has a good performance" → "The model performs well" (more natural)
- "It can help to improve the accuracy" → "It is likely to improve the accuracy" (more academic)
- "The reason is because..." → "This is because..." ("the reason is because" is redundant)
- "More bigger model" → "A larger model" (correct comparative form)
- "The data have..." and "The dataset has..." (both are acceptable)
- "Prevent to overfit" → "Prevent overfitting" ("prevent" takes a gerund)
- "It will for sure improve" → "It is likely to improve" (hedge appropriately)
Instead of → use:
- "better" → "more generalisable", "more robust"
- "bad" → "suboptimal", "poor", "degraded"
- "the gap" → "the discrepancy between training and validation"
- "too good at training data" → "fits the training data too closely"
- "learns too much" → "memorises noise in the training data"
- "doesn't work well" → "fails to generalise to unseen data"
- "makes the model simpler" → "constrains model complexity"
- "helps with overfitting" → "has a regularising effect"