Soft Computing — Fuzzy Logic, Bayesian Reasoning & Naive Bayes (W5L1)
🎯 Exam Importance
🔴 Must-know | ~20% of exam | Appears in S1 2024, S1 2025, S1 2026 Sample
| Year | Question | Marks | Topic |
|---|---|---|---|
| S1 2026 Sample | Q6 | 4m | Classify 4 scenarios as vagueness vs uncertainty |
| S1 2025 Actual | Q5 | 3m | Contrast traditional logic vs fuzzy logic (hammer thrower) |
| S1 2024 Final | Q5 | ~3m | Naive Bayes assumptions (conditional independence + feature relevance) |
This chapter covers three exam-critical skills: (1) classifying vagueness vs uncertainty, (2) fuzzy logic computation, and (3) Bayesian / Naive Bayes calculation. All three have appeared in recent exams and are extremely likely to appear again.
📖 Core Concepts
| Term | One-line Definition |
|---|---|
| Hard Computing | Computation using crisp symbols, exact values, and binary true/false: compilers, arithmetic, classical logic |
| Soft Computing | Computation tolerating imprecision, partial truth, and degrees: fuzzy logic, Bayes, neural nets |
| Vagueness | The concept itself has blurry boundaries: “tall”, “warm”, “high risk” have no sharp cutoff |
| Uncertainty | The state of the world is unknown: a definite fact exists but we lack the evidence to know it |
| Fuzzy Set | A set where membership is a degree in $[0, 1]$, not binary $\{0, 1\}$ |
| Membership Function $\mu_A(x)$ | Maps an element $x$ to its degree of belonging to fuzzy set $A$, valued in $[0, 1]$ |
| Fuzzy Connectives | AND = min, OR = max, NOT = $1 - \mu$ |
| Fuzzy Implication | Standard: $A \rightarrow B = \max(1-A, B)$; Gödel: $1$ if $A \leq B$, else $B$ |
| Fuzzy Control | Control system using fuzzy rules with error $e(t)$ and rate of change $\Delta e(t)$ as inputs |
| Defuzzification | Converting a fuzzy output set back to a single crisp value (e.g., centre of gravity method) |
| Bayes’ Theorem | $P(H \mid e) = P(e \mid H) \cdot P(H) / P(e)$, used to update belief with evidence |
| Prior $P(H)$ | Probability of the hypothesis before observing evidence |
| Likelihood $P(e \mid H)$ | Probability of observing the evidence given that the hypothesis is true |
| Posterior $P(H \mid e)$ | Updated probability of the hypothesis after observing evidence |
| Base Rate Fallacy | Ignoring the prior probability (base rate) when interpreting evidence |
| Naive Bayes Classifier | Classifier assuming conditional independence: $P(C \mid \mathbf{x}) \propto P(C) \prod_i P(x_i \mid C)$ |
| Conditional Independence | Features are independent of each other given the class label |
| Log-score | $\arg\max_C [\log P(C) + \sum_i \log P(x_i \mid C)]$; avoids numerical underflow |
🧠 Feynman Draft
Part 1: Why “Soft” Computing?
Imagine you are teaching a robot to drive. With hard computing, you would write rules like: “IF speed > 60 km/h THEN brake.” But what if speed is 59.9 km/h? The rule says “don’t brake,” even though that is essentially the same as 60. Hard computing treats the world as black-and-white, but the real world is full of shades of grey.
Soft computing is the toolkit for handling this greyness. It has three main branches:
- Fuzzy Logic — for concepts with blurry boundaries (“warm”, “fast”, “tall”)
- Bayesian Reasoning — for situations where we don’t know the truth (“Is it spam?”)
- Neural Networks — for learning patterns from data (covered in other chapters)
“Soft” does NOT mean weak or inferior. It means flexible enough to work when the world is messy — noisy data, vague concepts, incomplete information. A spam filter that says “92% likely spam” is far more useful than one that crashes because it can’t say “definitely spam” or “definitely not spam.”
| Feature | Hard Computing | Soft Computing |
|---|---|---|
| Representation | crisp symbols, exact values | approximate values, degrees, probabilities |
| Logic | true / false | partial truth or belief |
| Typical setting | well-defined rules, precise inputs | noisy, incomplete, vague, uncertain |
| Strength | exact answers when model is right | robust when world is messy |
| Examples | arithmetic, compilers, shortest path | fuzzy control, Bayes classifiers, neural nets |
Part 2: The Two Faces of “Not Knowing” — Vagueness vs Uncertainty
This is the single most important distinction in this chapter. The exam tests it directly (Q6, 4 marks). Here is the clearest way to understand it:
Uncertainty is like a locked box. There is a definite object inside — a red ball or a blue ball. You just don’t know which one. You can assign a probability: “70% chance it’s red.” The fact itself is crisp; your knowledge is incomplete.
Vagueness is like asking “Is this colour red?” while looking at an orange-red sunset. There is no hidden truth to discover. The concept “red” itself has blurry boundaries. You assign a degree: “This is red to degree 0.6.” There is no randomness — you can see the colour perfectly. The blurriness is in the word, not in the world.
Decision procedure for the exam:
Step 1: Is there a DEFINITE FACT about the world that we simply don't know?
→ YES → UNCERTAINTY (Bayesian reasoning)
→ NO → Go to Step 2
Step 2: Does the concept have BLURRY BOUNDARIES / admit degrees?
→ YES → VAGUENESS (Fuzzy logic)
→ NO → Standard hard computing
Worked examples (exam-style):
| Scenario | Hidden fact? | Blurry concept? | Answer |
|---|---|---|---|
| “This patient is high risk” | No — “high risk” is not a fact to discover | Yes — no sharp cutoff for “high risk” | VAGUENESS |
| Alarm went off — burglary? | Yes — burglar either came or didn’t | N/A | UNCERTAINTY |
| “Student 74 is almost excellent” | No — the grade 74 is known | Yes — “almost excellent” is graded | VAGUENESS |
| Spam filter classifying an email | Yes — email is either spam or not | N/A | UNCERTAINTY |
⚠️ Common Misconception: Many students confuse fuzzy membership with probability. When we say $\mu_{\text{Tall}}(183\text{cm}) = 0.6$, we are not saying “60% chance this person is tall.” The person IS 183cm — there’s no randomness. The 0.6 is a degree of truth about how well the concept “tall” applies. Fuzzy logic handles vagueness (blurry concepts); probability handles uncertainty (unknown facts).
Part 3: Building Intuition for Fuzzy Logic
Think of a dimmer switch for lights. A classical light switch is either ON or OFF — that is a classical (crisp) set. A dimmer switch lets you set any brightness from 0% to 100% — that is a fuzzy set.
When we say $\mu_{\text{Tall}}(183\text{cm}) = 0.6$, we mean: “183cm belongs to the set ‘Tall’ with degree 0.6.” No randomness, no probability. Just a graded concept.
Toy example with numbers:
Let $\mu_{\text{hot}} = 0.8$ and $\mu_{\text{humid}} = 0.7$. Then:
- Fuzzy AND: $\min(0.8, 0.7) = 0.7$ (the weakest link determines the conjunction)
- Fuzzy OR: $\max(0.8, 0.7) = 0.8$ (the strongest component determines the disjunction)
- Fuzzy NOT hot: $1 - 0.8 = 0.2$
Why min for AND? You are only as “both tall and heavy” as the lesser degree. If someone is tall (degree 0.9) but light (degree 0.2), they are “tall AND heavy” only to degree 0.2.
Why max for OR? “Tall OR heavy” is satisfied by the stronger of the two. If someone is tall (0.9) but not heavy (0.2), they are “tall OR heavy” to degree 0.9.
Part 4: Building Intuition for Bayesian Reasoning
Imagine a doctor diagnosing a rare disease. Before any test, the doctor knows: “This disease occurs in 1 out of 10,000 people.” That is the prior ($P(H) = 0.0001$). A test comes back positive. The test catches 95% of true cases (sensitivity = 0.95) and wrongly flags 1% of healthy people (false positive rate = 0.01). Many students instantly say “95% chance the patient has it!” But that is dead wrong.
Here is why: out of 10,000 people, about 1 truly has the disease and tests positive. But about 100 healthy people also test positive (1% of the 9,999 healthy people). So roughly 1 out of 101 positive tests is a true positive: about 1%, not 95%.
This is exactly what Bayes’ theorem captures: posterior $\propto$ prior $\times$ likelihood. A strong test on a rare event still produces mostly false positives. This is the base rate fallacy.
Part 5: Naive Bayes — Why “Naive” Works
Imagine you are sorting mail into “spam” and “not spam.” You look at each word independently: “FREE” suggests spam, “meeting” suggests not-spam. The “naive” part is assuming that seeing “FREE” tells you nothing about whether you will also see “WINNER” — each word is treated as an independent piece of evidence.
This assumption is obviously wrong (spam emails often contain both “FREE” and “WINNER” together). But Naive Bayes works anyway because:
- We only need the ranking of classes to be correct, not exact probabilities
- Many weak signals combine effectively even with independence errors
- Estimation is easy even with limited training data — no need to estimate complex joint distributions
💡 Core Intuition: Fuzzy asks “to what degree?” about blurry concepts; Bayes asks “how likely?” about unknown facts. Both tolerate imprecision — that is what makes them “soft.”
📐 Formal Definition
1. Hard Computing vs Soft Computing
| Feature | Hard Computing | Soft Computing |
|---|---|---|
| Values | Crisp symbols, exact numbers | Approximate, degrees, partial truth |
| Truth model | Binary: True or False | Continuous: degrees in [0, 1], probabilities |
| Reasoning | Deductive, deterministic | Inductive, probabilistic, heuristic |
| Tolerance | No tolerance for imprecision | Tolerates and exploits imprecision |
| Examples | Classical logic, arithmetic, compilers, SQL | Fuzzy logic, Bayesian networks, neural networks |
| Strengths | Precision, provable correctness | Handling real-world ambiguity, noise, complexity |
| Limitations | Brittle with noisy/vague inputs | May sacrifice exactness for tractability |
2. Vagueness vs Uncertainty — Formal Distinction
| Dimension | Vagueness | Uncertainty |
|---|---|---|
| What is blurry? | The concept itself | Our knowledge of the world |
| The world | Fully observable — no hidden state | Has hidden state we cannot observe |
| The right question | “To what degree is this true?” | “How likely is this true?” |
| Output | Membership degree $\mu \in [0,1]$ | Probability $P \in [0,1]$ |
| Tool | Fuzzy Logic | Bayesian Reasoning |
| Sum constraint | Degrees do NOT need to sum to 1 | Probabilities MUST sum to 1 |
Critical point: $\mu_{\text{Tall}}(x) + \mu_{\text{Short}}(x)$ does NOT need to equal 1. A person can be “tall to degree 0.6” and “short to degree 0.2” simultaneously. But $P(\text{spam}) + P(\neg\text{spam})$ MUST equal 1.
3. Fuzzy Set Theory
Classical (crisp) set:
$$\mu_A(x) \in \{0, 1\}$$
An element either belongs ($1$) or does not ($0$). Sharp boundary.
Fuzzy set:
$$\mu_A: X \rightarrow [0, 1]$$
An element belongs with a degree between 0 and 1. No sharp boundary.
Example — fuzzy set “Tall”:
| Height (cm) | $\mu_{\text{Tall}}$ | Interpretation |
|---|---|---|
| 160 | 0.0 | Not tall at all |
| 170 | 0.2 | Barely tall |
| 175 | 0.4 | Somewhat tall |
| 180 | 0.5 | Moderately tall |
| 183 | 0.6 | Fairly tall |
| 190 | 0.9 | Very tall |
| 200 | 1.0 | Fully tall |
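A minimal sketch of the crisp-versus-fuzzy contrast, assuming a simple linear ramp for “Tall”. The ramp anchors (160 cm and 200 cm) match the table's endpoints, but the straight line between them is an assumption and does not reproduce the table's intermediate values exactly. The 180 cm crisp cutoff is also hypothetical.

```python
def mu_tall(height_cm: float) -> float:
    """Degree of membership in the fuzzy set 'Tall' (hypothetical linear ramp)."""
    low, high = 160.0, 200.0  # assumed anchors: 160 cm -> 0.0, 200 cm -> 1.0
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def crisp_tall(height_cm: float, threshold_cm: float = 180.0) -> int:
    """Classical set: membership is 0 or 1, with a sharp (assumed) cutoff."""
    return 1 if height_cm >= threshold_cm else 0


# Compare the two models just below and just above the crisp cutoff.
for h in (160, 179, 181, 200):
    print(h, crisp_tall(h), round(mu_tall(h), 3))
```

The crisp set jumps from 0 to 1 between 179 cm and 181 cm, while the fuzzy degree changes only slightly across the same gap.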
4. Fuzzy Logic Connectives
Given fuzzy truth values $A, B \in [0, 1]$:
Fuzzy AND (conjunction):
$$A \wedge B = \min(A, B)$$
Fuzzy OR (disjunction):
$$A \vee B = \max(A, B)$$
Fuzzy NOT (negation):
$$\neg A = 1 - A$$
Properties preserved from classical logic:
| Property | Classical | Fuzzy |
|---|---|---|
| Commutativity | $A \wedge B = B \wedge A$ | $\min(A,B) = \min(B,A)$ ✓ |
| Associativity | $(A \wedge B) \wedge C = A \wedge (B \wedge C)$ | ✓ |
| De Morgan’s | $\neg(A \wedge B) = \neg A \vee \neg B$ | $1 - \min(A,B) = \max(1-A, 1-B)$ ✓ |
| Law of Excluded Middle | $A \vee \neg A = 1$ | $\max(A, 1-A) \neq 1$ in general ✗ |
| Law of Contradiction | $A \wedge \neg A = 0$ | $\min(A, 1-A) \neq 0$ in general ✗ |
⚠️ Important: Fuzzy logic violates the Law of Excluded Middle and Law of Contradiction. If $A = 0.5$, then $A \vee \neg A = \max(0.5, 0.5) = 0.5 \neq 1$ and $A \wedge \neg A = \min(0.5, 0.5) = 0.5 \neq 0$.
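The connectives and the two failing laws can be checked numerically. This is a small sketch; the function names are illustrative and not from the lecture.

```python
import math


def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)


def fuzzy_not(a: float) -> float:
    return 1.0 - a


hot, humid = 0.8, 0.7
print(fuzzy_and(hot, humid), fuzzy_or(hot, humid), round(fuzzy_not(hot), 2))

# De Morgan's law still holds: NOT(A AND B) == (NOT A) OR (NOT B)
lhs = fuzzy_not(fuzzy_and(hot, humid))
rhs = fuzzy_or(fuzzy_not(hot), fuzzy_not(humid))
print(math.isclose(lhs, rhs))

# Excluded middle and contradiction fail at A = 0.5
a = 0.5
print(fuzzy_or(a, fuzzy_not(a)), fuzzy_and(a, fuzzy_not(a)))
```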
5. Fuzzy Implication
Two common definitions of $A \rightarrow B$:
Standard (Kleene-Dienes) implication:
$$A \rightarrow B = \max(1 - A, B)$$
This is the fuzzy analogue of the classical equivalence $A \rightarrow B \equiv \neg A \vee B$.
Gödel implication:
$$A \rightarrow B = \begin{cases} 1 & \text{if } A \leq B \\ B & \text{if } A > B \end{cases}$$
Complete comparison table:
| $A$ | $B$ | Standard: $\max(1-A, B)$ | Gödel | More intuitive? |
|---|---|---|---|---|
| 0.5 | 0 | $\max(0.5, 0) = 0.5$ | $0$ (since $0.5 > 0$) | Gödel: antecedent partially holds but consequent fails, so the implication should fail |
| 0.8 | 0.3 | $\max(0.2, 0.3) = 0.3$ | $0.3$ (since $0.8 > 0.3$) | Same result |
| 0.6 | 0.9 | $\max(0.4, 0.9) = 0.9$ | $1$ (since $0.6 \leq 0.9$) | Gödel: antecedent partially holds, consequent holds more, so the implication is fully satisfied |
| 0.7 | 0.3 | $\max(0.3, 0.3) = 0.3$ | $0.3$ (since $0.7 > 0.3$) | Same result |
| 1.0 | 0.0 | $\max(0, 0) = 0$ | $0$ (since $1 > 0$) | Same: both give 0 for a fully true antecedent and fully false consequent |
| 0.0 | 0.0 | $\max(1, 0) = 1$ | $1$ (since $0 \leq 0$) | Same: a false antecedent makes the implication vacuously true |
Key insight: The Gödel version is generally more intuitive because when $A$ partially holds but $B$ does not hold at all, Gödel gives 0 (the implication fails completely), while Standard gives the positive value $1 - A$.
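Both implication operators are one-liners, so the comparison table is easy to reproduce. A sketch with illustrative function names:

```python
def implies_kleene_dienes(a: float, b: float) -> float:
    """Standard implication: max(1 - A, B)."""
    return max(1.0 - a, b)


def implies_godel(a: float, b: float) -> float:
    """Godel implication: 1 if A <= B, otherwise B."""
    return 1.0 if a <= b else b


# The (A, B) pairs from the comparison table.
rows = [(0.5, 0.0), (0.8, 0.3), (0.6, 0.9), (0.7, 0.3), (1.0, 0.0), (0.0, 0.0)]
for a, b in rows:
    print(a, b, round(implies_kleene_dienes(a, b), 2), implies_godel(a, b))
```

The two operators disagree only on the first and third rows, which are exactly the rows the table marks in favour of Gödel.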
6. Fuzzy Rules and Fuzzy Control
Fuzzy Rule format:
$$\text{IF } x \text{ is } A \text{ AND } y \text{ is } B \text{ THEN } z \text{ is } C$$
Fuzzy Control System Architecture:
Crisp Input → Fuzzification → Fuzzy Input → Inference Engine (driven by the Rule Base) → Fuzzy Output → Defuzzification → Crisp Output
Fuzzy Control uses two key inputs:
- Error $e(t)$: difference between desired state and actual state
- Rate of change $\Delta e(t)$: how fast the error is changing
Example for temperature control:
- $e(t) = T_{\text{desired}} - T_{\text{actual}}$ (e.g., 22 - 25 = -3, meaning too hot)
- $\Delta e(t)$: is the temperature rising or falling?
- Rules like: IF $e(t)$ is negative-big AND $\Delta e(t)$ is positive THEN cooling is medium (it is too hot but getting better, so moderate cooling)
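The two controller inputs can be computed from a short series of temperature readings. This sketch assumes $\Delta e(t) = e(t) - e(t-1)$ over consecutive samples; the readings themselves are made up for illustration.

```python
def error_signals(desired: float, readings: list[float]) -> list[tuple[float, float]]:
    """Return (e(t), delta_e(t)) pairs for consecutive temperature readings."""
    errors = [desired - actual for actual in readings]
    # delta_e(t) = e(t) - e(t-1); the first reading has no predecessor, so start at index 1
    return [(errors[i], errors[i] - errors[i - 1]) for i in range(1, len(errors))]


# Hypothetical readings: the room is too hot but cooling down.
signals = error_signals(22.0, [26.0, 25.0, 24.5])
for e, delta_e in signals:
    print(e, delta_e)
```

At the second reading the error is negative (too hot) while its change is positive (the error is moving toward zero), which is the situation the example rule responds to with medium cooling.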
Applications (from lecture):
- Autopilot systems
- Anti-lock braking systems (ABS)
- Washing machines (adjust cycle based on “somewhat dirty”)
- Consumer devices
- Decision support systems
7. Bayes’ Theorem
$$P(H \mid e) = \frac{P(e \mid H) \cdot P(H)}{P(e)}$$
Where:
- $P(H)$ = prior: belief in the hypothesis before evidence
- $P(e \mid H)$ = likelihood: probability of the evidence given that the hypothesis is true
- $P(e)$ = evidence (marginal): total probability of the evidence under all hypotheses
- $P(H \mid e)$ = posterior: updated belief after evidence
Expanding the denominator via the law of total probability:
$$P(e) = P(e \mid H) \cdot P(H) + P(e \mid \neg H) \cdot P(\neg H)$$
Core relationship:
$$\boxed{\text{posterior} \propto \text{prior} \times \text{likelihood}}$$
8. Naive Bayes Classifier
For classification with class $C$ and feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:
Full Bayes:
$$P(C = c \mid \mathbf{x}) = \frac{P(C = c) \cdot P(\mathbf{x} \mid C = c)}{P(\mathbf{x})}$$
Naive assumption (features are conditionally independent given the class):
$$P(x_1, x_2, \ldots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$
This simplifies the classifier to:
$$P(C = c \mid \mathbf{x}) \propto P(C = c) \cdot \prod_{i=1}^{n} P(x_i \mid C = c)$$
Classification rule:
$$\hat{C} = \underset{c}{\arg\max} \; P(C = c) \cdot \prod_{i=1}^{n} P(x_i \mid C = c)$$
Log-score version (prevents numerical underflow from multiplying many small probabilities):
$$\hat{C} = \underset{c}{\arg\max} \left[ \log P(C = c) + \sum_{i=1}^{n} \log P(x_i \mid C = c) \right]$$
Why log-score? When you multiply many probabilities like $0.01 \times 0.005 \times 0.001 \times \ldots$, the product quickly becomes too small for floating-point representation. In log-space, multiplication becomes addition, which is numerically stable. Each feature contributes additively in log-space.
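The underflow problem is easy to demonstrate. The sketch below uses a hypothetical feature vector of 1000 likelihoods, all equal to 0.01; the numbers are chosen only to force underflow in double-precision floats.

```python
import math

probs = [0.01] * 1000  # hypothetical: 1000 features, each with likelihood 0.01

product = 1.0
for p in probs:
    product *= p  # underflows to 0.0 long before the loop ends

log_score = sum(math.log(p) for p in probs)  # stays finite and comparable

print(product)
print(log_score)
```

Once every class's product has collapsed to 0.0, the classes can no longer be ranked, whereas the log-scores remain distinct finite numbers.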
Why Naive Bayes works despite the unrealistic independence assumption:
- Only ranking matters — for classification, we only need $P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x})$, not exact values. Even if individual probabilities are wrong, the relative ordering is often preserved.
- Many weak signals combine effectively — errors from different features tend to cancel out.
- Easy parameter estimation — only need to estimate $P(x_i \mid C)$ for each feature individually, not the full joint $P(x_1, x_2, \ldots \mid C)$. Works with limited training data.
- Avoids overfitting in high-dimensional spaces — more complex models that model feature dependencies may overfit when data is scarce.
🔄 How It Works
Procedure 1: Classifying Vagueness vs Uncertainty (Exam Algorithm)
INPUT: A scenario description.
OUTPUT: "Vagueness" or "Uncertainty" with justification.
Step 1: Ask — "Is there a DEFINITE FACT about the world that we simply don't know?"
→ YES → This is UNCERTAINTY → Bayesian reasoning
→ NO → Go to Step 2
Step 2: Ask — "Does the concept used have BLURRY BOUNDARIES / admit degrees?"
→ YES → This is VAGUENESS → Fuzzy logic
→ NO → This is standard logic (hard computing)
Worked examples from all exam years:
| # | Scenario | Step 1: Hidden fact? | Step 2: Blurry concept? | Answer | Tool |
|---|---|---|---|---|---|
| 1 | “This patient is high risk” | No — “high risk” is not a fact to discover | Yes — no sharp cutoff | Vagueness | Fuzzy Logic |
| 2 | Alarm went off — is it burglary? | Yes — burglar either came or didn’t | N/A | Uncertainty | Bayesian Reasoning |
| 3 | “Student 74 is almost excellent” | No — grade 74 is known precisely | Yes — “almost excellent” is graded | Vagueness | Fuzzy Logic |
| 4 | Spam filter with incomplete evidence | Yes — email is either spam or not | N/A | Uncertainty | Naive Bayes |
Pattern: If the scenario contains a linguistic/graded adjective (“high risk”, “almost excellent”, “warm”, “fast”), it is almost always vagueness. If it contains a binary outcome that we need to infer (“Is it X?”, “Did Y happen?”), it is almost always uncertainty.
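The two-step procedure is a plain decision rule, sketched here as a function. The two boolean inputs are judgments the reader supplies for each scenario; the function and dictionary names are illustrative.

```python
def classify_scenario(hidden_fact: bool, blurry_concept: bool) -> str:
    """Apply Step 1 (hidden fact?) before Step 2 (blurry concept?)."""
    if hidden_fact:
        return "uncertainty (Bayesian reasoning)"
    if blurry_concept:
        return "vagueness (fuzzy logic)"
    return "neither (standard hard computing)"


# (hidden fact?, blurry concept?) for the four worked examples
scenarios = {
    "This patient is high risk": (False, True),
    "Alarm went off, is it burglary?": (True, False),
    "Student 74 is almost excellent": (False, True),
    "Spam filter with incomplete evidence": (True, False),
}
for text, (hidden, blurry) in scenarios.items():
    print(text, "->", classify_scenario(hidden, blurry))
```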
Procedure 2: Fuzzy Rule Evaluation — Step by Step
Scenario: A fuzzy controller for air conditioning.
- $\mu_{\text{hot}}(\text{temp}) = 0.8$ (temperature is “hot” to degree 0.8)
- $\mu_{\text{humid}}(\text{humidity}) = 0.7$ (humidity is “humid” to degree 0.7)
Rule: IF temperature is hot AND humidity is humid THEN fan speed is high.
Step 1 — Fuzzification (already done — inputs mapped to membership degrees):
- $\mu_{\text{hot}} = 0.8$, $\mu_{\text{humid}} = 0.7$
Step 2 — Evaluate antecedent (fuzzy AND):
$$\text{Rule strength} = \min(\mu_{\text{hot}}, \mu_{\text{humid}}) = \min(0.8, 0.7) = 0.7$$
Step 3 — Apply to consequent:
The rule fires with strength 0.7. The output fuzzy set “high fan speed” is clipped (truncated) at 0.7.
Step 4 — Defuzzification (if multiple rules):
Combine all fired rules’ output fuzzy sets and compute a single crisp output, e.g., using centre of gravity (centroid) method:
$$\text{Crisp output} = \frac{\int \mu_{\text{output}}(z) \cdot z \, dz}{\int \mu_{\text{output}}(z) \, dz}$$
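Steps 2 to 4 can be sketched end to end for the single rule above. The shape of the “high fan speed” output set (a ramp from 50% to 100%) is an assumption made for illustration, and the centroid integral is approximated with a midpoint sum rather than computed exactly.

```python
def mu_high_speed(z: float) -> float:
    """Hypothetical output set 'high fan speed': 0 at 50%, rising linearly to 1 at 100%."""
    return min(1.0, max(0.0, (z - 50.0) / 50.0))


def clipped_output(z: float, strength: float) -> float:
    """Step 3: clip (truncate) the output set at the rule strength."""
    return min(strength, mu_high_speed(z))


def centroid(strength: float, lo: float = 0.0, hi: float = 100.0, steps: int = 1000) -> float:
    """Step 4: approximate the centre of gravity with a midpoint sum."""
    dz = (hi - lo) / steps
    zs = [lo + (i + 0.5) * dz for i in range(steps)]
    weights = [clipped_output(z, strength) for z in zs]
    # dz appears in both integrals, so it cancels in the ratio
    return sum(w * z for w, z in zip(weights, zs)) / sum(weights)


rule_strength = min(0.8, 0.7)          # Step 2: fuzzy AND of hot and humid
crisp_speed = centroid(rule_strength)  # single crisp fan speed, in percent
print(rule_strength, round(crisp_speed, 1))
```

With several rules, the clipped output sets would first be combined (typically with max) before taking the centroid; this sketch covers only the one-rule case.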
Procedure 3: Traditional Logic vs Fuzzy Logic — Hammer Thrower Example (2025 Q5)
Scenario: Evaluating whether an athlete is suited to be a hammer thrower using the rule:
IF STRONG AND HEAVY THEN HAMMER_THROWER
Traditional (crisp) logic approach:
- Set crisp thresholds: e.g., STRONG = bench press > 100kg, HEAVY = weight > 90kg
- Evaluate: If athlete benches 105kg and weighs 95kg → STRONG = True, HEAVY = True
- AND = True AND True = True
- Result: HAMMER_THROWER = True (binary yes/no)
- Problem: An athlete who benches 99kg and weighs 89kg gets HAMMER_THROWER = False, even though they are very close to the thresholds. Sharp cutoff is unrealistic.
Fuzzy logic approach:
- Define membership functions:
  - $\mu_{\text{Strong}}$: maps bench press to a degree in [0, 1]
  - $\mu_{\text{Heavy}}$: maps weight to a degree in [0, 1]
- Compute membership degrees: e.g., $\mu_{\text{Strong}}(\text{bench} = 95\text{kg}) = 0.7$, $\mu_{\text{Heavy}}(\text{weight} = 88\text{kg}) = 0.6$
- Fuzzy AND: $\min(0.7, 0.6) = 0.6$
- Result: HAMMER_THROWER suitability = 0.6 (a graded score, not binary)
- Advantage: No sharp cutoff. Athletes near the boundary get intermediate scores. The system degrades gracefully.
Key contrasts for the exam answer:
| Aspect | Traditional Logic | Fuzzy Logic |
|---|---|---|
| STRONG and HEAVY | Binary: True or False (by threshold) | Graded: degree in [0, 1] via membership function |
| AND operation | Boolean AND (both must be True) | $\min(\mu_{\text{Strong}}, \mu_{\text{Heavy}})$ |
| Output | Binary: is/isn’t a hammer thrower | Suitability score in [0, 1] |
| Boundary cases | Sharp cutoff — small difference → opposite conclusion | Smooth transition — similar inputs → similar outputs |
| Realism | Unrealistic for human attributes | More realistic — strength and heaviness are graded concepts |
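The contrast can be made concrete in a few lines. The crisp thresholds (100 kg, 90 kg) come from the example above; the linear ramps are hypothetical membership functions chosen so that they reproduce the example's degrees of 0.7 at a 95 kg bench press and 0.6 at 88 kg body weight.

```python
def ramp(x: float, lo: float, hi: float) -> float:
    """Linear membership: 0 at or below lo, 1 at or above hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def crisp_rule(bench_kg: float, weight_kg: float) -> bool:
    strong = bench_kg > 100   # crisp threshold from the example
    heavy = weight_kg > 90    # crisp threshold from the example
    return strong and heavy


def fuzzy_rule(bench_kg: float, weight_kg: float) -> float:
    mu_strong = ramp(bench_kg, 60, 110)   # assumed ramp: gives 0.7 at 95 kg
    mu_heavy = ramp(weight_kg, 70, 100)   # assumed ramp: gives 0.6 at 88 kg
    return min(mu_strong, mu_heavy)       # fuzzy AND


for bench, weight in [(105, 95), (99, 89), (95, 88)]:
    print(bench, weight, crisp_rule(bench, weight), round(fuzzy_rule(bench, weight), 2))
```

The athlete at (99, 89) is rejected outright by the crisp rule but still receives a graded suitability from the fuzzy rule.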
Procedure 4: Bayesian Reasoning — Burglar Alarm (Lecture Example)
Setup:
- $P(\text{burglary}) = 0.0001$ (1 in 10,000)
- $P(\text{alarm} \mid \text{burglary}) = 0.95$ (alarm detects 95% of burglaries)
- $P(\text{alarm} \mid \neg\text{burglary}) = 0.01$ (1% false alarm rate)
Question: Alarm goes off. What is $P(\text{burglary} \mid \text{alarm})$?
Step 1 — Compute $P(\text{alarm})$ via law of total probability:
$$P(\text{alarm}) = P(\text{alarm} \mid \text{burglary}) \cdot P(\text{burglary}) + P(\text{alarm} \mid \neg\text{burglary}) \cdot P(\neg\text{burglary})$$
$$= 0.95 \times 0.0001 + 0.01 \times 0.9999$$
$$= 0.000095 + 0.009999 = 0.010094$$
Step 2 — Apply Bayes’ theorem:
$$P(\text{burglary} \mid \text{alarm}) = \frac{P(\text{alarm} \mid \text{burglary}) \cdot P(\text{burglary})}{P(\text{alarm})}$$
$$= \frac{0.95 \times 0.0001}{0.010094} = \frac{0.000095}{0.010094} \approx 0.0094$$
Step 3 — Interpret:
Still less than 1%! The alarm increased belief from 0.01% to ~0.94% — roughly a 100-fold increase — but the base rate (prior) is so low that even strong evidence doesn’t make burglary likely.
Why? Out of every 10,000 households:
- ~1 has a burglary, and the alarm goes off (true positive)
- ~100 have false alarms (0.01 × 9,999 ≈ 100)
- So out of ~101 alarm events, only 1 is a real burglary → about 1%
Key insight: The prior matters enormously. A highly sensitive test applied to a rare event will still produce mostly false positives. Ignoring this is the base rate fallacy.
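The same calculation as a short function, useful for checking how the posterior reacts to a different prior. The function and parameter names are illustrative.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | e) for a binary hypothesis, via Bayes' theorem."""
    # Law of total probability: P(e) = P(e|H)P(H) + P(e|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.0001, likelihood=0.95, false_positive_rate=0.01)
print(round(p, 4))  # 0.0094

# Same alarm, but a much higher (hypothetical) base rate of 10%
print(round(posterior(0.1, 0.95, 0.01), 3))
```

Raising only the prior changes the answer dramatically, which is the point of the base rate discussion.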
Procedure 5: Naive Bayes — Spam Detection Walkthrough
Setup:
- Classes: Spam ($S$) and Not-Spam ($\neg S$)
- $P(S) = 0.3$, $P(\neg S) = 0.7$
- Email contains words: “FREE” and “WINNER”
- $P(\text{“FREE”} \mid S) = 0.8$, $P(\text{“FREE”} \mid \neg S) = 0.05$
- $P(\text{“WINNER”} \mid S) = 0.6$, $P(\text{“WINNER”} \mid \neg S) = 0.02$
Step 1 — Compute unnormalized posteriors (using naive independence):
$$P(S \mid \text{email}) \propto P(S) \cdot P(\text{“FREE”} \mid S) \cdot P(\text{“WINNER”} \mid S) = 0.3 \times 0.8 \times 0.6 = 0.144$$
$$P(\neg S \mid \text{email}) \propto P(\neg S) \cdot P(\text{“FREE”} \mid \neg S) \cdot P(\text{“WINNER”} \mid \neg S) = 0.7 \times 0.05 \times 0.02 = 0.0007$$
Step 2 — Normalize:
$$P(S \mid \text{email}) = \frac{0.144}{0.144 + 0.0007} = \frac{0.144}{0.1447} \approx 0.995$$
Step 3 — Classify: 99.5% probability of spam. Classify as Spam.
Step 4 — Verify with log-score version:
$$\text{Score}(S) = \log(0.3) + \log(0.8) + \log(0.6) = -1.204 + (-0.223) + (-0.511) = -1.938$$
$$\text{Score}(\neg S) = \log(0.7) + \log(0.05) + \log(0.02) = -0.357 + (-2.996) + (-3.912) = -7.265$$
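Since $-1.938 > -7.265$, classify as Spam. The logs here are natural logarithms, with each term rounded to three decimals. The result matches Step 3, and the computation stays numerically stable.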
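The walkthrough can be reproduced with the log-score form. This sketch hard-codes the priors and likelihoods from the setup; in a real classifier they would be estimated from labelled training data. The dictionary layout and function names are illustrative.

```python
import math

priors = {"spam": 0.3, "not_spam": 0.7}
likelihoods = {
    "spam": {"FREE": 0.8, "WINNER": 0.6},
    "not_spam": {"FREE": 0.05, "WINNER": 0.02},
}


def log_scores(words: list[str]) -> dict[str, float]:
    """log P(C) + sum of log P(word | C) for each class (natural logs)."""
    return {
        c: math.log(priors[c]) + sum(math.log(likelihoods[c][w]) for w in words)
        for c in priors
    }


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Turn log-scores into posteriors; subtracting the max keeps exp() stable."""
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


scores = log_scores(["FREE", "WINNER"])
posteriors = normalize(scores)
prediction = max(scores, key=scores.get)
print(prediction, round(posteriors["spam"], 3))
```

Classification only needs the argmax of the log-scores; the normalization step is required only when an actual probability is reported.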
⚖️ Trade-offs & Comparisons
Fuzzy Logic vs Naive Bayes — The Master Comparison (from lecture slide 26)
| Dimension | Fuzzy Logic | Naive Bayes |
|---|---|---|
| Core idea | Degree of membership / partial truth | Probability of class given evidence |
| Handles | Vagueness | Uncertainty |
| Core question | “To what degree is this true?” | “How likely is this class?” |
| Values represent | Degree of membership (NOT probability) | Probability |
| Sum constraint | Degrees need NOT sum to 1 | Probabilities MUST sum to 1 |
| Input | Expert-defined rules, linguistic variables | Labelled training data, feature counts |
| Output | Control action, recommendation strength | Class label with posterior score |
| Knowledge source | Domain expert encodes rules | Learned from data |
| Key assumption | Rules correctly capture expert knowledge | Conditional independence of features |
| Best suited for | Smooth rule-based control (AC, ABS, washing machine) | Lightweight probabilistic classification (spam, text) |
| Handles continuous input | Naturally via membership functions | Requires discretization or Gaussian assumption |
| Interpretability | High — rules are human-readable | Moderate — probabilities are interpretable |
| Training | No training needed — rules from expert | Learns from labelled data |
When to Use Which
| Scenario | Best Approach | Why |
|---|---|---|
| Controlling room temperature | Fuzzy Logic | “Warm” / “cool” are vague; expert rules map naturally |
| Classifying emails as spam | Naive Bayes | Unknown class with probabilistic evidence from word features |
| Medical diagnosis from symptoms | Bayesian Reasoning | Unknown disease state; update belief with test results |
| Autopilot adjusting altitude | Fuzzy Logic | “Too high” / “descending fast” are graded, rule-based |
| Predicting customer churn | Naive Bayes | Binary outcome with multiple feature evidence |
| Washing machine cycle | Fuzzy Logic | “Somewhat dirty” → fuzzy rule → appropriate wash cycle |
| Sentiment analysis of reviews | Naive Bayes | Text classification with word-frequency features |
Strengths and Weaknesses
Fuzzy Logic:
- ✅ Interpretable — rules are human-readable
- ✅ No training data needed — expert knowledge suffices
- ✅ Smooth, gradual response — no sharp cutoffs
- ✅ Handles linguistic variables naturally
- ❌ Requires domain expert to define rules and membership functions
- ❌ Difficult to scale to high-dimensional problems
- ❌ Rules may be subjective — different experts give different rules
Naive Bayes:
- ✅ Simple, fast, scalable to large datasets
- ✅ Strong baseline for text classification
- ✅ Works well with limited training data
- ✅ Probabilistic output allows confidence-based decisions
- ❌ Independence assumption is often violated in practice
- ❌ Estimated probabilities can be poorly calibrated (too extreme)
- ❌ Cannot model feature interactions (e.g., “FREE” + “WINNER” together is more spammy than each alone)
🏗️ Answer Frameworks for Design Questions
Framework 1: Classifying Vagueness vs Uncertainty (Q6 pattern — 4 marks)
When given a scenario to classify, use this template for each sub-question (1 mark each):
Template (write this for each scenario):
“This is [vagueness / uncertainty] because [justification]. The appropriate tool is [Fuzzy Logic / Bayesian reasoning] because [link to tool].”
For vagueness:
“This is vagueness because the concept ‘[X]’ has no sharp boundary — it is a matter of degree, and different observers might draw the boundary in different places. Fuzzy Logic is the appropriate tool, as it models graded membership through $\mu(x) \in [0,1]$.”
For uncertainty:
“This is uncertainty because there is a definite state of the world (it either is [X] or it isn’t), but we lack sufficient evidence to determine which state is true. Bayesian reasoning is the appropriate tool, as it updates probability estimates over possible states using Bayes’ theorem.”
Framework 2: Contrasting Traditional vs Fuzzy Logic (Q5 2025 pattern — 3 marks)
Structure your answer in three parts:
Part A — Traditional logic approach (1 mark):
“In traditional logic, [concept] is evaluated with a crisp threshold (e.g., [value]). The attribute is either True or False. The AND operation is Boolean — both conditions must be True for the rule to fire. The output is binary.”
Part B — Fuzzy logic approach (1 mark):
“In fuzzy logic, [concept] is modelled with a membership function $\mu(x) \in [0,1]$. Each attribute has a graded degree. The AND operation uses $\min(\mu_A, \mu_B)$. The output is a continuous score representing the degree to which the conclusion holds.”
Part C — Why fuzzy is better for this case (1 mark):
“Fuzzy logic is more appropriate because [concept] is inherently a matter of degree — there is no natural sharp boundary. Fuzzy logic avoids the arbitrary threshold problem and provides a smooth, graded output that better reflects reality.”
Framework 3: Naive Bayes Assumptions (Q5 2024 pattern — 3 marks)
Two key assumptions:
- Conditional independence (the “naive” part): Features $x_1, x_2, \ldots, x_n$ are independent of each other given the class label. Formally: $P(x_1, x_2, \ldots, x_n \mid C) = \prod_{i} P(x_i \mid C)$.
- Feature relevance: All features contribute information about the class. (If a feature is completely irrelevant, it adds noise rather than signal.)
Why it works despite violations:
“Although the independence assumption is rarely true in practice, Naive Bayes still performs well because classification only requires the correct ranking of classes, not calibrated probabilities. Many weak, correlated signals can still combine effectively to produce correct class orderings.”
Framework 4: Designing a Soft Computing System (General)
WHAT: State the problem and why hard computing is insufficient.
“The problem requires handling [vagueness / uncertainty / both], which classical binary logic cannot capture.”
WHY: Justify the choice of approach.
“I choose [Fuzzy Logic / Bayesian / Hybrid] because [inputs are linguistically vague / we need probabilistic inference / both aspects are present].”
HOW: Describe the architecture.
- For Fuzzy: Define membership functions → Write fuzzy rules → Evaluate rules (min/max) → Defuzzify output
- For Bayesian: Define prior probabilities → Specify likelihoods → Apply Bayes’ theorem → Output posterior
- For Naive Bayes: Collect labelled training data → Estimate priors and likelihoods → Classify via argmax of posterior
TRADE-OFF: Acknowledge limitations.
“One limitation is [fuzzy rules require expert knowledge / Naive Bayes assumes independence]. This can be mitigated by [learning rules from data / using full Bayesian networks that model dependencies].”
EXAMPLE: Give a concrete computation.
“For example, with input temperature = 28°C, $\mu_{\text{warm}} = 0.7$, applying the rule ‘IF warm THEN medium fan’ gives output strength 0.7.”
📝 Past Exam Questions and Full Model Answers
Q5 — S1 2025 Actual Exam [3 marks]
Contrast traditional logic vs fuzzy logic for the rule: IF STRONG AND HEAVY THEN HAMMER_THROWER. Give a concrete example of how each approach would evaluate an athlete.
Model answer:
Traditional logic approach:
In traditional (crisp) logic, STRONG and HEAVY are defined by sharp thresholds — for example, STRONG = (bench press > 100kg) and HEAVY = (weight > 90kg). For an athlete who benches 95kg and weighs 88kg, both conditions evaluate to False, so:
$$\text{STRONG} \wedge \text{HEAVY} = \text{False} \wedge \text{False} = \text{False}$$
Result: HAMMER_THROWER = False (not recommended at all).
For an athlete who benches 105kg and weighs 95kg, both are True, so HAMMER_THROWER = True.
The problem: an athlete at 99kg bench press gets a completely different result from one at 101kg, despite being almost identical.
Fuzzy logic approach:
In fuzzy logic, STRONG and HEAVY are modelled with membership functions $\mu_{\text{Strong}}$ and $\mu_{\text{Heavy}}$, each mapping to $[0, 1]$. For the same athlete (bench 95kg, weight 88kg):
$$\mu_{\text{Strong}}(95\text{kg}) = 0.7, \quad \mu_{\text{Heavy}}(88\text{kg}) = 0.6$$
$$\text{Fuzzy AND} = \min(0.7, 0.6) = 0.6$$
Result: HAMMER_THROWER suitability = 0.6 — a graded recommendation rather than a binary yes/no.
Key contrast: Traditional logic produces a binary classification with sharp, arbitrary cutoffs. Fuzzy logic produces a graded suitability score that transitions smoothly, better reflecting that strength and heaviness are inherently graded concepts with no natural sharp boundary.
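The contrast can also be checked numerically. The sketch below assumes simple linear ramp membership functions whose endpoints (60–110 kg bench press, 70–100 kg body weight) are hypothetical, chosen only so that the athlete's degrees match the 0.7 and 0.6 used in the answer; the crisp thresholds are the ones from the answer.

```python
def ramp(x, low, high):
    """Linear membership: 0 at or below `low`, 1 at or above `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Hypothetical ramp endpoints (assumptions, not from the exam question).
def mu_strong(bench_kg):
    return ramp(bench_kg, 60.0, 110.0)


def mu_heavy(weight_kg):
    return ramp(weight_kg, 70.0, 100.0)


def crisp_rule(bench_kg, weight_kg):
    """Traditional logic: both sharp thresholds must be exceeded."""
    return bench_kg > 100 and weight_kg > 90


def fuzzy_rule(bench_kg, weight_kg):
    """Fuzzy logic: AND is the minimum of the two membership degrees."""
    return min(mu_strong(bench_kg), mu_heavy(weight_kg))


for bench, weight in [(95, 88), (99, 95), (101, 95), (105, 95)]:
    print(bench, weight, crisp_rule(bench, weight), round(fuzzy_rule(bench, weight), 2))
```

Moving from a 99 kg to a 101 kg bench press flips the crisp result from False to True, while the fuzzy score changes only slightly.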
Q6 — S1 2026 Sample Test [4 marks]
For each of the following, classify as vagueness or uncertainty and briefly justify:
- “This patient is high risk.”
- An alarm went off — was it a burglar?
- “Student 74 is almost excellent.”
- Email spam filter with incomplete evidence.
Model answer:
- Vagueness: “High risk” is a concept with no sharp boundary. At what exact point does a patient become “high risk”? 50% risk? 60%? Different clinicians might disagree. The concept itself admits of degrees. The appropriate tool is Fuzzy Logic, which models the degree to which a patient is “high risk” via a membership function $\mu_{\text{high risk}} \in [0, 1]$.
- Uncertainty: Either a burglary occurred or it did not; there is a definite fact about the world. We have evidence (the alarm) but do not know the true state with certainty. The appropriate tool is Bayesian reasoning, which computes $P(\text{burglary} \mid \text{alarm})$ using Bayes’ theorem.
- Vagueness: The grade 74 is known precisely; there is no hidden fact. The concept “almost excellent” is a graded, linguistic term with blurry boundaries. Where exactly does “almost excellent” begin? 70? 72? 75? The appropriate tool is Fuzzy Logic, with a membership function for “almost excellent” (e.g., $\mu_{\text{almost excellent}}(74) = 0.7$).
- Uncertainty: The email is either spam or not spam (a definite class). We have incomplete evidence (word frequencies, sender info) and need to infer which class is true. The appropriate tool is Naive Bayes, which computes $P(\text{spam} \mid \text{features})$ via Bayes’ theorem with the conditional independence assumption.
Q5 — S1 2024 Final [~3 marks]
State the key assumptions of the Naive Bayes classifier and explain why it works well in practice despite these assumptions being violated.
Model answer:
Key assumptions:
- Conditional independence: Given the class label $C$, all features $x_1, x_2, \ldots, x_n$ are independent of each other. Formally: $$P(x_1, x_2, \ldots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$ This means knowing the value of one feature provides no information about any other feature, once we know the class. In practice, this is almost always violated. For example, in spam detection, the words “FREE” and “WINNER” are correlated (they tend to co-occur in spam).
- Feature relevance: All features are assumed to carry some discriminative information about the class. Irrelevant features can degrade performance by adding noise.
Why it works despite violations:
- Only ranking matters: For classification, we only need $P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x})$, not exact probability values. Even when individual probability estimates are biased due to violated independence, the relative ordering of classes is often preserved.
- Many weak signals combine effectively: In high-dimensional problems like text classification, each word provides a small signal. The product of many such signals (even if correlated) still tends to point toward the correct class.
- Easy parameter estimation: We only need to estimate univariate distributions $P(x_i \mid C)$, not the full joint distribution. This requires far less training data and avoids overfitting in high-dimensional feature spaces.
- Errors cancel out: Positive and negative correlations among features tend to partially cancel each other, making the overall prediction more robust than the individual estimates might suggest.
📝 Additional Practice Questions
Practice Q1: New Vagueness vs Uncertainty Scenarios [4 marks]
Classify each scenario as vagueness or uncertainty. Justify your answer and name the appropriate reasoning tool.
(a) “This coffee is too hot to drink.”
(b) A pregnancy test shows positive: is the person actually pregnant?
(c) “The traffic is heavy on the motorway.”
(d) Based on satellite imagery, did deforestation occur in this region last year?
Answers:
(a) Vagueness: “Too hot” has no sharp boundary. At what exact temperature does coffee become “too hot”? 60°C? 65°C? 70°C? The concept admits degrees. Tool: Fuzzy Logic ($\mu_{\text{too hot}}(65\,^\circ\text{C}) = 0.6$).
(b) Uncertainty — The person either is pregnant or is not — a definite biological fact. The test provides probabilistic evidence, but we don’t know the true state with certainty. Tool: Bayesian Reasoning (update prior with test sensitivity/specificity).
(c) Vagueness — “Heavy traffic” is a graded concept. Is 50 cars/minute heavy? 80? 120? There is no universally agreed crisp boundary. Tool: Fuzzy Logic.
(d) Uncertainty — Either deforestation occurred or it didn’t — a definite historical fact. We have incomplete evidence (satellite images may be cloudy or ambiguous). Tool: Bayesian Reasoning (probability of deforestation given observed image features).
Practice Q2: Fuzzy Logic Computation [3 marks]
Given:
- $\mu_A = 0.6$ (degree to which temperature is “warm”)
- $\mu_B = 0.9$ (degree to which humidity is “high”)
Compute:
(a) $A \wedge B$ (Fuzzy AND)
(b) $A \vee B$ (Fuzzy OR)
(c) $\neg A$ (Fuzzy NOT)
(d) $A \rightarrow B$ using Godel implication
(e) $A \rightarrow B$ using standard implication
Answers:
(a) $A \wedge B = \min(0.6, 0.9) = 0.6$
(b) $A \vee B = \max(0.6, 0.9) = 0.9$
(c) $\neg A = 1 - 0.6 = 0.4$
(d) Godel: Since $A = 0.6 \leq B = 0.9$, we get $A \rightarrow B = 1$. (If the antecedent holds to degree 0.6 and the consequent holds to degree 0.9, the implication is fully satisfied — the consequent “more than covers” the antecedent.)
(e) Standard: $A \rightarrow B = \max(1 - 0.6, 0.9) = \max(0.4, 0.9) = 0.9$
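The five answers can be verified with a short script. This is a sketch using the min/max/complement connectives and the two implication formulas from the Core Concepts table; the function names are illustrative, not from any library.

```python
def fuzzy_and(a, b):
    """Fuzzy AND: the weaker of the two degrees."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy OR: the stronger of the two degrees."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy NOT: complement of the degree."""
    return 1 - a


def implies_standard(a, b):
    """Standard implication: max(1 - A, B)."""
    return max(1 - a, b)


def implies_godel(a, b):
    """Godel implication: 1 if A <= B, otherwise B."""
    return 1.0 if a <= b else b


mu_a, mu_b = 0.6, 0.9
results = {
    "(a) A AND B": fuzzy_and(mu_a, mu_b),
    "(b) A OR B": fuzzy_or(mu_a, mu_b),
    "(c) NOT A": fuzzy_not(mu_a),
    "(d) Godel A -> B": implies_godel(mu_a, mu_b),
    "(e) Standard A -> B": implies_standard(mu_a, mu_b),
}
for name, value in results.items():
    print(f"{name}: {value:.1f}")
```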
Practice Q3: Fuzzy Implication Edge Case [2 marks]
Given $\mu_A = 0.7$ and $\mu_B = 0.3$:
(a) Compute $A \rightarrow B$ using Godel implication.
(b) Compute $A \rightarrow B$ using standard implication.
(c) Now compute for $A = 0.5, B = 0$ using both. Which is more intuitive?
Answers:
(a) Godel: Since $A = 0.7 > B = 0.3$, return $B = 0.3$.
(b) Standard: $\max(1 - 0.7, 0.3) = \max(0.3, 0.3) = 0.3$. (Same result here.)
(c) For $A = 0.5, B = 0$:
- Standard: $\max(1 - 0.5, 0) = \max(0.5, 0) = 0.5$
- Godel: Since $0.5 > 0$, return $B = 0$
The Godel version is more intuitive here. If the antecedent holds to degree 0.5 but the consequent is completely false (0), it makes sense that the implication should fail entirely (= 0). The standard version giving 0.5 is counterintuitive — it suggests the implication is “half true” even though the consequent is completely false.
Practice Q4: Bayesian Reasoning Calculation [4 marks]
A medical test for a rare disease:
- $P(\text{disease}) = 0.002$ (prevalence: 2 in 1,000)
- $P(\text{positive} \mid \text{disease}) = 0.98$ (sensitivity)
- $P(\text{positive} \mid \neg\text{disease}) = 0.03$ (false positive rate)
(a) Compute $P(\text{positive})$.
(b) Compute $P(\text{disease} \mid \text{positive})$.
(c) Interpret the result.
Answers:
(a)
$$P(\text{positive}) = P(\text{pos} \mid \text{disease}) \cdot P(\text{disease}) + P(\text{pos} \mid \neg\text{disease}) \cdot P(\neg\text{disease})$$
$$= 0.98 \times 0.002 + 0.03 \times 0.998$$
$$= 0.00196 + 0.02994 = 0.0319$$
(b)
$$P(\text{disease} \mid \text{positive}) = \frac{P(\text{pos} \mid \text{disease}) \cdot P(\text{disease})}{P(\text{positive})} = \frac{0.98 \times 0.002}{0.0319} = \frac{0.00196}{0.0319} \approx 0.0614$$
(c) Only about 6.1% chance of having the disease despite a positive test. The test increased belief from 0.2% to 6.1% (a ~30x increase), but because the disease is rare (low prior), most positive tests are still false positives. The patient should get a confirmatory second test rather than panicking. This illustrates the base rate fallacy — a sensitive test on a rare condition still produces many false positives because the low base rate dominates.
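The calculation in (a) and (b) can be reproduced in code. The sketch below also illustrates the confirmatory second test mentioned in (c) by reusing the first posterior as the new prior; that step assumes the two test results are conditionally independent given disease status, which is an added assumption rather than something stated in the question.

```python
def bayes_update(prior, sensitivity, false_positive_rate):
    """Return (posterior, evidence) for a positive test result."""
    # Law of total probability: P(positive)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(disease | positive)
    return sensitivity * prior / evidence, evidence


post, p_positive = bayes_update(prior=0.002, sensitivity=0.98, false_positive_rate=0.03)
print(f"P(positive) = {p_positive:.4f}")
print(f"P(disease | positive) = {post:.4f}")

# Assumption: a second, independent positive test; yesterday's posterior is today's prior.
post2, _ = bayes_update(prior=post, sensitivity=0.98, false_positive_rate=0.03)
print(f"After a second positive test: {post2:.2f}")
```

Under that independence assumption, a second positive result raises the posterior well above one half, which is why a confirmatory test is informative.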
Practice Q5: Naive Bayes Classification [3 marks]
You are building a fruit classifier. Given:
| Feature | $P(\text{feature} \mid \text{Apple})$ | $P(\text{feature} \mid \text{Orange})$ |
|---|---|---|
| Red | 0.7 | 0.1 |
| Round | 0.8 | 0.9 |
| Smooth skin | 0.3 | 0.8 |

Priors: $P(\text{Apple}) = 0.5$, $P(\text{Orange}) = 0.5$
A fruit is Red, Round, and has Smooth skin. Classify it.
Answers:
Apple score:
$$P(\text{Apple}) \times P(\text{Red} \mid \text{Apple}) \times P(\text{Round} \mid \text{Apple}) \times P(\text{Smooth} \mid \text{Apple})$$
$$= 0.5 \times 0.7 \times 0.8 \times 0.3 = 0.084$$
Orange score:
$$P(\text{Orange}) \times P(\text{Red} \mid \text{Orange}) \times P(\text{Round} \mid \text{Orange}) \times P(\text{Smooth} \mid \text{Orange})$$
$$= 0.5 \times 0.1 \times 0.9 \times 0.8 = 0.036$$
Comparison: $0.084 > 0.036$, so classify as Apple.
Normalized posterior: $P(\text{Apple} \mid \text{features}) = 0.084 / (0.084 + 0.036) = 0.084 / 0.120 = 0.70 = 70\%$
The “Red” feature strongly favours Apple ($0.7$ vs $0.1$), which outweighs the “Smooth skin” evidence favouring Orange ($0.3$ vs $0.8$). This shows how Naive Bayes weighs each feature’s contribution independently.
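The same classification can be written as a few lines of Python. This is a minimal sketch using only the numbers from the question; the dictionary layout and variable names are illustrative choices.

```python
from math import prod

priors = {"Apple": 0.5, "Orange": 0.5}
likelihoods = {
    "Apple": {"Red": 0.7, "Round": 0.8, "Smooth": 0.3},
    "Orange": {"Red": 0.1, "Round": 0.9, "Smooth": 0.8},
}
observed = ["Red", "Round", "Smooth"]

# Unnormalised score: prior times the product of per-feature likelihoods.
scores = {
    c: priors[c] * prod(likelihoods[c][f] for f in observed)
    for c in priors
}

# Normalise so the scores sum to 1 and can be read as posteriors.
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

best = max(scores, key=scores.get)
print(best, {c: round(p, 2) for c, p in posteriors.items()})
```

Normalisation changes the numbers but never the winner, so the argmax can be taken on the raw scores.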
Practice Q6: Naive Bayes with Log-Score [3 marks]
Using the same fruit example above, compute the log-scores and verify the classification.
Answers:
Apple log-score (natural logarithms throughout):
$$\log(0.5) + \log(0.7) + \log(0.8) + \log(0.3)$$ $$= -0.693 + (-0.357) + (-0.223) + (-1.204) = -2.477$$
Orange log-score:
$$\log(0.5) + \log(0.1) + \log(0.9) + \log(0.8)$$ $$= -0.693 + (-2.303) + (-0.105) + (-0.223) = -3.324$$
Since $-2.477 > -3.324$, classify as Apple. Same result as the product version, but using addition in log-space avoids the risk of numerical underflow when there are many features.
Note how each feature’s contribution is additive in log-space:
- Prior: same ($-0.693$)
- Red: Apple gets $-0.357$ vs Orange gets $-2.303$ → Red strongly favours Apple (difference of $+1.946$)
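- Round: Apple gets $-0.223$ vs Orange gets $-0.105$ → Round slightly favours Orange (difference of $-0.118$)
- Smooth: Apple gets $-1.204$ vs Orange gets $-0.223$ → Smooth favours Orange (difference of $-0.981$)
Net effect: Red’s contribution ($+1.946$) outweighs the combined pull of Round ($-0.118$) and Smooth ($-0.981$) towards Orange, leaving a net difference of $+0.847$ in favour of Apple. This equals the gap between the two log-scores, $-2.477 - (-3.324)$.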
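The log-score arithmetic, and the underflow problem it avoids, can be illustrated as follows. The fruit numbers are from the question; the 200-feature case at the end is a made-up illustration of underflow, not part of the exercise.

```python
import math

priors = {"Apple": 0.5, "Orange": 0.5}
likelihoods = {
    "Apple": {"Red": 0.7, "Round": 0.8, "Smooth": 0.3},
    "Orange": {"Red": 0.1, "Round": 0.9, "Smooth": 0.8},
}
observed = ["Red", "Round", "Smooth"]

# Log-score: log prior plus the sum of log likelihoods (natural log).
log_scores = {
    c: math.log(priors[c]) + sum(math.log(likelihoods[c][f]) for f in observed)
    for c in priors
}

# Per-feature contribution to the Apple-vs-Orange log ratio.
contributions = {
    f: math.log(likelihoods["Apple"][f]) - math.log(likelihoods["Orange"][f])
    for f in observed
}

print({c: round(s, 3) for c, s in log_scores.items()})
print({f: round(d, 3) for f, d in contributions.items()})

# Hypothetical underflow illustration: 200 features, each with likelihood 0.01.
tiny = [0.01] * 200
product_score = math.prod(tiny)            # underflows to 0.0 in double precision
log_score = sum(math.log(p) for p in tiny) # stays a finite negative number
print(product_score, round(log_score, 2))
```

Because the logarithm is monotonic, the class with the larger log-score is always the class with the larger product score.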
Practice Q7: Fuzzy Control System [3 marks]
A fuzzy controller for a car’s ABS (Anti-lock Braking System) uses two inputs:
- Speed: $\mu_{\text{fast}}(v) = 0.9$
- Road condition: $\mu_{\text{slippery}}(\text{road}) = 0.5$
Rule 1: IF speed is fast AND road is slippery THEN brake pressure is low.
Rule 2: IF speed is fast AND road is NOT slippery THEN brake pressure is high.
(a) Compute the firing strength of Rule 1.
(b) Compute the firing strength of Rule 2.
(c) Which rule fires more strongly? What does this mean for the braking?
Answers:
(a) Rule 1: $\min(\mu_{\text{fast}}, \mu_{\text{slippery}}) = \min(0.9, 0.5) = 0.5$
(b) Rule 2: $\min(\mu_{\text{fast}}, \neg\mu_{\text{slippery}}) = \min(0.9, 1 - 0.5) = \min(0.9, 0.5) = 0.5$
(c) Both rules fire with equal strength (0.5). This makes sense — the road is exactly at the boundary between slippery and not slippery ($\mu = 0.5$). The defuzzification step would combine both rules’ outputs, producing a moderate brake pressure — a compromise between “low” and “high.” This is precisely the advantage of fuzzy control: instead of an abrupt switch between strategies, it produces a smooth blend.
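To make the blending step concrete, here is a rough sketch that computes both firing strengths and then combines them with a simplified weighted-average defuzzifier. The representative pressures for “low” and “high” (0.2 and 0.8 of maximum) are hypothetical values, and the weighted average is a stand-in for the full centre-of-gravity method, not a replacement for it.

```python
mu_fast = 0.9
mu_slippery = 0.5

# Firing strengths: fuzzy AND is min, fuzzy NOT is 1 - mu.
rule1_low = min(mu_fast, mu_slippery)        # fast AND slippery -> low pressure
rule2_high = min(mu_fast, 1 - mu_slippery)   # fast AND NOT slippery -> high pressure

# Assumed representative crisp outputs (fraction of maximum brake pressure).
LOW_PRESSURE = 0.2
HIGH_PRESSURE = 0.8


def weighted_average(rules):
    """Combine (strength, output) pairs into one crisp value.

    Returns None if no rule fires at all.
    """
    total_strength = sum(strength for strength, _ in rules)
    if total_strength == 0:
        return None
    return sum(strength * output for strength, output in rules) / total_strength


pressure = weighted_average([(rule1_low, LOW_PRESSURE), (rule2_high, HIGH_PRESSURE)])
print(rule1_low, rule2_high, pressure)
```

With equal firing strengths the result lands midway between the two assumed outputs, which matches the “moderate brake pressure” described in (c).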
Practice Q8: Conceptual Short Answer [2 marks each]
(a) A fuzzy set assigns $\mu_{\text{Tall}}(175\text{cm}) = 0.4$. A student says: “This means there is a 40% probability the person is tall.” Is this correct? Explain.
(b) Why does Naive Bayes work well in practice despite its unrealistic independence assumption?
(c) In the burglar alarm example, $P(\text{burglary} \mid \text{alarm}) \approx 0.94\%$. Why so low despite 95% alarm reliability?
(d) What is the difference between $P(e \mid H)$ and $P(H \mid e)$? Why do people often confuse them?
Answers:
(a) Incorrect. The value 0.4 is a degree of membership, not a probability. There is no randomness — the person is definitely 175cm. The 0.4 expresses how much the vague concept “Tall” applies to this height. Fuzzy membership handles vagueness (blurry concepts); probability handles uncertainty (unknown states). They are fundamentally different: $\mu_{\text{Tall}} + \mu_{\text{Short}}$ does NOT need to equal 1, but $P(\text{tall}) + P(\neg\text{tall})$ MUST equal 1.
(b) For classification, we only need the correct ranking of classes, not exact posterior probabilities. Even when features are correlated (violating independence), the class with the highest true posterior typically still receives the highest Naive Bayes score. Additionally, in high-dimensional data (like text), more complex models that model feature dependencies may overfit, while Naive Bayes remains stable due to its simplicity.
(c) Because the prior probability of burglary is extremely low ($P = 0.0001$). Consider 10,000 cases: about 1 is a real burglary, which the alarm catches with probability 0.95, while the remaining ~9,999 non-burglary cases produce ~100 false alarms at the false alarm rate $P(\text{alarm} \mid \neg\text{burglary}) = 0.01$. So out of ~101 total alarms, only ~1 is a true burglary. Expecting a high posterior here is the base rate fallacy: ignoring how rare the event is leads to overestimating the posterior.
(d) $P(e \mid H)$ is the likelihood — “If the hypothesis is true, how likely is the evidence?” $P(H \mid e)$ is the posterior — “Given the evidence, how likely is the hypothesis?” People confuse them because intuitively, “the alarm is 95% reliable” ($P(\text{alarm} \mid \text{burglary}) = 0.95$) feels like it should mean “if the alarm goes off, there’s a 95% chance of burglary” ($P(\text{burglary} \mid \text{alarm}) = 0.95$). But these are NOT the same — the posterior also depends on the prior. This confusion is called the prosecutor’s fallacy or confusion of the inverse.
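A quick way to see how the posterior depends on the prior, as answers (c) and (d) describe, is to hold the alarm's characteristics fixed and vary the base rate. The sketch below uses the likelihoods from (c) (0.95 and 0.01); the list of alternative priors is made up for illustration.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | e) via Bayes' theorem, with P(e) from the law of total probability."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence


P_ALARM_GIVEN_BURGLARY = 0.95      # likelihood P(e | H)
P_ALARM_GIVEN_NO_BURGLARY = 0.01   # false alarm rate P(e | not H)

# The first prior is the one from the example; the rest are hypothetical.
for prior in (0.0001, 0.001, 0.01, 0.1, 0.5):
    post = posterior(prior, P_ALARM_GIVEN_BURGLARY, P_ALARM_GIVEN_NO_BURGLARY)
    print(f"prior = {prior:<6} posterior = {post:.4f}")
```

The likelihood $P(e \mid H)$ stays at 0.95 in every row, yet the posterior $P(H \mid e)$ ranges from under 1% to nearly 99% as the prior grows.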
Practice Q9: Contrast Traditional vs Fuzzy for a New Scenario [3 marks]
Compare how traditional logic and fuzzy logic would evaluate the rule: IF EXPERIENCED AND CREATIVE THEN GOOD_DESIGNER
Use a concrete example of a candidate.
Answers:
Traditional logic:
Set crisp thresholds — e.g., EXPERIENCED = (years > 5), CREATIVE = (portfolio score > 80/100).
For a candidate with 4 years experience and portfolio score 78:
- EXPERIENCED = False (4 < 5)
- CREATIVE = False (78 < 80)
- GOOD_DESIGNER = False AND False = False
This candidate is rejected entirely, despite being very close to both thresholds.
Fuzzy logic:
Define membership functions for EXPERIENCED and CREATIVE, each mapping to [0, 1].
For the same candidate:
- $\mu_{\text{Experienced}}(4 \text{ years}) = 0.7$ (fairly experienced)
- $\mu_{\text{Creative}}(78) = 0.8$ (quite creative)
- Fuzzy AND: $\min(0.7, 0.8) = 0.7$
- GOOD_DESIGNER suitability = 0.7 (a strong recommendation)
Key contrast: Traditional logic gives a binary rejection despite the candidate being close to thresholds. Fuzzy logic gives a graded score (0.7) reflecting that this candidate is a fairly good designer. For concepts like “experienced” and “creative” that inherently admit degrees, fuzzy logic provides a more realistic and nuanced evaluation.
Practice Q10: Quick Quiz (from Lecture) [3 marks]
(a) What does fuzzy logic primarily model?
A. Uncertainty in data
B. Probability of events
C. Vagueness in concepts
D. Statistical correlation
(b) If fuzzy membership $\mu_A = 0.6$ and $\mu_B = 0.8$, what is $A \wedge B$?
A. 0.6
B. 0.8
C. 0.7
D. 1.0
(c) What is the key assumption of Naive Bayes?
A. Features are independent of the class
B. Features are conditionally independent given the class
C. All features have equal weight
D. The prior is uniform
Answers:
(a) C — Fuzzy logic models vagueness in concepts (blurry boundaries), not uncertainty (which is handled by Bayesian reasoning).
(b) A — Fuzzy AND = $\min(0.6, 0.8) = 0.6$. The conjunction is limited by the weaker component.
(c) B — Features are conditionally independent given the class. Note: NOT “independent of the class” (that would mean features carry no information). The “naive” assumption is that features are independent of each other once we know the class.
🌐 英语表达要点(English Expression)
Describing Vagueness vs Uncertainty
"This is an example of vagueness because the concept '[X]' admits of degrees
and has no sharp boundary — it is not a yes/no matter."
"This is an example of uncertainty because there is a definite state of the
world, but we lack sufficient evidence to determine which state is true."
"Vagueness concerns the definition of a concept; uncertainty concerns
our knowledge of a fact."
Describing Fuzzy Logic
"Fuzzy Logic models graded concepts through membership functions
μ(x) ∈ [0, 1], where 0 means complete non-membership and 1 means
full membership."
"The fuzzy AND of two values is computed as their minimum:
min(μ_A, μ_B). This captures the idea that a conjunction is only
as strong as its weakest component."
"A membership value of 0.7 indicates that the element belongs to
the fuzzy set to degree 0.7 — this is NOT a probability."
"Fuzzy logic is particularly suited to control systems because
concepts like 'warm', 'fast', and 'heavy' are inherently graded."
Describing Bayesian Reasoning
"By Bayes' theorem, the posterior probability P(H|e) is proportional
to the prior P(H) multiplied by the likelihood P(e|H)."
"The prior represents our initial belief before observing evidence,
while the posterior represents our updated belief after evidence."
"The base rate fallacy occurs when we ignore the prior probability
and overweight the evidence, leading to incorrect conclusions."
Describing Naive Bayes
"Naive Bayes assumes conditional independence of features given the
class, which simplifies the joint likelihood to a product of
individual feature likelihoods."
"Despite its 'naive' assumption, Naive Bayes works well in practice
because classification only requires correct ranking of classes,
not calibrated probability estimates."
"The log-score version converts multiplication to addition,
preventing numerical underflow when many features are involved."
Contrasting Traditional vs Fuzzy Logic (for Q5-type questions)
"In traditional logic, [attribute] is evaluated against a crisp
threshold, producing a binary True/False result."
"In fuzzy logic, [attribute] is modelled with a membership function
that maps to a continuous degree in [0, 1]."
"The key advantage of fuzzy logic is that it avoids arbitrary
threshold effects and produces smooth, graded outputs."
"Fuzzy logic is more appropriate here because [concept] is
inherently a matter of degree with no natural sharp boundary."
易错表达 / Common Expression Mistakes
| Incorrect Expression | Correct Expression | Why |
|---|---|---|
| “Fuzzy Logic handles uncertainty” | “Fuzzy Logic handles vagueness” | Uncertainty → Bayes; Vagueness → Fuzzy |
| “μ = 0.6 means 60% probability” | “μ = 0.6 means degree of membership 0.6” | Membership ≠ probability |
| “Soft computing is imprecise, so it’s worse” | “Soft computing tolerates imprecision to solve harder problems” | Tolerance of imprecision is a strength |
| “Naive Bayes requires independent features” | “Naive Bayes assumes conditional independence” | The assumption may be violated but NB still works |
| “The posterior is the prior times the likelihood” | “The posterior is proportional to prior times likelihood” | Must normalise by $P(e)$ for exact values |
| “$P(e \mid H) = P(H \mid e)$” | “$P(e \mid H)$ is the likelihood; $P(H \mid e)$ is the posterior; they are different” | Confusion of the inverse / prosecutor’s fallacy |
| “Fuzzy degrees must sum to 1” | “Fuzzy membership degrees do NOT need to sum to 1” | Only probabilities must sum to 1 |
高频考试用词(High-Frequency Exam Vocabulary)
- admits of degrees — 承认程度差异(describes vagueness)
- base rate — 基率(prior probability of a rare event)
- base rate fallacy — 基率谬误(ignoring the prior when interpreting evidence)
- conditionally independent — 条件独立(the “naive” assumption in Naive Bayes)
- crisp boundary — 清晰边界(classical sets have it; fuzzy sets don’t)
- degrades gracefully — 优雅降级(soft computing’s advantage over hard computing)
- defuzzification — 去模糊化(converting fuzzy output to a crisp value)
- degree of membership — 隶属度(NOT probability)
- false positive rate — 假阳性率 ($P(\text{positive} \mid \neg\text{disease})$)
- firing strength — 规则触发强度(the result of evaluating a fuzzy rule’s antecedent)
- linguistic variable — 语言变量(e.g., “temperature” with values “cold”, “warm”, “hot”)
- likelihood — 似然 ($P(e \mid H)$, not to be confused with posterior)
- posterior — 后验概率 ($P(H \mid e)$)
- prior — 先验概率 ($P(H)$)
- sensitivity — 灵敏度 ($P(\text{positive} \mid \text{disease})$)
✅ 自测检查清单(Self-Check Checklist)
Concepts — Vagueness vs Uncertainty
- Can I define vagueness and uncertainty in one sentence each in English?
- Can I correctly classify 4+ new scenarios as vagueness or uncertainty?
- Can I explain why “soft” does not mean “weak”?
- Do I know the two-step decision procedure for classifying vagueness vs uncertainty?
Fuzzy Logic
- Can I compute fuzzy AND ($\min$), OR ($\max$), and NOT ($1 - \mu$)?
- Can I compute both standard and Godel fuzzy implication?
- Can I explain why $\mu = 0.6$ is NOT a probability?
- Can I explain why fuzzy degrees do NOT need to sum to 1?
- Can I describe the fuzzy control pipeline (fuzzification → inference → defuzzification)?
- Can I contrast traditional logic vs fuzzy logic for a given rule (like hammer thrower)?
- Can I name 3+ real-world fuzzy logic applications?
Bayesian Reasoning
- Can I write Bayes’ theorem from memory and explain each term?
- Can I expand $P(e)$ using the law of total probability?
- Can I work through the burglar alarm example step by step?
- Can I explain the base rate fallacy in my own words?
- Do I know the difference between $P(e \mid H)$ and $P(H \mid e)$?
Naive Bayes
- Can I state the conditional independence assumption precisely?
- Can I compute a Naive Bayes classification by hand (multiply priors and likelihoods)?
- Can I normalize to get actual posterior probabilities?
- Can I write the log-score version and explain why it prevents underflow?
- Can I explain why Naive Bayes works despite unrealistic assumptions (3 reasons)?
Exam Readiness
- Can I answer a Q6-style question (4 scenarios, vagueness vs uncertainty) in under 5 minutes?
- Can I write a full contrast answer for traditional vs fuzzy logic in under 8 minutes?
- Can I state and justify Naive Bayes assumptions in a short answer?
- Do I know the Fuzzy Logic vs Naive Bayes comparison table from memory?
- Can I do a full Bayes’ theorem calculation without referring to notes?