Chapter 6: Bayesian Learning
Bayesian Belief Networks
- Allow stating conditional independence assumptions that apply
to subsets of variables
- Consider random variables Y1, ..., Yn, where each
random variable Yi takes on values from the set V(Yi)
- The joint space of the set of variables Y is the cross product
V(Y1) × ... × V(Yn)
- The probability distribution over this joint space is called the
joint probability distribution
- A BBN describes the joint probability distribution for a set
of variables
Conditional Independence
X is conditionally independent of Y given Z if, for all values
these variables may take,
P(X | Y, Z) = P(X | Z)
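A minimal numeric check of this definition, using invented probabilities for the Thunder/Rain/Lightning variables (the joint below is constructed so that Thunder is conditionally independent of Rain given Lightning):

    from itertools import product

    # Invented CPT values; the joint is built so that Thunder is
    # conditionally independent of Rain given Lightning.
    p_l = {True: 0.2, False: 0.8}            # P(Lightning)
    p_r_given_l = {True: 0.7, False: 0.1}    # P(Rain=True | Lightning)
    p_t_given_l = {True: 0.9, False: 0.05}   # P(Thunder=True | Lightning)

    def joint(l, r, t):
        pr = p_r_given_l[l] if r else 1 - p_r_given_l[l]
        pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
        return p_l[l] * pr * pt

    # P(Thunder=True | Rain=True, Lightning=True) ...
    lhs = joint(True, True, True) / sum(joint(True, True, t) for t in (True, False))
    # ... equals P(Thunder=True | Lightning=True)
    rhs = (sum(joint(True, r, True) for r in (True, False)) /
           sum(joint(True, r, t) for r, t in product((True, False), repeat=2)))
    print(lhs, rhs)   # both 0.9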
Representation
- Take a look at Figure 6.3
- Each node represents a random variable
- The network arcs represent the assertion that each
variable is conditionally independent of its nondescendants
in the network, given its immediate predecessors (its parents)
- A conditional probability table is needed for each variable
- P(y1, ..., yn) = Π_{i=1}^{n} P(yi | Parents(Yi))
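As a concrete sketch of this factorization, consider a made-up three-node chain Storm → Lightning → Thunder (all CPT values below are invented). The joint probability of any assignment is the product of one CPT entry per variable:

    # Factored joint for the chain: P(s, l, t) = P(s) * P(l | s) * P(t | l)
    p_s = {True: 0.1, False: 0.9}            # P(Storm)
    p_l_given_s = {True: 0.8, False: 0.01}   # P(Lightning=True | Storm)
    p_t_given_l = {True: 0.95, False: 0.02}  # P(Thunder=True | Lightning)

    def joint(s, l, t):
        pl = p_l_given_s[s] if l else 1 - p_l_given_s[s]
        pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
        return p_s[s] * pl * pt

    print(joint(True, True, True))   # 0.1 * 0.8 * 0.95 = 0.076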
Inference
- Relatively simple if the values of all the other random variables are known
- NP-hard in general, when the values of only some variables are known!
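For intuition about the easy case, here is inference by brute-force enumeration on the invented chain network above: fix the observed values and sum the joint over every assignment of the hidden variables. The sum is exponential in the number of hidden variables, which is why the general problem is hard:

    from itertools import product

    # Same invented chain network as in the previous sketch.
    p_s = {True: 0.1, False: 0.9}
    p_l_given_s = {True: 0.8, False: 0.01}
    p_t_given_l = {True: 0.95, False: 0.02}

    def joint(s, l, t):
        pl = p_l_given_s[s] if l else 1 - p_l_given_s[s]
        pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
        return p_s[s] * pl * pt

    # P(Storm=True | Thunder=True): sum out the hidden variable Lightning,
    # then normalize by P(Thunder=True)
    num = sum(joint(True, l, True) for l in (True, False))
    den = sum(joint(s, l, True) for s, l in product((True, False), repeat=2))
    print(num / den)   # roughly 0.74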
Learning BBNs
- An active area of research
- Is the network structure given or must it be learned?
- Do training examples contain missing information?
- If the network structure is provided and the training examples
contain no missing data, the conditional probability tables can
be estimated using the same technique used by a naive Bayes classifier
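A minimal sketch of that estimation step on hypothetical complete data: each CPT entry is just the relative frequency of the variable's value among training examples matching its parents' values (in practice an m-estimate smooths these counts):

    from collections import Counter

    # Hypothetical complete training data: (storm, lightning) pairs
    data = [(True, True), (True, True), (True, False),
            (False, False), (False, False), (False, True)]

    pair_counts = Counter(data)                  # n(storm, lightning)
    parent_counts = Counter(s for s, _ in data)  # n(storm)

    # CPT entry: P(Lightning=l | Storm=s) = n(s, l) / n(s)
    cpt = {(s, l): pair_counts[(s, l)] / parent_counts[s]
           for (s, l) in pair_counts}
    print(cpt[(True, True)])   # 2/3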
Exercise
Construct a realistic BBN for the data listed in
Table 3.2 on page 59.
The EM Algorithm - A Specific Example
- Can be used when only a subset of the relevant instance
features is observable
- Take a look at Figure 6.4
- A training instance is generated by choosing one of the two
normal curves (each has a 50% probability of being picked) and
then xi is randomly generated according to the selected
distribution
- Assume the variance σ² is known and is the same for
both distributions
- The learning task is to output h = < μ1,
μ2 > such that h is an ML hypothesis
(in other words, it maximizes p(D|h))
- We can think of a training example as being a triple
< xi, zi1, zi2 >
- xi is the observed variable
- zij is a hidden variable that
takes on the value 1 if xi was created by distribution j
and 0 otherwise
Algorithm (p. 193)
- Generate a random initial hypothesis, h = < μ1,
μ2 >
- Calculate the expected value E[zij] of each hidden
variable zij assuming that h holds.
E[zij] = e^(-(1/(2σ²))(xi - μj)²) / Σ_{n=1}^{2} e^(-(1/(2σ²))(xi - μn)²)
- Calculate a new ML hypothesis by assuming the value taken on by each
hidden variable zij is its expected value E[zij]
calculated in step 2. Set h to this new hypothesis. Go to step 2.
(Both steps are sketched in code below.)
μj = Σ_{i=1}^{m} E[zij] xi / Σ_{i=1}^{m} E[zij]
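A runnable sketch of the whole loop under the stated assumptions (two equal-weight Gaussians with known, shared variance; the function name and defaults are mine):

    import math
    import random

    def em_two_means(xs, sigma2=1.0, iters=20, seed=0):
        # EM for a mixture of two equal-weight Gaussians with known, shared
        # variance sigma2; only the two means are learned.
        rng = random.Random(seed)
        mu = rng.sample(xs, 2)   # step 1: random initial hypothesis <mu1, mu2>
        for _ in range(iters):
            # step 2 (E-step): E[zij] for each instance; the Gaussian
            # normalizing constants cancel, so these weights suffice
            e = []
            for x in xs:
                w = [math.exp(-(x - m) ** 2 / (2 * sigma2)) for m in mu]
                e.append([wj / sum(w) for wj in w])
            # step 3 (M-step): each mean becomes the E[zij]-weighted
            # average of the xi
            mu = [sum(e[i][j] * xs[i] for i in range(len(xs))) /
                  sum(e[i][j] for i in range(len(xs)))
                  for j in range(2)]
        return mu

    print(em_two_means([5.0, 6.0, 4.8, 6.2, 5.1, 6.1]))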
Exercise
Show what would happen on the next iteration if
σ² = 1, μ1 = 4.5, μ2 = 6.5, x1 = 5, and x2 = 6.
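One way to check your answer is to run a single E-step and M-step at exactly these values (a quick numeric sketch; outputs are rounded):

    import math

    sigma2, mu, xs = 1.0, [4.5, 6.5], [5.0, 6.0]

    # E-step: E[zij] for each instance
    e = []
    for x in xs:
        w = [math.exp(-(x - m) ** 2 / (2 * sigma2)) for m in mu]
        e.append([wj / sum(w) for wj in w])
    print(e)    # roughly [[0.73, 0.27], [0.27, 0.73]]

    # M-step: E[zij]-weighted means
    mu = [sum(e[i][j] * xs[i] for i in range(2)) / sum(e[i][j] for i in range(2))
          for j in range(2)]
    print(mu)   # roughly [5.27, 5.73]: both means move toward the data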