Chapter 6: Bayesian Learning
Bayesian Belief Networks
- Allow stating conditional independence assumptions that apply
to subsets of variables
- Consider random variables Y1, ..., Yn, where each
random variable Yi takes on values from the set V(Yi)
- The joint space of the set of variables Y is the cross product
V(Y1) × ... × V(Yn)
- The probability distribution over this joint space is called the
joint probability distribution
- A BBN describes the joint probability distribution for a set
of variables
Conditional Independence
X is conditionally independent of Y given Z if, for all values
these variables may take,
P(X | Y, Z) = P(X | Z)
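A minimal numeric check of this definition, using invented probabilities for the Thunder/Rain/Lightning variables (the joint below is constructed so that Thunder is conditionally independent of Rain given Lightning):

    from itertools import product

    # Invented CPT values; the joint is built so that Thunder is
    # conditionally independent of Rain given Lightning.
    p_l = {True: 0.2, False: 0.8}            # P(Lightning)
    p_r_given_l = {True: 0.7, False: 0.1}    # P(Rain=True | Lightning)
    p_t_given_l = {True: 0.9, False: 0.05}   # P(Thunder=True | Lightning)

    def joint(l, r, t):
        pr = p_r_given_l[l] if r else 1 - p_r_given_l[l]
        pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
        return p_l[l] * pr * pt

    # P(Thunder=True | Rain=True, Lightning=True) ...
    lhs = joint(True, True, True) / sum(joint(True, True, t) for t in (True, False))
    # ... equals P(Thunder=True | Lightning=True)
    rhs = (sum(joint(True, r, True) for r in (True, False)) /
           sum(joint(True, r, t) for r, t in product((True, False), repeat=2)))
    print(lhs, rhs)   # both 0.9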
Representation
- Take a look at Figure 6.3
- Each node represents a random variable
- The network arcs represent the assertion that each
variable is conditionally independent of its nondescendants
in the network, given its immediate predecessors (its parents)
- A conditional probability table is needed for each variable
- P(y1, ..., yn) = Π_{i=1}^{n} P(yi | Parents(Yi))
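As a concrete sketch of this factorization, consider a made-up three-node chain Storm → Lightning → Thunder (all CPT values below are invented). The joint probability of any assignment is the product of one CPT entry per variable:

    # Factored joint for the chain: P(s, l, t) = P(s) * P(l | s) * P(t | l)
    p_s = {True: 0.1, False: 0.9}            # P(Storm)
    p_l_given_s = {True: 0.8, False: 0.01}   # P(Lightning=True | Storm)
    p_t_given_l = {True: 0.95, False: 0.02}  # P(Thunder=True | Lightning)

    def joint(s, l, t):
        pl = p_l_given_s[s] if l else 1 - p_l_given_s[s]
        pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
        return p_s[s] * pl * pt

    print(joint(True, True, True))   # 0.1 * 0.8 * 0.95 = 0.076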
Inference
- Relatively simple if the values of all the other random variables are known
- NP-hard in general, when the values of only some variables are known!
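For intuition about the easy case, here is inference by brute-force enumeration on the invented chain network above: fix the observed values and sum the joint over every assignment of the hidden variables. The sum is exponential in the number of hidden variables, which is why the general problem is hard:

    from itertools import product

    # Same invented chain network as in the previous sketch.
    p_s = {True: 0.1, False: 0.9}
    p_l_given_s = {True: 0.8, False: 0.01}
    p_t_given_l = {True: 0.95, False: 0.02}

    def joint(s, l, t):
        pl = p_l_given_s[s] if l else 1 - p_l_given_s[s]
        pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
        return p_s[s] * pl * pt

    # P(Storm=True | Thunder=True): sum out the hidden variable Lightning,
    # then normalize by P(Thunder=True)
    num = sum(joint(True, l, True) for l in (True, False))
    den = sum(joint(s, l, True) for s, l in product((True, False), repeat=2))
    print(num / den)   # roughly 0.74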
Learning BBNs
- An active area of research
- Is the network structure given or must it be learned?
- Do training examples contain missing information?
- If the network structure is provided and the training examples
contain no missing data, the conditional probability tables can
be estimated using the same technique used by a naive Bayes classifier
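A minimal sketch of that estimation step on hypothetical complete data: each CPT entry is just the relative frequency of the variable's value among training examples matching its parents' values (in practice an m-estimate smooths these counts):

    from collections import Counter

    # Hypothetical complete training data: (storm, lightning) pairs
    data = [(True, True), (True, True), (True, False),
            (False, False), (False, False), (False, True)]

    pair_counts = Counter(data)                  # n(storm, lightning)
    parent_counts = Counter(s for s, _ in data)  # n(storm)

    # CPT entry: P(Lightning=l | Storm=s) = n(s, l) / n(s)
    cpt = {(s, l): pair_counts[(s, l)] / parent_counts[s]
           for (s, l) in pair_counts}
    print(cpt[(True, True)])   # 2/3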
Exercise
Construct a realistic BBN for the data listed in
Table 3.2 on page 59.
The EM Algorithm - A Specific Example
- Can be used when only a subset of the relevant instance
features is observable
- Take a look at Figure 6.4
- A training instance is generated by choosing one of the two
normal curves (each has a 50% probability of being picked) and
then xi is randomly generated according to the selected
distribution
- Assume the variance σ² is known and is the same for
both distributions
- The learning task is to output h = < μ1,
μ2 > such that h is an ML hypothesis
(in other words, it maximizes p(D|h))
- We can think of a training example as being a triple
< xi, zi1, zi2 >
- xi is the observed variable
- zij is a hidden variable that
takes on the value 1 if xi was created by distribution j
and 0 otherwise
Algorithm (p. 193)
- Generate a random initial hypothesis, h = < μ1,
μ2 >
- Calculate the expected value E[zij] of each hidden
variable zij assuming that h holds.
E[zij] = e^(-(1/(2σ²))(xi - μj)²) / Σ_{n=1}^{2} e^(-(1/(2σ²))(xi - μn)²)
- Calculate a new ML hypothesis by assuming the value taken on by each
hidden variable zij is its expected value E[zij]
calculated in step 2. Set h to this new hypothesis. Go to step 2.
(Both steps are sketched in code below.)
μj = Σ_{i=1}^{m} E[zij] xi / Σ_{i=1}^{m} E[zij]
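A runnable sketch of the whole loop under the stated assumptions (two equal-weight Gaussians with known, shared variance; the function name and defaults are mine):

    import math
    import random

    def em_two_means(xs, sigma2=1.0, iters=20, seed=0):
        # EM for a mixture of two equal-weight Gaussians with known, shared
        # variance sigma2; only the two means are learned.
        rng = random.Random(seed)
        mu = rng.sample(xs, 2)   # step 1: random initial hypothesis <mu1, mu2>
        for _ in range(iters):
            # step 2 (E-step): E[zij] for each instance; the Gaussian
            # normalizing constants cancel, so these weights suffice
            e = []
            for x in xs:
                w = [math.exp(-(x - m) ** 2 / (2 * sigma2)) for m in mu]
                e.append([wj / sum(w) for wj in w])
            # step 3 (M-step): each mean becomes the E[zij]-weighted
            # average of the xi
            mu = [sum(e[i][j] * xs[i] for i in range(len(xs))) /
                  sum(e[i][j] for i in range(len(xs)))
                  for j in range(2)]
        return mu

    print(em_two_means([5.0, 6.0, 4.8, 6.2, 5.1, 6.1]))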
Exercise
Show what would happen on the next iteration if
σ² = 1, μ1 = 4.5, μ2 = 6.5, x1 = 5, and x2 = 6.
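One way to check your answer is to run a single E-step and M-step at exactly these values (a quick numeric sketch; outputs are rounded):

    import math

    sigma2, mu, xs = 1.0, [4.5, 6.5], [5.0, 6.0]

    # E-step: E[zij] for each instance
    e = []
    for x in xs:
        w = [math.exp(-(x - m) ** 2 / (2 * sigma2)) for m in mu]
        e.append([wj / sum(w) for wj in w])
    print(e)    # roughly [[0.73, 0.27], [0.27, 0.73]]

    # M-step: E[zij]-weighted means
    mu = [sum(e[i][j] * xs[i] for i in range(2)) / sum(e[i][j] for i in range(2))
          for j in range(2)]
    print(mu)   # roughly [5.27, 5.73]: both means move toward the data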