The initial example they give is about pulling different flavoured candies out of a sack (remember the balls and urns from your basic stats?). They also provide a really good discussion showing how standard least-squares regression is a special case of maximum-likelihood estimation when the data are generated by a process with Gaussian noise of fixed variance.
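To sketch why that's true: if each observation is y_i = f(x_i) + noise, with the noise drawn from a Gaussian with mean 0 and fixed variance sigma^2, then the likelihood of the data is proportional to PI_i exp( -(y_i - f(x_i))^2 / (2*sigma^2) ). Taking the log turns the product into a sum, and maximizing that sum is exactly the same as minimizing SUM_i (y_i - f(x_i))^2, which is the least-squares criterion.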
Their first example is for estimating parameters in a discrete distribution of candy, but we can apply the same math to estimating the variance of a continuous distribution. Estimating variance is important: in industrial and business settings the variance of a thing often matters as much as or more than the average. Just check out all of the press those Six Sigma guys get. That's because variance gives us insight into our risk. It helps us answer questions like, "What's our probability of success?" And maybe, if we're lucky, "What is that probability sensitive to?"
Bayes theorem is a very simple equation:

P(h|d) = P(d|h) * P(h) / P(d)

Where P(h) is the prior probability of the hypothesis, P(d|h) is the likelihood of the data given the hypothesis, P(d) is the probability of the data (a normalizing constant), and P(h|d) is the posterior probability of the hypothesis given the data.
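As a quick sanity check before moving to the continuous case, here's the theorem applied to a toy candy problem (the numbers are made up for illustration): say 70% of sacks are mostly-lime and 30% are mostly-cherry, and a mostly-lime sack yields a lime candy 80% of the time versus 20% for a mostly-cherry sack. After drawing a single lime candy:

prior = [0.7 0.3];      % P(h) for mostly-lime, mostly-cherry
lik   = [0.8 0.2];      % P(lime | h)
post  = prior .* lik;   % numerator of Bayes theorem
post  = post/sum(post)  % P(h | lime) = [0.903 0.097]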
Octave has plenty of useful built-in functions that make it easy to play around with some Bayesian estimation. We'll set up a prior distribution for what we believe our variance to be with chi2pdf(x,4), which gives us a Chi-squared distribution with 4 degrees of freedom. We can draw a random sample from a normal distribution with the normrnd() function, and we'll use 5 as our "true" variance. That way we can see how our Bayesian and our standard frequentist estimates of the variance converge on the right answer. The standard estimate of the variance is just var(d), where d is the data vector.
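A minimal setup along those lines might look like this (the grid for x, the sample size, and variable names like true_var and freq_est are my choices here, not gospel):

x = linspace(0.01, 15, 500);              % grid of candidate variances M_j
prior_var = chi2pdf(x, 4);                % prior density over the variance
prior_var = prior_var/trapz(x,prior_var); % normalize it on the grid
true_var = 5;                             % the "true" variance we hope to recover
n = 500;                                  % number of samples to draw
d = normrnd(0, sqrt(true_var), n, 1);     % normrnd takes the standard deviation
freq_est = var(d);                        % standard frequentist estimate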
The likelihood part of Bayes theorem is:
% likelihood( d | M ) = PI_i likelihood(d_i, M_j)
i = n;                % how many samples we've seen so far
for j = 1:length(x)   % loop over the grid of candidate variances
  lklhd(j) = prod( normpdf( d(1:i), 0, sqrt( x(j) ) ) );
end
lklhd = lklhd/trapz(x,lklhd); % normalize it
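One practical caveat, not part of the original calculation: a product of hundreds of small density values will eventually underflow to zero in double precision. If you push n much higher, the standard trick is to sum log-densities and rescale before exponentiating:

loglik = zeros(size(x));
for j = 1:length(x)
  loglik(j) = sum( log( normpdf( d(1:i), 0, sqrt( x(j) ) ) ) );
end
lklhd = exp( loglik - max(loglik) ); % subtract the max to avoid underflow
lklhd = lklhd/trapz(x,lklhd);        % normalize it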
Then the posterior distribution is:
% posterior( M | d ) = prior( M ) * likelihood( d | M )
post_p = prior_var .* lklhd;
post_p = post_p/trapz(x,post_p); % normalize it
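To actually compare against the frequentist number we need a single point estimate from the posterior; the posterior mean is one natural choice (the name bayes_est is mine, and using the mode or median instead would be equally defensible):

bayes_est = trapz(x, x .* post_p); % posterior mean of the variance
[freq_est, bayes_est]              % both should be near 5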
Both of the estimates of the variance converge on the true answer as n approaches infinity. If you have a good prior, the Bayesian estimate is especially useful when n is small.
It's interesting to watch how the posterior distribution changes as we add more samples from the true distribution.
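Here's one way to watch it happen (the particular checkpoints for i are arbitrary): recompute the posterior using only the first i samples and overlay the curves. The posterior should start out looking like the prior and sharpen around 5 as i grows:

for i = [5 20 100 500] % number of samples seen so far
  for j = 1:length(x)
    lklhd(j) = prod( normpdf( d(1:i), 0, sqrt( x(j) ) ) );
  end
  post_p = prior_var .* lklhd;
  post_p = post_p/trapz(x,post_p);
  plot(x, post_p); hold on;
end
hold off;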
The great thing about Bayes theorem is that it provides a continuous bridge from what we think we know to reality. It lets us build up evidence and describe our knowledge in a consistent way. It follows from the basic rules of probability, and it was all set down in a few pages by a nonconformist Presbyterian minister and published after his death in 1763.
Octave file for the above calculations: simple_bayes.m