Saturday, December 20, 2014

Gaussian Processes for Machine Learning

What a great resource for learning about Gaussian Processes: The Gaussian Processes Web Site.

Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics.

The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

1. Here's what's missing from the book: iterative methods that avoid the N^3 cost of directly inverting the covariance matrix when solving the linear system:
- Improved Fast Gauss Transform code, user's manual, slides, slides
- Preconditioned Krylov Solvers for Kernel Regression

The strategy is twofold: get an approximate solution using an iterative method, and also approximate the matrix-vector multiply itself (N or N log N instead of N^2). Of course, preconditioning is useful for any method relying on Krylov subspace approaches. The interesting thing about the approximate matrix-vector multiply is that its accuracy can be relaxed as the solution converges, further saving wall-time.
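To make the idea concrete, here is a minimal sketch of solving the GP linear system (K + sigma^2 I) alpha = y with conjugate gradients instead of an N^3 direct inversion. The data, kernel, lengthscale, and noise level are all invented for illustration; the point is that each CG iteration only touches the matrix through a matrix-vector product, which is exactly where a fast transform (like the improved fast Gauss transform mentioned above) would be swapped in for the dense matvec.

```python
# Sketch: iterative solve of (K + sigma^2 I) alpha = y via conjugate gradients.
# All data, kernel choices, and parameters here are made up for illustration.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
N = 500
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential (RBF) covariance between two point sets
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

K = rbf_kernel(X, X)
sigma2 = 0.1  # noise variance (also regularizes the system)

# CG only ever calls this matvec; a fast Gauss transform or tree code
# would replace the dense K @ v here to get N or N log N per iteration.
def matvec(v):
    return K @ v + sigma2 * v

A = LinearOperator((N, N), matvec=matvec)
alpha, info = cg(A, y)  # info == 0 means CG converged to tolerance
```

The predictive mean at test points is then just a kernel-weighted sum against `alpha`, so the expensive dense factorization never happens.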

Another acceleration approach is to fit only a subset of the data, or to do a direct inversion on a reduced-rank approximation of the covariance matrix (this is covered in Chapter 8):
- A Unifying View of Sparse Approximate Gaussian Process Regression
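As a rough sketch of the reduced-rank idea: pick M inducing points with M much smaller than N, and solve an M-by-M system instead of the full N-by-N one. The weights below follow the "subset of regressors" form from that unifying-view family of approximations; the data, kernel, and the random choice of inducing points are all invented for illustration.

```python
# Sketch: reduced-rank (subset-of-regressors style) GP regression.
# Cost is O(N M^2) instead of O(N^3); everything here is illustrative.
import numpy as np

rng = np.random.default_rng(1)
N, M = 2000, 50
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)

def rbf(A, B, ell=1.0):
    # Squared-exponential covariance
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

Xu = X[rng.choice(N, M, replace=False)]  # inducing inputs: a random subset
Kuf = rbf(Xu, X)   # M x N cross-covariance
Kuu = rbf(Xu, Xu)  # M x M inducing-point covariance
sigma2 = 0.01

# Subset-of-regressors weights: (sigma^2 Kuu + Kuf Kuf^T) w = Kuf y,
# an M x M solve; small jitter keeps the system numerically stable.
A = sigma2 * Kuu + Kuf @ Kuf.T + 1e-8 * np.eye(M)
w = np.linalg.solve(A, Kuf @ y)

Xs = np.linspace(-3, 3, 200)[:, None]
mean = rbf(Xs, Xu) @ w  # predictive mean at the test points
```

Note that the full N-by-N kernel matrix is never formed, which is the whole point of the reduced-rank approach.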

The really cool thing is that all of these acceleration approaches can, in concept, be combined. I haven't found a demonstration that actually does combine them all, so if you know of someone who has published on that please share a link!

1. Update: well, I didn't read far enough in Chapter 8. They do mention the improved fast Gauss transform method, but then dismiss "iterative methods" from their subsequent comparisons far too quickly.

I think there are plenty of times when an approximate solution to the whole problem is more useful than an exact solution to a partial problem.

2. The gpml software is very well documented. I like the way the user documentation (doc/index.html in the download) links directly to all the source scripts it mentions. It's a great way to introduce the reader to the code.