The Filippelli dataset from the NIST Statistical Reference Datasets came up in discussions about ill-conditioned problems over on the Air Vent. Fitting the certified polynomial is example problem 5.10 in Moler's MATLAB book. Part of the problem is to compare several different methods of fitting the 10th-order polynomial:
- MATLAB’s polyfit (which does QR on the Vandermonde matrix)
- MATLAB’s backslash operator (which will do QR with pivoting)
- Normal equations
- Centering by the mean and normalizing by the standard deviation
Carrick already demonstrated the centering approach in the comments. I was surprised how well the simple centering approach worked (it's been quite a while since I've done this homework problem). Carrick mentioned that Gram-Schmidt might be another way to go, and I mentioned that QR using rotations or reflections is usually preferred for numerical stability reasons. One of the trade-offs is that Gram-Schmidt gives you the basis one vector at a time, while the other methods give you Q all at once at the end. Depending on the problem one may be more useful than the other.
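Something like this minimal sketch is all the centering approach amounts to (the names xc, pc, and yc_fit are mine, and it assumes x and y already hold the Filippelli data):
# center by the mean and scale by the standard deviation, then fit as usual
import scipy as sp
xc = (x - x.mean())/x.std()  # centered and scaled abscissa
pc = sp.polyfit(xc, y, 10)   # the transformed Vandermonde matrix is much better conditioned
yc_fit = sp.polyval(pc, xc)  # fitted values back at the data points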
The LAPACK routine DGEQRF uses the Householder reflection method to calculate the QR decomposition. This routine is wrapped by scipy.linalg.qr(), which has options to return only the economy-size factors and to perform column pivoting (the pivoted version calls DGEQP3). The SciPy version of polyfit() uses the singular value decomposition (from which you get the pseudoinverse).
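One interface note: the keyword names have changed across SciPy versions, so newer releases spell the options mode='economic' and pivoting=True, while the snippets below use the older econ=True form. A small sketch of the pivoted, economy-size call on a stand-in matrix:
import numpy as np
from scipy.linalg import qr
A = np.random.rand(82, 11)                         # tall stand-in matrix
Q, R, piv = qr(A, mode='economic', pivoting=True)  # Q is 82x11, R is 11x11, piv is a column permutation
print(np.allclose(np.dot(Q, R), A[:, piv]))        # A[:, piv] = Q*R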
So, the two main ways of getting robust numerical algorithms for solving least-squares problems are various flavors of QR and the pseudoinverse. Naive application of some statistical software that doesn't use these methods by default leads to problems (e.g. SAS and R). Since I don't like doing other people's homework I won't go through Moler's list, but I did want to demonstrate another method to tackle the problem using the dreaded normal equations (which are terribly conditioned in the case of the Vandermonde matrix).
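The reason the normal equations are so dreadful is that forming X'X roughly squares the condition number of the design matrix, and the Vandermonde matrix is badly conditioned to begin with. A rough illustration on a stand-in Vandermonde matrix (not the actual Filippelli abscissas):
import numpy as np
t = np.linspace(1.0, 2.0, 82)           # stand-in abscissa
V = np.vander(t, 11)                    # 10th-degree Vandermonde matrix
print(np.linalg.cond(V))                # already large
print(np.linalg.cond(np.dot(V.T, V)))   # roughly the square of the number above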
The basic idea behind both numerical approaches is to build a "good" basis to describe the data, so that when we write the fitted polynomial in terms of this basis, and perform operations using the basis, we don't lose lots of precision. We can take a more symbolic approach to the problem, and define an orthogonal basis analytically. So instead of using the monomials as in the Vandermonde matrix, we can use an orthogonal set like the Legendre polynomials. It's easiest to use this basis if you map the data to the interval [-1, 1].
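As a quick sanity check on that idea, scipy.special.legendre returns the basis polynomials directly, and distinct Legendre polynomials integrate to zero against each other over [-1, 1] (P2 and P5 below are arbitrary picks for illustration):
from scipy.special import legendre
from scipy.integrate import quad
P2, P5 = legendre(2), legendre(5)
print(quad(lambda t: P2(t)*P5(t), -1, 1)[0])  # ~0: distinct Legendre polynomials are orthogonal
print(quad(lambda t: P5(t)*P5(t), -1, 1)[0])  # 2/(2*5+1): the normalization constant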
If you read that thread from the R mailing list, you'll recognize this approach as the one that was recommended (use the R poly function to define the model matrix with an orthogonal basis rather than the monomials, which give the Vandermonde matrix).
Here's the Python to do the fit on the raw data just using the built-in polyfit() function.
# polyfit uses the pseudoinverse (SVD-based least squares)
import scipy as sp
# x, y: the Filippelli data loaded earlier; beta_NIST: the NIST certified
# coefficients, stored highest power first as sp.polyval expects
nx = 3000
p = sp.polyfit(x,y,10)
px = sp.linspace(x.min(),x.max(),nx)
# map x to [-1,1]
x_map = (x.min() + x.max() - 2.0*x)/(x.min() - x.max())
px_map = sp.linspace(-1,1,nx)
py = sp.polyval(p, px)
pNISTy = sp.polyval(beta_NIST, px)
NISTy = sp.polyval(beta_NIST, x)
Here’s the Python to do the fit using some other matrix factorizations.
# compare polyfit to QR, this version uses Householder reflections
from scipy.linalg import qr, solve
# X is the design matrix with columns 1, x, ..., x**10 (increasing powers)
Q,R = qr(X, mode='qr', econ=True)
b = sp.dot(sp.transpose(Q),y)
beta = solve(R,b)
beta = sp.flipud(beta) # polyval expects the coefficients in the other order (highest power first)
qry = sp.polyval(beta, px)
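For completeness, the column-pivoted variant (what MATLAB's backslash does for a rectangular system) can be sketched with the newer scipy.linalg.qr interface; beta_piv and qry_piv are my names, and X, y, and px are the same arrays as above:
# least squares via column-pivoted QR
from scipy.linalg import qr, solve_triangular
Q, R, piv = qr(X, mode='economic', pivoting=True)
beta_piv = sp.zeros(11)
beta_piv[piv] = solve_triangular(R, sp.dot(sp.transpose(Q), y))  # undo the column permutation
qry_piv = sp.polyval(sp.flipud(beta_piv), px)  # flip to polyval's highest-power-first order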
Finally, here is the comparison to the pseudoinverse, along with the solution using the Legendre basis to start, which avoids the need to generate the basis numerically using QR or the singular value decomposition (the basis comes from scipy.special.legendre).
# compare to pseudoinverse
from scipy.linalg import pinv, lu_factor, lu_solve
from scipy.special import legendre
Xdag = pinv(X)
beta_pinv = sp.flipud(sp.dot(Xdag, y))
pinvy = sp.polyval(beta_pinv, px)
# X_map is the same style of design matrix built from the mapped data x_map
Xmapdag = pinv(X_map)
pinvy_map = sp.polyval(sp.flipud(sp.dot(Xmapdag,y)), px_map)
# design matrices in the Legendre basis
X1 = sp.zeros((x_map.shape[0],11),dtype=float) # for fitting
X2 = sp.zeros((px_map.shape[0],11),dtype=float) # for plotting
X1[:,0] = 1.0
X2[:,0] = 1.0
portho = [legendre(i) for i in xrange(11)] # Legendre polynomials P_0 ... P_10
for i in xrange(11):
    X1[:,i] = portho[i](x_map)
    X2[:,i] = portho[i](px_map)
X1dag = pinv(X1)
beta1_pinv = sp.dot(X1dag, y)
pinvy_ortho = sp.dot(X2, beta1_pinv)
# do it the wrong way with X1: form and solve the normal equations
b_legendre = sp.dot(sp.transpose(X1),y)
beta_legendre = solve(sp.dot(sp.transpose(X1),X1), b_legendre)
py_legendre = sp.dot(X2, beta_legendre)
# same thing, but with an explicit LU factorization of X1'X1
LU,piv = lu_factor(sp.dot(sp.transpose(X1),X1))
beta_legendre_lu = lu_solve((LU,piv),b_legendre)
py_legendre_lu = sp.dot(X2, beta_legendre_lu)
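To put a number on the claim below, one way to tabulate the error norms is to compare each fitted curve against the curve from the NIST certified coefficients on the plotting grid; the labels are mine and the array names match the snippets above:
# RMS difference between each fitted curve and the NIST certified curve
fits = [('polyfit (SVD)', py), ('QR', qry), ('pinv, raw x', pinvy),
        ('pinv, mapped x', pinvy_map), ('Legendre + pinv', pinvy_ortho),
        ('Legendre normal equations', py_legendre),
        ('Legendre normal equations, LU', py_legendre_lu)]
for name, fit in fits:
    print('%s: %.3g' % (name, sp.sqrt(sp.mean((fit - pNISTy)**2))))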
The error norm for all of the “good” methods is roughly the same order of magnitude. Even a bad algorithm, forming and solving the normal equations, gives a good answer if you start out with a design matrix that is orthogonal to begin with. The QR method is much less analyst-work (and it works even without mapping or normalizing the data). Since you just build the basis numerically it requires a bit more cpu-work (which is usually cheap), and it is already coded up and well tested in freely available libraries.