## Wednesday, December 24, 2008

### More on Speeding Up Octave

In a previous post we found that speeding up Octave amounted mostly to avoiding for loops. The simplest way to do that is to operate directly on vectors using the built-in operators and functions (which are fast because they are compiled) or cast your problem as a sparse matrix solve. This second option is especially helpful when subsequent calculations depend on previous iterations (the code can't really be vectorized). This option is a very general way of avoiding loops in those cases.

The simple example given previously was a numerical integration of x-squared:

p1(1) = 0;
for( i=2:N )
t = t + dt;
p1(i) = p1( i-1 ) + dt*2*t;
endfor

The way to recast the problem as a sparse matrix solve is to think of p1 as the vector of unknowns, and each iteration of the loop as an equation in the system we want to solve. Writing down the system gives us: The important detail to remember is to use the functions in Octave to allocate the sparse matrix, or you could easily find yourself writing more really slow for-loops just to create the sparse matrix which you hoped would save lots of time by avoiding for-loops. Talk about the long, slow way around!

Two very useful functions are speye() and spdiag(). The first returns a sparse identity matrix, which is often a good initial building block for many useful operators. The second allows you to place vectors (allocated quickly with the usual vector suspects such as ones() and zeros() ) along the diagonals of the sparse matrix.

# create the main diagonal
A = speye( N );
# alternatively could use spdiag:
# A = spdiag( ones(N,1), 0 );
A = A + spdiag( -ones(N-1,1), -1 ); # add the first sub-diagonal

If your operator isn't banded then you'll need to use spconvert(), which takes as its argument a three (or four if you need a complex result) column matrix. Each row of the argument defines a non-zero entry in the sparse matrix. The first column is the row index, the second column is the column index and the third column is the entry value (fourth column being the imaginary part of the value, if necessary).

Using those three functions should allow you to allocate a sparse matrix in Octave without resorting to for loops (which was why we embarked on this journey to begin with).

This method is counter-intuitive to folks who come from a Fortran (or other compiled language) background, because writing down the loop is the simple, efficient way to solve the problem. It also seems like 'wasting' memory to store all of those redundant coefficients. The timing results speak for themselves though, if you want to stay completely in Octave (or Matlab or Python) sometimes you have to do weird things to get reasonable performance. Of course, if speed really becomes a problem then the inner loops of your calculations need to move to a compiled language that can then be called from Octave, this is a bit more complicated than our little sparse matrix method.