Thursday, April 2, 2009

Performance optimization (allocation inside a for loop)

An interesting discussion about an old topic just popped up on the Octave lists. It's not interesting because someone noticed that failing to pre-allocate a vector before entering a loop is slow (doesn't anyone RTFM?), but because of the follow-up discussion on indexing and range objects.

Here's the interesting bit:

octave:1> tic(); n=1e5; retval=1:n; toc()
Elapsed time is 0.000528962 seconds.
octave:2> tic(); n=1e5; retval = (1:n)(1:n); toc
Elapsed time is 0.00593709 seconds.
octave:3> tic();n=1e5;retval=[1:n]; toc
Elapsed time is 0.010952 seconds.


Why the significant difference in performance? According to jwe:

In Octave, an expression like 1:n creates a range object, which contains only the base, limit, and increment as double precision values, so no matter how many elements are in the range, it only takes a few bytes of storage (24 for the data plus some overhead for the internal octave_value object itself).

If you write [1:n], you force a Matrix object with N elements to be created. It will require 8*N bytes of storage, plus the overhead for the internal octave_value object itself.


Similar to the difference between range() and xrange() in Python.

No comments:

Post a Comment