Octave has a great built-in function called
nchoosekwhich makes shuffling a breeze. Called with scalar arguments it returns the value of the binomial coefficient, which is the number of ways you can choose k things from n things (k<=n). For fixed n, nchoosek is maximum when k=n/2. That is plotted below for n=2:24.
Notice that's a log-log scale, as n increases it quickly becomes intractable to perform a complete permutation test.
Suppose we have two data vectors, and we want to know if they are from populations with different means. We can use the
normrnd()function to draw 8 samples from a normal distribution with 0 mean and 8 samples from a normal distribution with a mean of 1.
x1 = normrnd(0,1,n,1);
x2 = normrnd(1,1,n,1);
perms = nchoosek( [x1;x2], n );
We stored all 12870 permutations of the vectors (16 choose 8) in the matrix
perms. Now we use the built-in function
complementto find the difference of the means under each of those labellings.
comps(i,:) = complement( perms(i,:), [x1;x2] );
m1 = mean( perms, 2 ); % take row-wise averages, DIM=2
m2 = mean( comps, 2 ); % take row-wise averages, DIM=2
m_shuff = m1 - m2;
Now we can use the distribution of
m_shuffto decide if the difference in the means of our two data vectors is significant.
Octave script with all of these calculations: shuffle.m.