## Monday, October 19, 2009

### JASSM Bootstrap Reliability II

I previously did an analysis of JASSM's reliability using Octave. It's a very simple exercise using the built-in Octave function empirical_rnd() and public statements by various folks concerning the flight tests.

As reported by Reuters, JASSM has had success in recent tests. To be quantitative, 15 out of 16 test flights succeeded. That is certainly good news for a program that's been bedevilled by flight test failures and Nunn-McCurdy breaches.

The previous analysis can be extended to include this new information. This time we'll use Python rather than Octave. There is not (that I know of) a built-in function like empirical_rnd() in any of Python's libraries, but it's relatively straightforward to use random integers to index into arrays and accomplish the same thing.

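    import numpy as np

    def bootstrap(data, nboot):
        """Draw nboot samples from the data array randomly with replacement,
        i.e. bootstrap samples, one per column of the returned array."""
        # random integer indices in [0, len(data) - 1]; randint excludes the upper bound
        idx = np.random.randint(0, len(data), size=(len(data), nboot))
        return data[idx]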
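Newer NumPy releases (1.17 and later) can also draw with replacement directly through `Generator.choice`, which avoids the explicit index array. A minimal sketch under that assumption; the name `bootstrap_choice`, the toy data, and the seed are illustrative and not part of the original analysis:

```python
import numpy as np

def bootstrap_choice(data, nboot, rng=None):
    """Return an array of shape (len(data), nboot); each column is one
    bootstrap resample of data, drawn with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(data, size=(len(data), nboot), replace=True)

# toy record: 10 flights, 4 successes (the ordering is arbitrary)
toy = np.array([1.0] * 4 + [0.0] * 6)
samples = bootstrap_choice(toy, 1000, rng=np.random.default_rng(0))
print(samples.shape)  # (10, 1000)
```

Passing the generator in explicitly makes the resampling repeatable, which helps when comparing histograms between runs.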

Applying this function to our new data vectors to get bootstrap samples is easy.

    # assumes numpy is imported as np; bootstrap() and nboot are defined earlier
    # reliability data for July 2009, based on Reuters report
    d09_jul_tests = 19
    d09_jul = np.ones(d09_jul_tests, dtype=float)
    d09_jul[0:3] = 0.0  # mark three of the flights as failures

    # generate the bootstrap samples: one reliability (success fraction) per resample
    d09_jul_boot = np.sum(bootstrap(d09_jul, nboot), axis=0) / float(d09_jul_tests)
    # find the number of unique reliabilities to set the number of bins for the histogram:
    d09_jul_nbins = np.unique(d09_jul_boot).size
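One way to summarize the bootstrap reliabilities numerically, alongside the histogram, is a percentile interval. A rough sketch, assuming `nboot = 10000` and a fixed seed (both are my choices for illustration; the post does not state them):

```python
import numpy as np

rng = np.random.default_rng(2009)  # seed chosen arbitrarily for repeatability
nboot = 10000                      # assumed number of bootstrap resamples

# same data vector as above: 19 tests, 3 failures
data = np.ones(19)
data[0:3] = 0.0

# resample indices with replacement, one column per bootstrap sample
idx = rng.integers(0, len(data), size=(len(data), nboot))
reliab = data[idx].mean(axis=0)

# central 90% percentile interval of the bootstrap reliabilities
lo, hi = np.percentile(reliab, [5, 95])
print("point estimate: %.3f" % data.mean())
print("90%% interval: [%.3f, %.3f]" % (lo, hi))
```

With so few tests the interval is wide, which is the same message the histogram gives.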

So, is it a 0.9 missile or a 0.8 missile? We should probably be a little more modest in the inferences we wish to draw from such small samples. As reported by Bloomberg, the Air Force publicly contemplated cancellation for a missile with a reliability distribution like the blue histogram shown below (six of ten failed), while stating that 13 successes in 16 tests (the green histogram) would be acceptable. The figure below shows these two reliability distributions along with the actual recent performance.

This seems to be a reasonably supported inference from these sample sizes: the fixes that resulted in the recent 15 out of 16 successes had a measurable effect when compared to the 4 out of 10 performance.
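To put a rough number on that comparison, one can bootstrap both records and count how often the resampled old reliability (4 of 10) matches or beats the resampled recent reliability (15 of 16). This is an informal indicator rather than a formal test, and the helper name `boot_reliability`, the resample count, and the seed are my own assumptions:

```python
import numpy as np

def boot_reliability(successes, tests, nboot, rng):
    """Bootstrap distribution of the success fraction for a 0/1 test record."""
    data = np.zeros(tests)
    data[:successes] = 1.0
    # resample indices with replacement, one column per bootstrap sample
    idx = rng.integers(0, tests, size=(tests, nboot))
    return data[idx].mean(axis=0)

rng = np.random.default_rng(0)
nboot = 10000
old = boot_reliability(4, 10, nboot, rng)    # 4 of 10 successes
new = boot_reliability(15, 16, nboot, rng)   # 15 of 16 successes

# fraction of paired resamples where the old record looks at least as good
frac = np.mean(old >= new)
print("fraction with old >= new: %.4f" % frac)
```

A small fraction here is consistent with the claim that the two records are distinguishable even at these sample sizes.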

Not that binary outcomes are a great measure of reliability; they just happen to be easily scraped from press releases. The figure below illustrates the problem: there just isn't much information in each 1 or 0, so even greatly increasing the number of samples only slowly improves the power (reduces the area of overlap between the two distributions).
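The slow improvement can be sketched with exact binomial distributions instead of bootstrap samples. The snippet below is a simplification of my own, not the calculation behind the figure: it assumes true reliabilities of 0.8 and 0.9 (borrowed from the question posed earlier) and measures the shared area of the two success-count distributions as the number of tests grows. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def overlap(n, p1, p2):
    """Overlapping area of two binomial success-count distributions."""
    return sum(min(binom_pmf(k, n, p1), binom_pmf(k, n, p2))
               for k in range(n + 1))

# assumed reliabilities of 0.8 and 0.9; test counts are illustrative
for n in (16, 64, 256):
    print(n, round(overlap(n, 0.8, 0.9), 3))
```

The standard error of a proportion shrinks only as one over the square root of the number of tests, so each fourfold increase in flights merely halves the width of each distribution.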