Sunday, February 28, 2010

Uncertainty Quantification with Stochastic Collocation

I’ve been posting lots of links to uncertainty quantification (UQ) references lately in comments (e.g. here, here, here), but I don’t really have a post dedicated only to UQ. So I figured I’d remedy that situation. This post works through a simple example problem based on methods presented in some useful recent papers on UQ (along with a little twist of my own):

  • A good overview of several of the related UQ methods [1]
  • Discussion of some of the sampling approaches for high dimensional random spaces, along with specifics about the method for CFD [2]
  • Comparison of speed-up over naive Monte Carlo sampling approaches [3]
  • A worked example of stochastic collocation for a simple ocean model [4]

The example will be based on a nonlinear function of a single variable

$$ f = u \, e^{u} \qquad (1) $$

The first step is to define a probability distribution for our input (Gaussian in this case, shown in Figure 1).

$$ \frac{\partial p}{\partial u} = \frac{e^{-\frac{(u-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma} \qquad (2) $$



Figure 1: Uncertain Input Distribution


Then, since our example is a simple function, we can analytically calculate the resulting probability density for the output. To do that we need the derivative of the model with respect to the input (in the multi-variate case we’d need the Jacobian).

$$ \frac{\partial f}{\partial u} = u\,e^{u} + e^{u} \qquad (3) $$

Then we divide the input density by the slope to get the new output density.

$$ \frac{\partial p}{\partial f} = \frac{\partial p}{\partial u}\left(\frac{\partial f}{\partial u}\right)^{-1} = \frac{e^{-\frac{(u-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma\,\left(u\,e^{u} + e^{u}\right)} \qquad (4) $$

We’ll use equation 4 to measure the convergence of our collocation method.
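As a reference point for that convergence comparison, here is a minimal Python sketch (my own, with illustrative values assumed for the mean and standard deviation, not values taken from the post's figures) that evaluates the exact output density:

import scipy as sp

mu, sigma = 0.0, 1.0   # assumed input mean and standard deviation (illustrative)

def input_pdf(u):
    # Gaussian input density, equation 2
    return sp.exp(-(u - mu)**2 / (2.0*sigma**2)) / (sp.sqrt(2.0*sp.pi)*sigma)

def dfdu(u):
    # analytic slope of the model f = u*exp(u), equation 3
    return u * sp.exp(u) + sp.exp(u)

u = sp.linspace(-2.0, 2.0, 200)
f = u * sp.exp(u)                              # model output, equation 1
# output density, equation 4 (using the magnitude of the slope in the
# change of variables)
p_exact = input_pdf(u) / sp.absolute(dfdu(u))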

The idea of stochastic collocation methods is that the points at which the model (equation 1 in our example) is evaluated are chosen as the roots of polynomials that are orthogonal with respect to the input probability distribution as a weighting function. For normally distributed inputs this results in choosing points at the roots of the (probabilists’) Hermite polynomials. Luckily, Scipy has a nice set of special functions, which includes he_roots to give us those roots. What we really want is not the function evaluation at the collocation points, but the slope at the collocation points. One way to get this information in a nearly non-intrusive way is to use a complex step method (this is the little twist I mentioned). So, once we have values for the slope at the collocation points, we fit a polynomial to those points and use this polynomial as a surrogate for equation 3; a sketch of the procedure follows below. Figure 2 shows the results for several orders of surrogate slopes.
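Here is a rough, self-contained sketch of that procedure (my own stripped-down illustration with an assumed input distribution and surrogate order, not the exact script used to generate the figures); it pulls the collocation points from scipy.special.he_roots and estimates the slope with a complex step:

import scipy as sp
from scipy import special

mu, sigma = 0.0, 1.0   # assumed input mean and standard deviation (illustrative)
order = 4              # order of the polynomial surrogate for the slope
h = 1.0e-20            # complex-step size; no subtractive cancellation

def model(u):
    # equation 1, written so it also accepts complex arguments
    return u * sp.exp(u)

def input_pdf(u):
    # Gaussian input density, equation 2
    return sp.exp(-(u - mu)**2 / (2.0*sigma**2)) / (sp.sqrt(2.0*sp.pi)*sigma)

# collocation points: roots of the probabilists' Hermite polynomial,
# shifted and scaled to match the N(mu, sigma) input distribution
roots, weights = special.he_roots(order + 1)
u_c = mu + sigma * sp.real(roots)

# complex-step estimate of df/du at the collocation points
slope_c = sp.imag(model(u_c + 1j*h)) / h

# polynomial surrogate standing in for the analytic slope (equation 3)
surrogate = sp.polyfit(u_c, slope_c, order)

# approximate output density: input density divided by the surrogate slope
u = sp.linspace(-2.0, 2.0, 200)
p_out = input_pdf(u) / sp.absolute(sp.polyval(surrogate, u))

Comparing p_out against the exact density from equation 4 as the surrogate order is increased gives the sort of convergence measure plotted in Figure 3.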



Figure 2: Stochastic Collocation Estimates of Output Uncertainty


The convergence of the method is shown in Figure 3. For low dimensional random spaces (only a few random inputs) this method will be much more efficient than random sampling. Eventually the curse of dimensionality takes its toll though, and random sampling becomes the faster approach.



Figure 3: Convergence of the Collocation Method


The cool thing about this method is that the complex step method which I used to estimate the Jacobian is also useful for sensitivity quantification which can be used for design optimization as well. On a related note, Dan Hughes has a post describing an interesting approach to bounding uncertainties in the output of a calculation given bounds on the inputs by using interval arithmetic.

References

[1]   Loeven, G.J.A., Witteveen, J.A.S., Bijl, H., Probabilistic Collocation: An Efficient Non-Intrusive Approach For Arbitrarily Distributed Parametric Uncertainties, 45th Aerospace Sciences Meeting, Reno, NV, 2007.

[2]   Hosder, S., Walters, R.W., Non-Intrusive Polynomial Chaos Methods for Uncertainty Quantification in Fluid Dynamics, 48th Aerospace Sciences Meeting, Orlando, FL, 2010.

[3]   Xiu, D., Fast Numerical Methods for Stochastic Computations, Comm. Comp. Phys., 2009.

[4]   Webster, M., Tatang, M.A., McRae, G.J., Application of the Probabilistic Collocation Method for an Uncertainty Analysis of a Simple Ocean Model, MIT JPSPGC Report 4, 2000.

Saturday, February 27, 2010

Catastrophic Sea-level Rise

Sea level may rise a meter or two (more likely about half a meter) over the next century. Better get out of the way.

Those of you who have actually lived in areas at risk of significant storm surge will understand the ludicrous nature of the 'catastrophic sea level rise' claims bandied about by enviro-alarmists. A couple centimeters a year? Really? That's a catastrophe? Try eight meters in a matter of hours, wimps.

[Update: (I feel a bit sheepish about the lack of any intellectual content in this post; I'll blame it on the nice glass of Malbec I had the evening I posted it.) Here's an interesting paper that talks about hysteresis effects in ice-sheet dynamics. They present a graph showing the change in ice volume starting from no ice in a cooling climate, compared to the change in ice volume starting with lots of ice in a warming climate.
And here's a section from their conclusions (selectively edited with added emphasis to prove a point):
In our model with orbital forcing, CO2 levels need to rise slightly above 3xPAL [pre-industrial levels] to induce drastic retreat of pre-existing EAIS... however our current coupling method does not fully capture terrestrial ice-albedo feedback, which can affect Antarctic temperatures and precipitation just as much as orbital and CO2 changes... we have found that including [those effects] increases hysteresis... so that transitions occur at 3x and 4x PAL CO2, and are more sudden than those in Fig. 2a.
Oh my, it's worse than even the scientists thought... There's certainly a good press release lurking in that paragraph of the conclusion, and those transitions in the figure look downright abrupt; we could be unwittingly teetering at the edge of a precipice. Here's another interesting part of the conclusion:
Due to the mass inertia of the ice sheet, climate fluctuations out of equilibrium with current ice must be maintained for several thousand years to have an appreciable effect on East Antarctic ice volume, and to induce transitions such as those described above. [...] However, our paleo-climatic simulations are not directly applicable to the next ~5000 years, since the time scales of CO2 change are very different, and since we include full orbital variations. [...] In preliminary experiments [...] we find that CO2 levels must exceed 8xPAL to induce significant retreat from full ice-sheet conditions.
Oh, well, that's probably not going to make anyone's breathless climate-watch press release. Also, consider the scale on that graph (big ticks at 10^6 years): those drastic and abrupt transitions are happening at time-scales on the order of the length of all recorded human history.]

Tuesday, February 23, 2010

Open Thread for Externalities and Establishing Causation Discussion

This post is just a place to collect spill-over from a discussion on another site about another topic.

Jesse February 23, 2010 at 12:41 pm
JStults,

I will pick up the debate on your blog then.  I have read the linked article and the information is still not overwhelming to the free-market system.  Have you read much of Walter Block’s work in this area? This is a very brief paper that describes the basics of the free market approach and the ways that government intervention have distorted the market.  http://www.walterblock.com/wp-content/uploads/publications/misallocations_externalities.pdf
 

jstults said...
Regulatory Models and the Environment: Practice, Pitfalls, and Prospects
Modeling is a difficult enterprise even outside of the potentially adversarial regulatory environment. The demands grow when the regulatory requirements for accountability, transparency, public accessibility, and technical rigor are added to the challenges. Moreover, models cannot be validated (declared true) but instead should be evaluated with regard to their suitability as tools to address a specific question. The committee concluded that these characteristics make evaluation of a regulatory model more complex than simply comparing measurement data with model results. Evaluation also must balance the need for a model to be accurate with the need for a model to be reproducible, transparent, and useful for the regulatory decision at hand. Meeting these needs requires model evaluation to be applied over the “life cycle” of a regulatory model with an approach that includes different forms of peer review, uncertainty analysis, and extrapolation methods than for non-regulatory models.


Their choice of terminology is unfortunate; validated generally doesn't mean true, it means understanding the degree to which a model is a suitable representation of the real world.

Saturday, February 20, 2010

Validation and Credibility

Outside of certain small technical communities the importance and purpose of VV&UQ for computational physics is little understood. Normally this unawareness of an arcane technical practice would not matter much, but since many modern policy decisions are based, in part if not in whole, on the output of large simulations, it becomes a matter that demands broader understanding. The fundamental reason for that entire effort (and it takes a lot of effort to get right) is to establish the credibility of our simulation outputs so that we can use them to support rational decision making. The first paragraph of the ASC fact-sheet on the subject (as applied to nuclear stockpile stewardship) sums things up well (emphasis mine):

Why:

The Advanced Simulation and Computing (ASC) Verification & Validation (V&V) program exists to establish a technically rigorous foundation of credibility for the computational science and engineering calculations required by the NNSA Stockpile Stewardship Program. This program emphasizes the development and implementation of science-based verification and validation methods for the support of high-consequence decisions regarding the management of the U.S. nuclear stockpile. The V&V process reduces the risk of incorrect stockpile decisions by establishing that the calculations provide the right answers for the right reasons.
-- Verification and Validation: Credibility in Stockpile Modeling and Simulation
Microscopic Description of the Fission Process is an interesting set of slides dealing with this credibility building exercise where empirical closure is concerned. The question addressed on slide 3 is, "Why focus on the microscopic nuclear theories?" Here are the reasons given:
  • The nuclear many-body problem is very complex, computationally difficult
  • Much of the progress in the past 50 years has been based on empirical models (most with microscopic degrees of freedom) tuned to experimental data

    • This highly limits our predictive capability
    • And it is difficult to estimate the uncertainties

      • How can we do Quantification of Margins and Uncertainties?
With a fundamental picture of nuclei based on the correct microphysics, we can remove the empiricism inherent today, thereby giving us greater confidence in the science we deliver to the programs
This illustrates the idea that the credibility of validation has more to do with the quality of the model's physical and theoretical basis than with the coverage or quantity of experimental data (though validation experiments are still quite necessary). Part of the goal of any continuing validation process (link by way of Dan Hughes' site) is to reach the point where all of the empirical closures (constitutive relations or sub-grid scale parameterizations in the context of large multi-physics codes) have a sound theoretical and physical basis.

That fact sheet mentioned above also has a good list of the sorts of deliverables that we should expect as consumers of simulation-based decision support:
  • Documented analysis and conclusion of the confidence level of the models as a result of the V&V activities.
  • Repository of test results associated with unit / regression / system tests, verification and validation tests and/or list of test data used.
  • Documented code (feature) and solution (model) verification.
  • Documented V&V environment (constraints, assumptions, and tools).
  • Repository of code versions, fixes, and other patches used during the V&V phase.
The availability and open accessibility of these things is critical to building the credibility of, and consensus around decision-support products. This is because there is little 'fact-checking' that decision makers can do without access to their own supercomputers and experts for re-running or re-implementing the simulations. In this situation of high barriers to checking or independent confirmation an open notebook documenting a formal and rigorous process is crucial.

The entire point of requiring the VV&UQ process is not that we hope to prove a model implementation is correct or true (an impossible task), but that we understand the importance of generating credible results that faithfully represent the reality of interest through sound physical reasoning, and which are shown to be useful for a particular purpose. VV&UQ is the set of best practices for providing unimpeachable, science-based support for decisions that matter.

Friday, February 19, 2010

Visualizing Confidence Intervals

This post is about visualizing confidence intervals. Matplotlib has some really neat capabilities that are useful in that regard, but before we get into the pictures, here’s a short digression on statistics.

I like resampling-based statistics a lot; as an engineer, their practicality and intuitiveness appeal to me. It has been shown that students learn better and enjoy statistics more when they are taught using resampling methods. Here’s a nice description of the motivation for resampling (ok, maybe it's a bit over the top):

For more than a century the inherent difficulty of formula-based inferential statistics has baffled scientists, induced errors in research, and caused millions of students to hate the subject.

Complexity is the disease. Resampling (drawing repeated samples from the given data, or population suggested by the data) is a proven cure. Bootstrap, permutation, and other computer-intensive procedures have revolutionized statistics. Resampling is now the method of choice for confidence limits, hypothesis tests, and other everyday inferential problems.

In place of the formidable formulas and mysterious tables of parametric and non-parametric tests based on complicated mathematics and arcane approximations, the basic resampling tools are simulations, created especially for the task at hand by practitioners who completely understand what they are doing and why they are doing it. Resampling lets you analyze most sorts of data, even those that cannot be analyzed with formulas.

– Resampling Stats

One of the useful things to do is resampling of residuals. The assumptions underlying most models are that the magnitude and sign of the residuals are not a function of the independent variable. This is the hypothesis which we’ll base our resampling on. First we fit a model (a line in these examples) to some data, then calculate all the residuals (difference between the data and the model). Then we can apply a couple of different resampling approaches towards understanding the confidence intervals.

A permutation-based way of establishing confidence intervals is easily accomplished in Python using the itertools module’s permutations function. This is an exact method, but the number of permutations grows as the factorial of the sample size. The six-sample example shown in Figure 1 has 6! = 720 possible permutations of the residuals. With only ten samples the number of permutations grows to 3,628,800 (over three million). The point of this post is visualization, so plotting three million lines may not be worth our while.



Figure 1: Residual Permutations


Figure 1 shows the least-squares-fit line in solid black, with the lines fit using the permuted residuals slightly transparent. Here’s the Python to accomplish that:

 
import scipy as sp
import matplotlib.pyplot as p
from scipy.stats import norm
from itertools import permutations
from math import factorial

nx = 6
x = sp.linspace(0.0, 1.0, nx)
data = x + norm.rvs(loc=0.0, scale=0.1, size=nx)  # noisy data around a line
yp = sp.polyfit(x, data, 1)
y = sp.polyval(yp, x)
r = data - y  # residuals

nperm = factorial(nx)
for perm in permutations(r, nx): # loop over all the permutations of the resids
    pc = sp.polyfit(x, y + perm, 1)
    p.plot(x, sp.polyval(pc, x), 'k-', linewidth=2, alpha=2.0/float(nperm))
p.plot(x, y, 'k-')
p.plot(x, data, 'ko')

We’ve used the alpha argument to set the level of transparency in each of the permutations. Where more of the lines overlap, the color is darker. This gives a somewhat intuitive display of the ’weight’ of the variation in the model fit (confidence).

As mentioned above, doing exact permutation-based statistics becomes computationally prohibitive rather quickly, so we have to go to the random-sampling based approximations. The bootstrap is one such approach.

Rather than generating all possible permutations of the residuals, the bootstrap method involves drawing random samples (with replacement) from the residuals. This is done in Python easily enough by using the random_integers function to index into our array of residuals.

 
import scipy as sp

# alias for drawing random indices (with replacement) into the residual array
bootindex = sp.random.random_integers

nboot = 400
for i in xrange(nboot): # loop over n bootstrap samples from the resids
    pc = sp.polyfit(x, y + r[bootindex(0, len(r)-1, len(r))], 1)
    p.plot(x, sp.polyval(pc, x), 'k-', linewidth=2, alpha=3.0/float(nboot))
p.plot(x, y, 'k-')
p.plot(x, data, 'ko')

Figure 2 shows the method applied to the same six data points as with the permutation.


Figure 2: Residual Bootstraps


Since the sampling in the bootstrap approach is with replacement we can get sets of residuals that have repeated values. This tends to introduce a bit of bias in the direction of that repeated residual. You can see that behaviour exhibited in Figure 2 by the spreading of the lines around the middle of the graph, in contrast to the tight intersection in the middle of Figure 1.

Of course the reason the bootstrap method is useful is that we often have samples too large to handle with exact permutations. The Python to generate the 12-sample example in Figure 3 is shown below.

 
nx = 12
x = sp.linspace(0.0, 1.0, nx)
data = x + norm.rvs(loc=0.0, scale=0.1, size=nx)
yp = sp.polyfit(x, data, 1)
y = sp.polyval(yp, x)
r = data - y
p.figure()
for i in xrange(nboot): # loop over n bootstrap samples from the resids
    pc = sp.polyfit(x, y + r[bootindex(0, len(r)-1, len(r))], 1)
    p.plot(x, sp.polyval(pc, x), 'k-', linewidth=2, alpha=3.0/float(nboot))
p.plot(x, y, 'k-')
p.plot(x, data, 'ko')


Figure 3: Residual Bootstraps, large sample


These sorts of graphs probably won’t replace the standard sorts of confidence intervals (see the graph in this post for example), but it’s a kind of neat way of looking at things, and a good demo of some of the cool stuff you can do really easily in Python with Matplotlib and Scipy.

Saturday, February 13, 2010

Climate Change: It's Always Worse Than We Thought

It’s always worse than we thought, and there’s a sound scientific reason for it too. It has to do with measurement precision (or parameter fitting if you’d like). First, what do I mean by “worse than we thought”? Just the familiar refrain in the popular press, the steady drum-beat of stories about how scientists thought things were bad, but this latest research shows it’s really bad. The latest research gets its own little press release or mention in the popular science news and then is quickly forgotten. Because it was, like most research, just an incremental improvement on our understanding rather than an earth-shattering discovery. In fact, earth-shattering discoveries generally don’t get good press (maybe they will in our modern, networked age, but we just haven’t had any yet so we don’t know). Google really is great for getting a feel for this sort of phenomenon; click on the graph to go to the Google News Archive results for “climate change it’s worse than we thought” between 1960 and 2009.


Figure 1: Google News Archive Histogram


To go by the news archive, apparently 2008 and 2009 were slightly less worse than we thought, or perhaps 2007 was an abnormally worse-than-we-thought year.

If you are a thoughtful citizen, trying to be well-informed, then you might be forgiven for thinking that we’re all about to stew in our own juices or be run over by category 6 super-storms any minute. The steady worse-than-we-thought rhythm of the popular press would mark the time in your attempts to stay informed and engaged. This is actually a topic covered fairly well over on RealClimate (their stealth advocacy is disturbing at times, but they do talk sense on occasion).

When the reporters writing the news stories represented in the archive produce the ’worse than we thought’ copy, they are not lying. So what is going on? This is where measurement precision comes in. Each individual piece of research is usually focused on one method or technique for measuring or estimating a quantity (whether it is arctic sea ice extent, global mean surface temperature, the cost of climate change on cotton in Calcutta, etc.). The precision of that individual technique is usually not that good on its own, so you get pretty broad probability distributions for your estimates of the thing you are trying to measure. This leads to claims like, “new research gives much more credence to huge climate sensitivity to CO2 emissions.” These claims are not mistaken on their own, but the proper way to evaluate them is in the context of all the other ways we have of measuring the same thing. Figure 2 shows a couple of “measurement” distributions with large spread (or entropy) along with the distribution you’d get from combining them (assuming they were independent measurements).



Figure 2: Combining uncertain measurements


Notice that the combined estimate gives much lower probability to extremes to which many of the individual estimates assign fairly significant probability. James Annan has done some work on exactly that sort of “evaluation in context” for climate sensitivity to a doubling of CO2. This post over on the AirVent is sort of similar, in that it tries to combine satellite and surface station temperature data into a single “picture” of what’s going on.
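For independent measurements, this sort of combination amounts to multiplying the individual densities together and renormalizing; here is a minimal sketch with made-up means and spreads (not the numbers behind Figure 2):

import scipy as sp

def gauss(x, mu, sigma):
    return sp.exp(-(x - mu)**2 / (2.0*sigma**2)) / (sp.sqrt(2.0*sp.pi)*sigma)

x = sp.linspace(-10.0, 10.0, 1000)

# two broad, independent "measurement" distributions (illustrative numbers only)
p1 = gauss(x, 1.0, 3.0)
p2 = gauss(x, -0.5, 2.5)

# combined estimate: product of the densities, renormalized to unit area
combined = p1 * p2
combined = combined / sp.trapz(combined, x)

The combined density is noticeably narrower than either input, which is the reduced weight in the tails that the figure illustrates.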

Unfortunately in the climate policy debate what many alarmists (some might call them believers) claim as the weight of scientific evidence is actually just their prior inclination towards a certain way of thinking, fed by the popular press coverage of the issue, and what many deniers claim is the overwhelming weight of scientific chicanery is actually much the same thing. Jaynes understood this effect in terms of people being approximately Bayesian in their belief updating. Jaynes claimed that the way to combat a divergence of views was to be as open and transparent with methods, process and data as possible (and he claimed science had this sort of approach built-in, hence the ability of scientists to come to consensus more quickly than the public). Jeff Id puts it well: “The correct way to talk to a skeptic is to EXPLAIN YOUR POSITION AND GIVE CLEAR ANSWERS. That's it.” I think Jaynes, and most reasonable folks, would agree.

Thursday, February 11, 2010

Google Alerts is Worse than Crack

I have lots of Google Alerts set up for technical jargon kinds of phrases related to topics I'm interested in. Mostly I end up getting an Inbox full of advertising and political bloviating (it's amazing how many technical phrases the policy wonks have co-opted), but every now and then I get some real gems (and I try to share them on this site). I feel like the rat in the cage pushing his lever to get that reward. I'm on a random schedule of reinforcement, so it is addictive enough that I don't want to turn them off.

A good example of one of my alerts phrases is topological optimization of structures -"search engine optimization". Notice the all important inclusion of the minus operator. Try it, pretty much any kind of search phrase you try that includes 'optimization' is going to be swamped by SEO tools (I use that term in both senses of the word).

This is one of my phrases that hasn't paid off much in any way lately, but today I got a link to some summaries of recent talks, that included the phrase computational morphodynamics: local dynamics of form in biology and technology. Now that's new terminology for me, and it's a pretty useful concept / turn of phrase (yes, that phrase is probably going to turn into another Inbox-filling, random-reinforcing monkey on my back, such is life).

Wednesday, February 10, 2010

Mathworks Fighting the GPL

From the Octave mailing list:
from: Jordi Gutiérrez Hermoso
to: Octave-help
date: Wed, Feb 10, 2010 at 5:55 PM
subject: [OT]: Mathworks fighting the GPL


Apologies for the offtopic chatter, but I just noticed this as I was
trying to look for Emacs's Matlab mode to see if I could get some
ideas for Octave mode. It appears that the Mathworks only allows the
BSD license on their servers now, which affects much free software
under other licenses (in particular GPL) that they were hosting.

It's difficult to find exactly what happened and how because they seem
to have spread the word about this relatively secretly by email to
people with code on their servers, but they do have a FAQ:

    http://www.mathworks.com/matlabcentral/FX_transition_faq.html

This happened July last year.

I don't know about anyone else, but I'm deeply disturbed that
Mathworks is now actively fighting against copyleft.

- Jordi G. H.
This seems to be so they can transition to a  more 'cloud-like' model.  From the FAQ:

Why is the File Exchange adding licensing?

Licensing clarifies the rights you have as an author and as a user of the code available on the File Exchange. Licensing details how the file can be used and addresses common questions around rights to modification, distribution, and commercial use.

What happens if I don't do anything? Do I have to put a license on my code?

We have no plans to remove unlicensed submissions, but they will be prominently marked as unlicensed. In addition, unlicensed contributions will not be available for use with any future tools that access the File Exchange from within MathWorks products.
Well, being able to download and use code on the Exchange 'automagically' without worrying about license restrictions is probably a good thing.  Of course this also means MathWorks can then ship binaries based on your code without sharing the source.  BSD vs. GPL: which is freedomier? Flame on!

Tuesday, February 9, 2010

A Few Nits About Ensembles and Decision Support

First off, thanks to Steve Easterbrook for pointing at a new report on ensemble climate predictions. This post is some of my thoughts on that report.

Here's an interesting snippet from the first section of the report:
Within the last decade the causal link between increasing concentrations of anthropogenic greenhouse gases in the atmosphere and the observed changes in temperature has been scientifically established.
From a nice little summary of how to establish causation:
C. Establishing causation: The best method for establishing causation is an experiment, but many times that is not ethically or practically possible (e.g., smoking and cancer, education and earnings). The main strategy for learning about causation when we can’t do an experiment is to consider all lurking variables you can think of and look at how Y is associated with X when the lurking variables are held “fixed.”

D. Criteria for establishing causation without an experiment: The following criteria make causation more credible when we cannot do an experiment.
(i) The association is strong.
(ii) The association is consistent.
(iii) Higher doses are associated with stronger responses.
(iv) The alleged cause precedes the effect in time.
(v) The alleged cause is plausible.
The point being, if you can't do experiments, the causal link you establish will always be a rather contingent one (and if your population of 'lurkers' is only 4 or 5 then perhaps we aren't to the point of exhausting our imaginations yet).  I say that not to be disingenuous and sow doubt unnecessarily, but merely to show that I have an honest place to stand in my skepticism (I'd like to head off the "you're a willfully ignorant pseudo-scientific jerk" sorts of flames that seem to be popular in discussions on this topic).

That little digression aside, the specific aim of the work is to
develop an ensemble prediction system for climate change based on the principal state-of-the-art, high-resolution, global and regional Earth system models developed in Europe, validated against quality-controlled, high-resolution gridded datasets for Europe, to produce for the first time an objective probabilistic estimate of uncertainty in future climate at the seasonal to decadal and longer time-scales;
A side-note on climate alarmism: if the quality of the body of knowledge were such that it demanded action NOW, then this would not have been the first such study. Rational decision support requires these sorts of uncertainty quantification efforts; it is totally irresponsible to demand political action without them.
The improvements for example, add skill to seasonal forecasting while multi-decadal models, for the first time, have produced probabilistic climate change projections for Europe.
Again, now that these projections have been made for the first time, we could actually attempt to validate them. I use that term in the technical sense of comparing a model's predictions to innovative experimental results. Since we can't do experiments on the earth (or can we?) we have to settle for either not validating, or validating by comparing the predictions to what actually happens. I realize that would take a couple decades to make useful quantitative comparisons. But think about this: if we've already bought a millennium of warming, then can't we spend a decade or two building credibility in the tool-set we'll be using to 'fly' the climate into the future for centuries to come? The fact that this set of ensemble results claims to be skillful at decadal time-scales would actually make the validation task quicker than it would be with less accurate models, because you've taken some of the 'noise' and explained it with your model.

I got really excited about this, it sounds promising:
The multi-model ensemble builds on the experience of previous projects where it has been shown to be a successful method to improve the skill of seasonal forecasts from individual models. The perturbed parameter approach reflects uncertainty in physical model parameters, while the newly developed stochastic physics methodology represents uncertainty due to inherent errors in model parameterisations and to the unavoidably finite resolution of the models.
Then I got to this:
These results illustrate that initialised decadal forecasts have the potential to provide improved information compared with traditional climate change projections, but the optimal strategy for building improved decadal prediction systems in the presence of model biases remains an open question for future work.
Which reflects my impression of the state of the art from my little mini-lit review on Bayes Model Averaging. That's the fundamental difficulty, isn't it?

This is the sort of thing that worries folks who are used to being able to draw a bright line between calibration and validation:
The ENSEMBLES gridded observation data set was used along with other datasets to verify and calibrate both global and regional models, and also to assess the uncertainties in model response to anthropogenic forcing.

Recall Kelvin,
Conditional PDFs, which encompass the sampled uncertainty, were constructed from the statistically and dynamically downscaled output (and from GCM output) for temperature and/or precipitation for a number of areas and points. These are, however, qualitative constructions.
There's still some work to do here to make the product more useful for decision makers (quantitative rather than qualitative). I think the polynomial chaos expansion approaches being explored in the uncertainty quantification community have a lot of promise here (3 or 4 orders of magnitude speed-up over standard Monte Carlo approaches). The other slight difficulty with this approach is that these qualitative PDFs were then used as inputs into the impact assessments.

In light of the previous ensembles post and linked discussion thread, I found this snippet interesting:
The non-linear nature of the climate system makes dynamical climate forecasts sensitive to uncertainty in both the initial state and the model used for their formulation. Uncertainties in the initial conditions are accounted for by generating an ensemble from slightly different atmospheric and ocean analyses. Uncertainty in model formulation arises due to the inability of dynamical models of climate to simulate every single aspect of the climate system with arbitrary detail. Climate models have limited spatial and temporal resolution, so that physical processes that are active at smaller scales (e.g., convection, orographic wave drag, cloud physics, mixing) must be parameterised using semi-empirical relationships.

In that Adventures Among Alarmists post I made a sort of hand-wavy claim of everything about forecasting (from data assimilation on through to predictive distributions) being one big ill-posed problem with noise. I think reading section 3 of this report will give you a flavor of what I mean. One minor quibble: hindcasts are an ok sort of 'sanity' check, but we should take care to remember their dangers, and not mistake them for true validation. Taking too much confidence from hind-casts is a recipe for fooling ourselves.

The ensemble's skill changes with lead-time:
The skill increases for longer lead times, being larger for 6–10 years ahead than for 3–14 months or 2–5 years ahead. This is because the forced climate change signal, the sign of which is highly predictable, is greater at longer lead times.
This squares with the results discussed over here about BMA in climate and weather forecasting. Depending on the time-frame at which you are looking to forecast, different model weightings, and spin-up times are optimal. This also goes to the problem about the past-performance / future-skill connection. Since the feedbacks are not stationary, models which performed well in the past won't tend to perform well in the future (eg. the BMA weighting changes through time).
Encouragingly, the multi-model ensemble mean, which consists of the average of twelve individual projections, gives somewhat higher scores than any of the individual models, whose projections are derived from three members with perturbed initial conditions.
So, truth-centered or not?

The results also show that the skill increases for more recent hindcasts. In order to diagnose sources of skill, the blue curve of Figure 3.6 shows ensemble mean results from a parallel ensemble of ‘NoAssim’ hindcasts containing the same external forcing from greenhouse gases, sulphate aerosols, volcanoes and solar variations, but initialised from randomly selected model states rather than analyses of observations.
This seems like a reasonable use of hindcasts. Look for insight into the reasons particular models / realizations might have performed well on certain historical periods. They also found that initialization matters even with climate predictions (though their randomly selected initializations were pretty darn skillful).

The product, decision support:
For many grid boxes there are significant probabilities of both drier and wetter future climates, and this may be important for impacts studies.
I think as regional projections start becoming more and more available, the extreme, alarmist policy prescriptions will be less and less well supported by a rational cost-benefit analysis.

The sensitivity study discussed in the report (done by climateprediction.net) is worth noting simply for the fact that they found an interesting interaction (that's always a fun part of experimentation). Some of the criticism of this effort has focused on the plausibility of some of the parameter combinations in the thousands of runs of this computer experiment. The distinction to keep in mind is that this is a sensitivity study rather than a complete uncertainty quantification study.

Well, it said in the executive summary that they generated 'qualitative PDFs', but it seems like section 3.3.2 is describing a quantitative Bayesian approach. They sample their parameter space with a variety of models of varying levels of complexity and then fit a simple surrogate (some people might call it a response surface) so that they can get approximate 'results' for the whole space. Then they get posterior probabilities by weighting expert-obtained priors by likelihoods, a straightforward application of Bayes Theorem. That seems as quantitative as anyone could ask for; maybe I'm missing something?
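For reference, the update being described is just

$$ P(\theta \mid d) = \frac{P(d \mid \theta)\,P(\theta)}{P(d)} $$

where (as I read section 3.3.2) the prior $P(\theta)$ over the model parameters comes from expert elicitation and the likelihood $P(d \mid \theta)$ comes from comparing the surrogate's output to observations.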

They give a nice summary of the different types of uncertainties:
Also, the three techniques for sampling modelling uncertainty are essentially complementary to one another, so should not be seen as competing alternatives: the multi-model approach samples structural variations in model formulation, but does not systematically explore parameter uncertainties for a given set of structural choices, whereas the perturbed parameter approach does the reverse. The stochastic physics approach recognises the uncertainty inherent in inferring the effects of parameterised processes from grid box average variables which cannot account for unresolved sub-grid-scale organisations in the modelled flow, whereas the other methods do not. There is likely to be scope to develop better prediction systems in future by combining aspects of the separate systems considered in ENSEMBLES.

I think section 3 was the meat of the report (at least for what I'm interested in), so I'll stop the commentary on the report there.

I want to end with an answer to the question often suggested for dealing with us ornery skeptics: "What evidence would it take to convince you?" This is usually meant to be a jab, because obviously skeptics are really deniers in disguise and we couldn't possibly be reasonable or consider evidence (it also displays one of the common fallacies of equating disagreement about policy with ignorance of science; science demands nothing but a method). My answer is a simple one: rational policy tied to skillful prediction. By rational policy I mean it is foolish to focus on the cost-benefit of extreme events far out into the future weighed against mitigation today or tomorrow (because the tails of those future cost distributions are so uncertain, and the immediate costs of mitigation are relatively well known). It is far more rational to look at the near-term cost-benefit of adaptation to climate changes (no matter their cause) and evolutionary improvements to our irrigation, flood management, public health and energy diversity problems. The skillfulness of near-term predictions can be reasonably validated, and then used to guide policy. This should be a natural extension of the way we already make agriculture and infrastructure decisions based on weather predictions and an understanding of our current climate. There's no need for the slashing and burning of evil western capitalism (or whatever the rallying cry at Copenhagen was). The ability to skillfully predict changes further and further into the future can be gradually validated (by making a prediction and then waiting; sorry, that's what seems reasonable to me) and then the results of those tools can be incorporated into policy decisions. Markets and people don't respond well to shocks. Gradualism may not be sexy, but it's smart.

Sunday, February 7, 2010

Predictions and Entropy in Ensembles

This post is another installment in the Lorenz63 series. In it I’ll try to address a common confusion among folks trying to understand what uncertainty in initial conditions means for our ability to predict the future state of a system (in particular as it pertains to climate change). The common pedagogical technique is to describe weather prediction as an initial value problem (IVP) and climate prediction as a boundary value problem (BVP) (for example see Serendipity). I don’t think that happens to be a very good teaching technique (it seems too hand-wavy to me [update: better reasoning given down in this comment and this comment]). I think there are probably better approaches which would give more insight into the problems. I’m a learn by doing / example kind of guy, so that’s what this post will focus on: using the Lorenz ’63 system as a useful toy to give insight into the problem.
In fact, predictions of both climate and weather often use the same models, which approximately solve the same initial-boundary value problem described by the partial differential equations (PDEs) for the conservation of mass, momentum and energy (along with many parametrizations for physical and chemical processes in the atmosphere as well as sub-grid scale models of unresolved flow features). The real distinction is that climate researchers and weather forecasters care about different functions or statistics of the solutions over different time-scales. A weather forecast depends on providing time-accurate predictive distributions of the state of the atmosphere in the near future. A climate prediction is trying to provide a predictive distribution of a time-averaged atmospheric state which is (hopefully) independent of time far enough into the future (there’s an implicit ergodic hypothesis here, which, as reader Tom Vonk points out, still requires some theoretical developments to justify for the PDEs we’re actually interested in).
Another concept I’d like to introduce before I show example results from the Lorenz ’63 system is the entropy of a probability distribution. This can be viewed as a measure of informativeness of the distribution. [update: this paper presents the idea in context of climatic predictions] So a very informative distribution would have low entropy, but an uninformative distribution would have maximum entropy. For weather forecasting we would like low entropy in our predictive distributions, because that means we have significant information about what is going to happen. For climate, the distribution itself is what we want, and we are actually in some sense looking for the maximum entropy distribution that is consistent with our constraints. Measuring the “distance” between distributions with different constraints is what climate forecasting is all about.
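To be concrete, for a discrete distribution (a histogram over the ensemble members, which is what I compute further down) the entropy I'm referring to is the usual Shannon entropy,

$$ H = -\sum_i p_i \ln p_i , $$

which is zero when all the probability mass sits in a single bin (a very informative prediction) and is maximized when the mass is spread uniformly over the bins (the least informative case).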
Now for the toying with the Lorenz63 system. The “ensembles” I’m running are just varying the initial condition in the x-component (using the same approach shown here). I’m also using two slightly different bx parameters in the forcing function. Figure 1 shows the x-component of the resulting trajectories.


Figure 1: X-component trajectory under different forcings; (a) Forcing 1, (b) Forcing 2

The analogy to the weather / climate difference is pretty well illustrated by these results. Up to t ~ 3 we could do some pretty decent “weather” prediction; the ensemble members stay close together. After that things diverge abruptly (exponential growth of the initial differences), which illustrates the need to “spin up” a model when you are interested in the climate distributions. After t ~ 10 we could probably start estimating the “climate” of the two different forcings (histograms for 10 < t < 12 shown in Figure 2).


Figure 2: Two different “climate” distributions

These trajectories illustrate another interesting aspect of deterministic chaos: our uncertainty in the future state does not increase monotonically; it grows and shrinks in time (for instance compare the spread in the ensemble members at t = 4 to that at t = 8 shown in Figure 1). The entropy of the distribution of the trajectories as a function of time is shown in Figure 3. A Python function to calculate this for my ensemble (stored in a 2d array) is shown below.

import scipy as sp

def ensemble_entropy(x):
    # the ensemble runs across the first dimension of x, the time
    # runs across the second; return entropy as a function of time
    eps = 1e-16
    ent = sp.zeros(x.shape[1], dtype=float)
    bin_edges = sp.linspace(
        min(x.ravel()), max(x.ravel()), int(sp.sqrt(x.shape[0])))
    for i in xrange(x.shape[1]):
        # the histogram function returns a probability density with normed=True
        p = sp.histogram(x[:,i], bins=bin_edges, normed=True)
        # we would like a probability mass for each bin, so we need to
        # multiply by the width of the bin:
        dx = p[1][1] - p[1][0]
        p = dx * p[0]
        # normalize (it's generally very close, this is probably
        # unnecessary), and take care of zero p bins so we don't get
        # NaNs in the log:
        p = p / sum(p) + eps
        ent[i] = -sum(p * sp.log(p))
    return ent

Figure 3: Trajectory-Distribution Entropy as a Function of Time

The entropy gives us a nice measure of the informativeness of our ensemble. In the initial stages (t < 4) we’ve got small entropy (we could make “weather” predictions here). There’s a significant spike around t = 4, and then we see the magnitude of entropy drop off for a bit (or a nat, ha-ha) around t = 8, which matches the eye-balling of the ensemble spread we did earlier.
That’s it for toy model results, now for some conclusions and opinions.
There are two honest concerns with climate forecasting (if you know of more let me hear about them; if you don’t think my concerns are honest, let me hear that too). First, are the things we can predict with climate modeling useful for planning mitigation and adaptation policies? So many of the alarming predictions of costs and catastrophes attributed to climate change in the press (and even in the IPCC’s reports) depend on particular regional (rather than global) climate changes. I think an open research question is how the time averaging (and large time-steps) involved in calculating the long trajectories for estimating equilibrium climate distributions (setting aside the theoretical underpinning of this ergodic assumption) affects the accuracy of the predicted spatial variations and regional changes (it is, after all, a PDE rather than an ordinary differential equation (ODE)). This seems to be an area of research that is just beginning. Also, the papers I’ve been able to find so far don’t seem to report any grid convergence index results for the solutions (please link to papers that do have this in the comments if you know of any, thanks). This is an important part of what Roache calls calculation verification (as opposed to the code verification demonstrated here).
Second, what about the details of the ’averaging window’ (in both space and time)? How useful to policy are the results of long time-averages? Are the equilibrium distributions things we will ever actually reach? These two concerns about the usefulness of the climate modeling product for policy makers (and in a Republic like mine, the public), and the details about the averaging, seem to be the motivation for Pielke Sr.’s quibble over in this thread about the definition of climate and the implications of chaos. As Pielke points out, take your averaging volume small enough, and things start looking pretty chaotic.
My personal view is that deterministic chaos is neat in toy problems (and a fun challenge for applying the method of manufactured solutions), but in the real world our uncertainties about everything (and the stochastic nature of many of the forcings) swamp the infinitesimals. The thing that makes the dynamics of the climate-policy-science “system” interesting is the tension between giving useful insight to decision makers and ensuring that insight is not overly sensitive to our inescapable uncertainties. Right now, the state of that system is far from equilibrium.
[Update: These survey results (courtesy of Roger Pielke Sr) are interesting.  They seem to indicate that "climate scientists" view climate prediction as an IVP.
The question is, "16. How would you rate the ability of global climate models to: (very poor 1 2 3 4 5 6 very good)"; in particular 16c, model temperature values for the next 10 years, and 16d, model temperature values for the next 50 years. The mean response for the longer-term prediction is actually lower than for the short-term prediction (3.7 vs. 4.2), though the difference isn't that big considering the standard deviation.

Comparing the "reproduce observations" questions (16a. and 16b.) with the "predict future values" questions (16c. and 16d.) goes to the validation question.  Would you bet your life on your model's predictions?]