Comments on Various Consequences: Bayesian Climate Model Averaging

In Quantifying Uncertainty in Projections of Regio...

2010-04-10T15:29:53.200-04:00

In Quantifying Uncertainty in Projections of Regional Climate Change: A Bayesian Approach to the Analysis of Multi-model Ensembles, the authors introduce fat-tailed (Student's t) distributions to 'robustify' (really, that's the term they use) their modeling (similar to the approach in this set of slides).

I like this part of their conclusion:
In contrast, we think that the Bayesian approach is not only flexible but facilitates an open debate on the assumptions that generate probabilistic forecasts.
Making the assumptions explicit is a big step towards productive discussion and consensus building.
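To make the 'robustify' idea a bit more concrete, here is a minimal sketch (mine, not the paper's code) comparing a standard normal density with a Student's t density far from the center. The choice of 3 degrees of freedom and the evaluation point of 4 are assumptions picked purely for illustration.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def student_t_pdf(x, nu):
    """Student's t density with nu degrees of freedom (location 0, scale 1)."""
    log_norm = (math.lgamma((nu + 1.0) / 2.0)
                - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))

# Illustrative values only: an "outlying model" sitting 4 units from the center.
outlier = 4.0
tail_ratio = student_t_pdf(outlier, 3.0) / normal_pdf(outlier)
print(f"t(3) density is about {tail_ratio:.0f}x the normal density at x = {outlier}")
```

Because the t density assigns far more probability to a distant value, a single outlying model pulls the fitted center around much less than it would under a Gaussian assumption.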

James Annan has a short comment [pdf] in press, he...

2010-03-14T14:25:25.008-04:00

James Annan has a short comment [pdf] in press; here's an interesting paragraph:
Min and Hense [2006] suggest another alternative to the reporting of probabilities, explicitly treating the issue as a decision problem in which the expected loss is to be minimised and thus emphasising the close link between Bayesian probability and decision theory. The companion paper Min and Hense [2007] considers the issue of D&A on a regional and seasonal basis. Uncertainties are relatively higher at smaller scales, and moreover it is on a local basis that climate change will actually impact the environment. Therefore, this area of research is likely to remain important long after the main questions of climate change on the global scale are considered settled.

Decision theory is the way to go; vague arguments for action based on hand-wavy applications of the precautionary principle are sub-optimal and generally incoherent.
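As a toy illustration of what treating this as a decision problem means, here is a small sketch that picks the action with the lowest expected loss. The states, probabilities, actions, and loss numbers are all made up for the example; none of them come from Annan or from Min and Hense.

```python
def expected_loss(losses, probabilities):
    """Probability-weighted average loss of one action over all states."""
    return sum(p * loss for p, loss in zip(probabilities, losses))

def bayes_action(loss_table, probabilities):
    """Return the action whose expected loss is smallest."""
    return min(loss_table, key=lambda a: expected_loss(loss_table[a], probabilities))

# Hypothetical posterior probabilities for three states: low, medium, high warming.
probabilities = [0.2, 0.5, 0.3]

# Hypothetical losses (arbitrary units) for each action in each state.
loss_table = {
    "no action":  [0.0, 4.0, 20.0],
    "moderate":   [2.0, 3.0, 8.0],
    "aggressive": [6.0, 6.0, 6.0],
}

best = bayes_action(loss_table, probabilities)
for action, losses in loss_table.items():
    print(action, round(expected_loss(losses, probabilities), 2))
print("minimum expected loss:", best)
```

The point is that both the probabilities and the loss table are written down explicitly, so a disagreement about the conclusion can be traced to a disagreement about one of those inputs.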

A Bayesian Framework for Multimodel Regression Abs...

2010-02-17T13:22:31.037-05:00

A Bayesian Framework for Multimodel Regression
Abstract:
This paper presents a framework based on Bayesian regression and constrained least squares methods for incorporating prior beliefs in a linear regression problem. Prior beliefs are essential in regression theory when the number of predictors is not a small fraction of the sample size, a situation that leads to overfitting, that is, to fitting variability due to sampling errors. Under suitable assumptions, both the Bayesian estimate and the constrained least squares solution reduce to standard ridge regression. New generalizations of ridge regression based on priors relevant to multimodel combinations also are presented. In all cases, the strength of the prior is measured by a parameter called the ridge parameter. A “two-deep” cross-validation procedure is used to select the optimal ridge parameter and estimate the prediction error.

The proposed regression estimates are tested on the Development of a European Multimodel Ensemble System for Seasonal to Interannual Prediction (DEMETER) hindcasts of seasonal mean 2-m temperature over land. Surprisingly, none of the regression models proposed here can consistently beat the skill of a simple multimodel mean, despite the fact that one of the regression models recovers the multimodel mean in a suitable limit. This discrepancy arises from the fact that methods employed to select the ridge parameter are themselves sensitive to sampling errors. It is plausible that incorporating the prior belief that regression parameters are “large scale” can reduce overfitting and result in improved performance relative to the multimodel mean. Despite this, results from the multimodel mean demonstrate that seasonal mean 2-m temperature is predictable for at least three months in several regions.

So, not really a win for Bayesian Model Averaging (BMA) for climate prediction in this one. It is a good example of how even climate prediction can still be an initial value problem (IVP), one that depends on the accuracy of the initialization.

It is also a good illustration of Jaynes' claim that properly applied probability theory (Bayesian) does away with the multitude of ad-hoceries in the standard statistical toolbox:
The purpose of this paper is to clarify the fact that a wide variety of methods for reducing overfitting in linear regression problems, including many of those mentioned above, can be interpreted in a single Bayesian framework. Bayesian theory allows one to incorporate “prior knowledge” in the estimation process.
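Here is a minimal sketch of how a ridge-type estimate can recover the multimodel mean in a limit, as the abstract describes. I am assuming one particular form: two models, a Gaussian prior on the weights centered at equal weights (0.5 each), and a ridge parameter `lam` setting the prior's strength. The paper's actual priors may differ, and the data below are invented.

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ [w1, w2] = [r1, r2] by Cramer's rule."""
    det = a * d - b * c
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)

def ridge_toward_mean(x1, x2, y, lam):
    """Minimise sum((y - w1*x1 - w2*x2)^2) + lam * ((w1 - 0.5)^2 + (w2 - 0.5)^2).

    This is also the posterior mean under a Gaussian prior centered on
    equal weights; lam = 0 gives ordinary least squares.
    """
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    prior = 0.5  # equal weights for two models, i.e. the multimodel mean
    return solve_2x2(s11 + lam, s12, s12, s22 + lam,
                     r1 + lam * prior, r2 + lam * prior)

# Invented hindcasts from two models and the matching observations.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [1.2, 1.9, 3.2, 3.8]
observed = [1.1, 2.0, 3.1, 3.9]

for lam in (0.0, 1.0, 100.0, 1e9):
    w1, w2 = ridge_toward_mean(model_a, model_b, observed, lam)
    print(f"lam={lam:g}: w1={w1:.3f}, w2={w2:.3f}")
```

As `lam` grows the weights are pulled to 0.5 each, which is just the two-model mean; the hard part the abstract points to is choosing `lam` from a short, noisy training record.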

Cross-validation and proper physical interpretatio...

2009-12-25T19:00:10.420-05:00

Cross-validation and proper physical interpretation of a complex hierarchical model / inference are hard.

Inferring Climate System Properties Using a Computer Model

Comment on article by Sanso et al.:
"But GCM natural variability is a property of the GCM: it does not proxy the difference between the GCM and the climate system. In climate science this has been appreciated and discussed, but only recently has there been a genuine effort to determine a variance for the model structural error that is not based on internal variability (Murphy et al. 2007)."

"Sanso et al. present us with diagnostics based on holding-out 43 of the 426 evaluations from Y, and then predicting the model response on the hold-out and comparing it with the actual values. However, I suspect that there is plenty of information about MIT2DCM from the 383 evaluations that remain in Y. The experimental design for Y was a multi-level grid, so the evaluations that remain will almost certainly still do a good job of spanning the three-dimensional model-parameter space. Therefore I am not surprised that the diagnostics show that the hold-out sample is predicted well, but I am not sure that this tells us much about the statistical model for θ∗, W, Y, z, or the reliability of Sanso et al.’s conclusions about the updated distribution for θ∗: the verdict on the statistical model from the evidence in the paper is ‘unproven’."

"I particularly commend the use of a statistical model to link model evaluations, model parameters, and system observations. This, and the inclusion of an explicit term for model structural error, are major steps forward for Climate Science."

But really, how large and dynamic is the systemati...

2009-12-25T16:11:03.452-05:00

But really, how large and dynamic is the systematic error? Believing it large or small seems here a matter of faith -- a prior.

Here's a paper that treats the bias problem that way (from the abstract):
"[...] In addition, unlike previous studies, our methodology explicitly considers model biases that are allowed to be time-dependent (i.e. change between control and scenario period). More specifically, the model considers additive and multiplicative model biases for each RCM and introduces two plausible assumptions (‘‘constant bias’’ and ‘‘constant relationship’’) about extrapolating the biases from the control to the scenario period. The resulting identifiability problem is resolved by using informative priors for the bias changes. A sensitivity analysis illustrates the role of the informative prior. [...] Our results show the necessity to consider potential bias changes when projecting climate under an emission scenario. Further work is needed to determine how bias information can be exploited for this task."

By assuming the absence of significant systematic ...

2009-12-25T15:43:41.188-05:00

By assuming the absence of significant systematic error are we not, in effect, assuming validation?

That's my concern.

... how large and dynamic? Believing it large or small seems here a matter of faith...

Not quite; the paper that I linked in the post seems to indicate that the systematic bias is significant. Unfortunately, since we can't (or are too impatient to) do validation testing for climate models like we normally would for numerical weather prediction, or CFD, or [pick your simulation], we can't estimate the sign or magnitude of the bias (and then, of course, control for it in our new and improved model).

...since Bayesian priors can be anything...

I think that's why they chose uninformative priors: they want to avoid criticism that they are 'cooking the books'.

...the climate modelers basically behave as a "herd"...all used the same historical climate data...

That was one of the validation problems identified in that Reichler and Kim 2007 paper.

...a basic consistency among the various models will only strengthen this prior. Not weaken it.

What am I missing here?

I don't think you are missing anything; the cross-validation approach still doesn't protect us from fooling ourselves the way real empirical validation would (it's an unfortunate similarity of terminology too because the two 'validations' are not the same thing at all).

Also, that 'herd' behaviour and 'spread-skill' relationship (less spread means better predictions, more spread means worse predictions) are exhibited by the weather prediction ensembles, but the model that tends to perform well on the training set changes as the training set moves forward in time (and the optimal length of the training set changes based on the thing you are trying to forecast and how far ahead you are trying to forecast it). The reason the BMA approach works well there is that we have a chance to close the loop with new observations every day (and gradually change the weights we give to each model).

I think it's still applicable to climate model forecasting, but I don't think we have the political will to do validation because we have to wait much longer to close the loop. Unfortunately, calling for a decade or two of climate forecast validation, and tying policy decisions to gradual changes over decades, isn't exactly compatible with urgent calls to decisive action (even if it is compatible with rational decision making; I mean, we're talking about a process with time-scales on the order of decades, centuries and millennia, right?).

"There are of course some limitations to what...

2009-12-24T11:42:59.279-05:00

"There are of course some limitations to what these procedures can achieve. Although the different climate modeling groups are independent in the sense that they consist of disjoint groups of people, each developing their own computer code, all the GCMs are based on similar physical assumptions and if there were systematic errors affecting future projections in all the GCMs, our procedures could not detect that. On the other hand, another argument sometimes raised by so-called climate skeptics is that disagreements among existing GCMs are sufficient reason to doubt the correctness of any of their conclusions. The methods presented in this paper provide some counter to that argument, because we have shown that by making reasonable statistical assumptions, we can calculate a posterior density that captures the variability among all the models, but that still results in posterior-predictive intervals that are narrow enough to draw meaningful conclusions about probabilities of future climate change."

From Bayesian modeling of uncertainty in ensembles of climate models

In other words, the predictive distributions are informative (it's not just a uniform distribution), but we still can't protect ourselves from systematic bias (which the results cited in the post above seem to indicate). This unquantified risk to decision making is the fundamental problem that lack of model validation admits.

"A difficulty with this kind of Bayesian anal...

2009-12-24T10:36:16.288-05:00

"A difficulty with this kind of Bayesian analysis is how to validate the statistical assumptions. Of course, direct validation based on future climate is impossible. However the following alternative viewpoint is feasible: if we think of the given climate models as a random sample from the universe of possible climate models, we can ask ourselves how well the statistical approach would do in predicting the response of a new climate model. This leads to a cross-validation approach. In effect, this makes an assumption of exchangeability among the available climate models."

From Bayesian modeling of uncertainty in ensembles of climate models

This approach is similar to what Jaynes suggested for the treatment of outliers.
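The leave-one-model-out idea in the quote can be sketched roughly as follows. This is a deliberately crude stand-in: it predicts each held-out model from the mean and sample standard deviation of the others, not from the paper's Bayesian predictive distribution, and the nine projection values are invented.

```python
import statistics

def leave_one_out_coverage(projections, z=2.0):
    """Fraction of models that fall inside an interval built from the others.

    For each held-out model, the interval is mean +/- z * s * sqrt(1 + 1/n),
    where mean and s come from the n remaining models.
    """
    hits = 0
    for i, held_out in enumerate(projections):
        rest = projections[:i] + projections[i + 1:]
        centre = statistics.mean(rest)
        spread = statistics.stdev(rest) * (1.0 + 1.0 / len(rest)) ** 0.5
        if abs(held_out - centre) <= z * spread:
            hits += 1
    return hits / len(projections)

# Invented regional warming projections (degrees C) from nine models.
projections = [2.1, 2.6, 3.0, 3.3, 2.8, 2.4, 3.9, 2.9, 3.1]
coverage = leave_one_out_coverage(projections)
print(f"held-out models inside their interval: {coverage:.2f}")
```

Note what this does and does not check: it tests whether the models look exchangeable with each other, so an error shared by every model would pass straight through it.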

Bayesian modeling of uncertainty in ensembles of c...

2009-12-24T09:41:12.339-05:00

Bayesian modeling of uncertainty in ensembles of climate models

Abstract:
Projections of future climate change caused by increasing greenhouse gases depend critically on numerical climate models coupling the ocean and atmosphere (GCMs). However, different models differ substantially in their projections, which raises the question of how the different models can best be combined into a probability distribution of future climate change. For this analysis, we have collected both current and future projected mean temperatures produced by nine climate models for 22 regions of the earth. We also have estimates of current mean temperatures from actual observations, together with standard errors, that can be used to calibrate the climate models. We propose a Bayesian analysis that allows us to combine the different climate models into a posterior distribution of future temperature increase, for each of the 22 regions, while allowing for the different climate models to have different variances. Two versions of the analysis are proposed, a univariate analysis in which each region is analyzed separately, and a multivariate analysis in which the 22 regions are combined into an overall statistical model. A cross-validation approach is proposed to confirm the reasonableness of our Bayesian predictive distributions. The results of this analysis allow for a quantification of the uncertainty of climate model projections as a Bayesian posterior distribution, substantially extending previous approaches to uncertainty in climate models.

The R code and data used in this paper are publicly available.

Using Bayesian model averaging to calibrate foreca...

2009-12-23T19:27:45.896-05:00

Using Bayesian model averaging to calibrate forecast ensembles

First paragraph of the abstract:
Ensembles used for probabilistic weather forecasting often exhibit a spread-skill relationship, but they tend to be underdispersive. This paper proposes a principled statistical method for postprocessing ensembles based on Bayesian model averaging (BMA), which is a standard method for combining predictive distributions from different sources. The BMA predictive probability density function (PDF) of any quantity of interest is a weighted average of PDFs centered around the individual (possibly bias-corrected) forecasts, where the weights are equal to posterior probabilities of the models generating the forecasts, and reflect the models’ skill over the training period. The BMA PDF can be represented as an unweighted ensemble of any desired size, by simulating from the BMA predictive distribution. The BMA weights can be used to assess the usefulness of ensemble members, and this can be used as a basis for selecting ensemble members; this can be useful given the cost of running large ensembles.
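A minimal sketch of the BMA predictive PDF described in the abstract: a weighted average of PDFs centered on each member's forecast, plus simulation from that mixture to get an unweighted ensemble of any size. I am assuming Gaussian kernels with one common standard deviation, and the forecasts, weights, and spread below are invented; in practice the weights and spread would be fitted over a training period.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def bma_pdf(x, forecasts, weights, sd):
    """Weighted average of Gaussian PDFs centered on each forecast."""
    return sum(w * normal_pdf(x, f, sd) for f, w in zip(forecasts, weights))

def bma_sample(n, forecasts, weights, sd, rng):
    """Draw n values: pick a member by its weight, then add kernel noise."""
    members = rng.choices(forecasts, weights=weights, k=n)
    return [rng.gauss(m, sd) for m in members]

# Invented (already bias-corrected) temperature forecasts and BMA weights.
forecasts = [14.2, 15.1, 16.0]
weights = [0.5, 0.3, 0.2]
sd = 0.8

bma_mean = sum(w * f for f, w in zip(forecasts, weights))
rng = random.Random(0)
draws = bma_sample(5000, forecasts, weights, sd, rng)
print(f"BMA mean {bma_mean:.2f}, simulated mean {sum(draws) / len(draws):.2f}")
```

The mixture is wider than any single kernel because it adds the between-member spread to the within-member spread, which is how BMA addresses the underdispersion the abstract mentions.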