Wednesday, January 19, 2011

Empiricism and Simulation

There are two orthogonal ideas that seem to get conflated in discussions about climate modeling. One is the idea that you’re not doing science if you can’t do a controlled experiment, but of course we have observational sciences like astronomy. The other is that all this new-fangled computer-based simulation is untrustworthy, usually because “it ain’t the way my grandaddy did science.” Both are rather silly ideas. We can still weigh the evidence for competing models based on observation, and we can still find protection from fooling ourselves even when those models are complex.

What does it mean to be an experimental as opposed to an observational science? Do sensitivity studies and observational diagnostics using sophisticated simulations count as experiments? Easterbrook claims that because climate scientists do these two things with their models, climate science is an experimental science [1]. It seems like there is a motivation to claim the mantle of experimental, because it may carry more rhetorical credibility than the merely observational (the critic Easterbrook is addressing certainly thinks so). This is probably because the statements we can make about causality and the strength of the inferences we can draw are usually greater when we can run controlled experiments than when we are stuck with whatever natural experiments fortune provisions for us (and there are sound mathematical reasons for this, having to do with optimality in experimental design rather than any label we may place on the source of the data). This seeming motivation demonstrated by Easterbrook to embrace the label of empirical is in sharp contrast to the denigration of the empirical by Tobis in his three part series [2, 3, 4]. As I noted on his site, the narrative Tobis is trying to create with those posts has already been pre-messed with by Easterbrook; his readers just pointed out the obvious weaknesses too. One good thing about blogging is the critical and timely feedback.

The confusions of these two climate warriors are an interesting point of departure. I think they are both saying more than blah blah blah, so it’s worth trying to clarify this issue. The figure below is based on a technical report from Los Alamos [5], which is a good overview and description of the concepts and definitions for model verification and validation as it has developed in the computational physics community over the past decade or so. I think this emerging body of work on model V&V places the relative parts, experiment and simulation, in a sound framework for decision making and reasoning about what models mean.


Figure 1: Verification and Validation Process (based largely on [5])

The process starts at the top of the flowchart with a “Reality of Interest”, from which a conceptual model is developed. At this point the path splits into two main branches: one based on “Physical Modeling” and the other based on “Mathematical Modeling”. Something I don’t think many people realize is that there is a significant tradition of modeling in science that isn’t based on equations. It is no coincidence that an aeronautical engineer might talk of testing ideas with a wind-tunnel model or a CFD model. Both models are simplifications of the reality of interest, which, for that engineer, is usually a full-scale vehicle in free flight.

Figure 2 is just a look at the V&V process through my Design of Experiments (DoE) colored glasses.


Figure 2: Distorted Verification and Validation Process

My distorted view of the V&V process is shown to emphasize that there’s plenty of room for experimentalists to have fun (maybe even a job [3]) in this, admittedly model-centric, sandbox. However, the transferability of the basic experimental design skills between “Validation Experiments” and “Computational Experiments” says nothing about what category of science one is practicing. The method of developing models may very well be empirical (and I think Professor Easterbrook and I would agree it is, and maybe even should be), but that changes nothing about the source of the data which is used for “Model Validation.”

The computational experiments highlighted in Figure 2 are for correctness checking, but those aren’t the sorts of computational experiments Easterbrook claimed made climate science an experimental science. Where do sensitivity studies and model-based diagnostics fit on the flowchart? I think sensitivity studies fit well in the activity called “Pre-test Calculations”, which, one would hope, inform the design of experimental campaigns. Diagnostics are more complicated.
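As a rough illustration of a sensitivity study used as a pre-test calculation, here is a minimal Python sketch. The model (a decaying exponential), its parameter values, and the names `model` and `sensitivity_to_k` are all hypothetical stand-ins, not anything taken from [5] or from a climate code. It estimates, by central finite differences, how strongly the output responds to the parameter k at each candidate measurement time, which suggests where a measurement would be most informative about k.

```python
import math

def model(t, a, k):
    """Hypothetical toy model: y(t) = a * exp(-k * t)."""
    return a * math.exp(-k * t)

def sensitivity_to_k(t, a, k, rel_step=1e-6):
    """Central finite-difference estimate of dy/dk at time t."""
    h = rel_step * k
    return (model(t, a, k + h) - model(t, a, k - h)) / (2.0 * h)

# Assumed nominal parameter values (illustrative only).
a_nominal, k_nominal = 1.0, 0.5

# Candidate measurement times: 0.0, 0.1, ..., 10.0
candidate_times = [0.1 * i for i in range(101)]

# Pick the time at which the output is most sensitive to k.
best_time = max(
    candidate_times,
    key=lambda t: abs(sensitivity_to_k(t, a_nominal, k_nominal)),
)
print(f"most informative measurement time for k: {best_time:.1f}")
```

For this toy model the analytic sensitivity is -a t exp(-k t), whose magnitude peaks at t = 1/k, so the sketch should pick a time near 2. A real pre-test calculation would do the same kind of thing with the actual simulation and the actual candidate measurement locations.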

Heald and Wharton have a good explanation for the use of the term “diagnostic” in their book on microwave-based plasma diagnostics: “The term ‘diagnostics,’ of course, comes from the medical profession. The word was first borrowed by scientists engaged in testing nuclear explosions about 15 years ago [c. 1950] to describe measurements in which they deduced the progress of various physical processes from the observable external symptoms” [6]. With a diagnostic we are using the model to help us generate our “Experimental Data”, so that would happen within the activity of “Experimentation” on this flowchart. This use of models as diagnostic tools is applied to data obtained from either experiment (e.g. laboratory plasma diagnostics) or observations (e.g. astronomy, climate science), so it says nothing about whether a particular science is observational or experimental. Classifying scientific activities as experimental or observational is of passing interest, but I think far too much emphasis is placed on this question for the purpose of winning rhetorical “points.”

The more interesting issue from a V&V perspective is introducing a new connection in the flowchart that shows how a dependency between model and experimental data could exist (Figure 3). Most of the time the diagnostic model and the model being validated are different. However, the case where they are the same is an interesting and practically relevant one that is not addressed in the current V&V literature that I know of (please share links if you know of any).


Figure 3: V&V process including model-based diagnostic

It should be noted that even though the same model may be used to make predictions and perform diagnostics, it will usually be run in a different way for those two uses. The significant changes between Figure 1 and Figure 3 are the addition of an “Experimental Diagnostic” box and the change to the mathematical cartoon in the “Validation Experiment” box. The change to the cartoon is to indicate that we can’t measure what we want directly (u), so we have to use a diagnostic model to estimate it based on the things we can measure (b). An example of when the model-based diagnostic is relatively independent of the model being validated might be using a laser-based diagnostic for fluid flow. The equations describing propagation of the laser through the fluid are not the same as those describing the flow. An example of when the two codes might be connected would be if you were trying to use ultrasound to diagnose a flow. The diagnostic model and the predictive model could both be Navier-Stokes with turbulence closures, the validity of which is exactly what the investigation aims to establish. I’d be interested in criticisms of how I explained this / charted this out.
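The idea of a model-based diagnostic can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `diagnostic_model` is a made-up monotone relation between the quantity we want (u) and the quantity we can measure (b), not a real laser or ultrasound model, and `estimate_u` simply inverts it by bisection.

```python
def diagnostic_model(u):
    """Hypothetical forward model: the measurable b produced by a state u."""
    return u + 0.1 * u**3  # monotonically increasing in u

def estimate_u(b_measured, lo=0.0, hi=10.0, tol=1e-10):
    """Invert the diagnostic model by bisection: find u with model(u) = b."""
    if not diagnostic_model(lo) <= b_measured <= diagnostic_model(hi):
        raise ValueError("measurement outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diagnostic_model(mid) < b_measured:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Pretend the instrument reported this value of b (synthetic, for illustration).
b = diagnostic_model(2.0)
u_hat = estimate_u(b)
print(f"estimated u = {u_hat:.6f}")
```

The estimate of u inherits whatever is wrong with the diagnostic model. That is harmless when the diagnostic model is independent of the model being validated, and it is the reason the case where the two are the same deserves its own arrow in the flowchart.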


Attempt at Answering Model Questions

I’m not in the target population that Professor Easterbrook is studying, but here’s my attempt at answering his questions about model validation [7].

  1. “If I understand correctly–a model is ’valid’ (is that a formal term?) if the code is written to correctly represent the best theoretical science at the time...”

    I think you are using an STS flavored definition for “valid.” The IEEE/AIAA/ASME/US-DoE/US-DoD definition differs. “Valid” means observables you get out of your simulations are “close enough” to observables in the wild (experimental results). The folks from DoE tend to argue for a broader definition of valid than the DoD folks. They’d like to include as “validation” activities of a scientist comparing simulation results and experimental results without reference to an intended use.

  2. “– so then what do the results tell you? What are you modeling for–or what are the possible results or output of the model?”

    Doing a simulation (running the implementation of a model) makes explicit the knowledge implicit in your modeling choices. The model is just the governing equations; you have to run a simulation to find solutions to those governing equations.

  3. “If the model tells you something you weren’t expecting, does that mean it’s invalid? When would you get a result or output that conflicts with theory and then assess whether the theory needs to be reconsidered?”

    This question doesn’t make sense to me. How could you get a model output that conflicted with theory? The model is based on theory. Maybe this question is about how simplifying assumptions could lead to spurious results? For example, if a simulation result shows failure to conserve mass/momentum/energy in a specific calculation possibly due to a modeling assumption (more likely due to a more mundane error), I don’t think anyone but a perpetual-motion machine nutter would seriously reconsider the conservation laws.

  4. “Then is it the theory and not the model that is the best tool for understanding what will happen in the future? Is the best we can say about what will happen that we have a theory that adheres to what we know about the field and that makes sense based on that knowledge?”

    This one doesn’t make sense to me either. You have a “theory,” but you can’t formulate a “model” of it and run a simulation, or just a pencil and paper calculation? I don’t think I’m understanding how you are using those words.

  5. “What then is the protection or assurance that the theory is accurate? How can one ‘check’ predictions without simply waiting to see if they come true or not come true?”

    There’s no magic; the protection from fooling ourselves is the same as it has always been, only the names of the problems change.
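To make the answers to the first two questions concrete, here is a minimal Python sketch, with all specifics assumed for illustration. The "model" is a single governing equation, du/dt = -k u; the "simulation" is a forward-Euler solution of it; and "validation" is a comparison of simulated and observed values against a tolerance set by the intended use. The observations are synthetic numbers made up for the example, and the function names and tolerances are hypothetical.

```python
def simulate(u0, k, t_end, n_steps=1000):
    """Solve the governing equation du/dt = -k*u by forward Euler."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u += dt * (-k * u)
    return u

def validation_error(times, observed, u0, k):
    """Largest absolute difference between simulated and observed values."""
    return max(abs(simulate(u0, k, t) - obs)
               for t, obs in zip(times, observed))

def is_valid(times, observed, u0, k, tolerance):
    """'Valid' here means close enough for a stated intended use."""
    return validation_error(times, observed, u0, k) <= tolerance

# Synthetic "experimental" data, made up for this example.
times = [0.5, 1.0, 2.0]
observed = [0.80, 0.60, 0.38]

err = validation_error(times, observed, u0=1.0, k=0.5)
print(f"max |simulation - observation| = {err:.4f}")
print("valid for a loose requirement (0.05):",
      is_valid(times, observed, 1.0, 0.5, 0.05))
print("valid for a tight requirement (0.01):",
      is_valid(times, observed, 1.0, 0.5, 0.01))
```

The same simulation passes or fails depending on the tolerance, which is the point of tying validity to an intended use. Whether the Euler solver itself is accurate enough is a separate question, and it belongs to verification rather than validation.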

Attempt at Understanding Blah Blah Blah

  • “The trouble comes when empiricism is combined with a hypothesis that the climate is stationary, which is implicit in how many of their analyses work.” [8]

    The irony of this statement is extraordinary in light of all the criticisms by the auditors and others of statistical methods in climate science. It would be a valid criticism, if it were supported.

  • “The empiricist view has never entirely faded from climatology, as, I think, we see from Curry. But it’s essentially useless in examining climate change. Under its precepts, the only thing that is predictable is stasis. Once things start changing, empirical science closes the books and goes home. At that point you need to bring some physics into your reasoning.” [2]

    So we’ve gone from what could be reasonable criticism of unfounded assumptions of stationarity to empiricism being unable to explain or understand dynamics. I guess the guys working on embedding dimension stuff, or analogy based predictions would be interested to know that.

  • “See, empiricism lacks consilience. When the science moves in a particular direction, they have nothing to offer. They can only read their tea leaves. Empiricists live in a world which is all correlation, and no causation.” [3]

    Let’s try some definitions.

    empiricism: knowledge through observation
    consilience: unity of knowledge, non-contradiction

    How can the observations contradict each other? Maybe a particular explanation for a set of observations is not consilient with another explanation for a different set of observations. This seems to be something that would get straightened out in short order though: it’s on this frontier that scientific work proceeds. I’m not sure how empiricism is “all correlation.” This is just a bald assertion with no support.

  • “While empiricism is an insufficient model for science, while not everything reduces to statistics, empiricism offers cover for a certain kind of pseudo-scientific denialism. [...] This is Watts Up technique asea; the measurements are uncertain; therefore they might as well not exist; therefore there is no cause for concern!” [4]

    Tobis: Empiricism is an insufficient model for science. Feynman: The test of all knowledge is experiment. Tobis: Not everything reduces to statistics. Jaynes: Probability theory is the logic of science. To be fair, Feynman does go on to say that you need imagination to think up things to test in your experiments, but I’m not sure that isn’t included in empiricism. Maybe it isn’t included in the empiricism Tobis is talking about.

    So that’s what all this is about? You’re upset at Watts making a fallacious argument about uncertainty? What does empiricism have to do with this? It would be simple enough to just point out that uncertainty doesn’t mean ignorance.

Not quite blah blah blah, but the argument is still hardly thought out and poorly supported.


[1]   Easterbrook, S., “Climate Science is an Experimental Science,” February 2010.

[2]   Tobis, M., “The Empiricist Fallacy,” November 2010.

[3]   Tobis, M., “Empiricism as a Job,” November 2010.

[4]   Tobis, M., “Pseudo-Empiricism and Denialism,” November 2010.

[5]   Thacker, B. H., Doebling, S. W., Hemez, F. M., Anderson, M. C., Pepin, J. E., and Rodriguez, E. A., “Concepts of Model Verification and Validation,” Tech. Rep. LA-14167-MS, Los Alamos National Laboratory, Oct 2004.

[6]   Heald, M. and Wharton, C., Plasma Diagnostics with Microwaves, Wiley series in plasma physics, Wiley, New York, 1965.

[7]   Easterbrook, S., “Validating Climate Models,” November 2010.

[8]   Tobis, M., “Empiricism,” November 2010.

Thanks to George Crews and Dan Hughes for their critical feedback on portions of this.

[Update: George left a comment with suggestions on changing the flowchart. Here's my take on his suggested changes.

A slightly modified version of George's chart. I think it makes more sense to have the "No" branch of the validation decision point back at "Abstraction", which parallels the "No" branch of the verification decision pointing at "Implementation". Also switched around "Experimental Data" and "Experimental Diagnostic." Notably absent is any loop for "Calibration"; this would properly be a separate loop with output feeding in to "Computer Model."]


  1. Thanks for the feedback George; I added a new chart reflecting those suggestions. Of course there would also be a decision point on the experimental side to decide how much testing is sufficient; maybe I'll flesh that out a bit more in a future post.

    gmcrews said: But it is actually an advantage that the modeling process is independent of the decision criteria. We may be able to establish a consensus modeling process. Regardless of political outlook.

    Good point; that's why I've focused more on verification in my comments rather than validation; more likely to be able to have a fruitful discussion without getting too political.

    I like the term "Computational Diagnostics" to describe verification activities. The "Computational Experiments" really happen off the "Yes" fork of the verification decision.

  2. George, I think you'll find this interesting (I've not heard of this effort before): CASoS Engineering Applications: Climate Security Impacts; they have an interesting-sounding tech report up (I've not had time to read it yet): Uncertainty Quantification and Validation of Combined Hydrological and Macroeconomic Analyses

  3. Joshua, I hadn't noticed this blog before. It looks to be immensely interesting.

    My "empiricism" gripes don't seem directly relevant to the rest of this piece, so I'm not sure how easily I can clarify my point. I'm pretty sure I haven't seen Steve make this point because it's not really on his turf.

    My complaint is with people who judge the "truth" or "falsehood" of the "global warming theory" based on hairsplitting of the observations rather than on physical understanding of the principles.

    All sorts of statistical hairsplitting occurs, mostly focused on the one dimensional record of global mean surface temperature vs time.

    This technique trivializes the system and the knowledge of the system, bashing the vast majority of the information out of the theory, the observations and the predictions. This would be quite perfectly silly if it weren't consequential. It also would not exist were it not consequential.

    It's related to what I would consider a pre-scientific thread in climatological culture, one which has been singularly reluctant to go along with climate change arguments by comparison with the other major subgroups. Their empiricism is not so one-dimensional but it remains pure; a discovery of correlations, especially lagged correlations, is all there is to it. It's statistics in the sense that a baseball fan uses the word.

    These complaints have, really, nothing to do with your critiques of the modeling enterprise, which I look forward to further investigating.

  4. I guess I'm not familiar enough with the history. Are you talking about the folks trying to do analogy based forecasting, or some sort of dimension embedding? Can you point me at some exemplars of this sub-group you're talking about (what I might call 'unimaginative empiricists')? The statistical hair-splitters playing with GMST I'm less interested in.

    I think your complaints are related to this topic, but it may become obvious that they're not if you clarify a bit.

    Thanks for commenting.

  5. I am not talking about anybody you need to take seriously from the point of view of mathematical science, if that is your question. That's why I'm surprised to see my complaint at a place like this one.

    In retrospect, I am sure I should not have chosen the word "empiricism" for the error. Perhaps "heuristicism" would have been better. There is a community of people that have been doing "climate forecasts" (one to six month outlooks) for some considerable time. This is based entirely on heuristics. Bill Gray's hurricane forecasts are a good example.

    When climate change accelerates, the heuristics increasingly give bad guidance. The heuristics crowd has no way of anticipating any changes and relies on purely observational techniques. It is difficult to tease out anything from observations without a physical model and impossible without a statistical model other than the heuristics that they already have. This community seems to be the segment of climate science (if you grant them that much) that is most likely to find fault with the WG I viewpoint.

    This fits in with what gmcrews says: "Until such a time as an observation is made that is *not* compatible with natural climate variability, reluctance to adopt a more anthropogenic CO2 view will remain tenable."

    I call this "rolling a 14", an event so far out of line with previous experience that the idea that what we are seeing is within the bounds of normal becomes transparently wrong. If we get a couple more years like 2010 and early 2011 have turned out, that may be enough to convince most scientists that the idea of a quasistationary climate is no longer useful. What it will take to convince the world at large is hard to constrain.