Wednesday, January 19, 2011

Empiricism and Simulation

There are two orthogonal ideas that seem to get conflated in discussions about climate modeling. One is the idea that you’re not doing science if you can’t do a controlled experiment, but of course we have observational sciences like astronomy. The other is that all this new-fangled computer-based simulation is untrustworthy, usually because “it ain’t the way my grandaddy did science.” Both are rather silly ideas. We can still weigh the evidence for competing models based on observation, and we can still find protection from fooling ourselves even when those models are complex.

What does it mean to be an experimental as opposed to an observational science? Do sensitivity studies and observational diagnostics using sophisticated simulations count as experiments? Easterbrook claims that because climate scientists do these two things with their models, climate science is an experimental science [1]. It seems like there is a motivation to claim the mantle of experimental, because it may carry more rhetorical credibility than the merely observational (the critic Easterbrook is addressing certainly thinks so). This is probably because the statements we can make about causality and the inferences we can draw are usually stronger when we can run controlled experiments than when we are stuck with whatever natural experiments fortune provisions for us (and there are sound mathematical reasons for this, having to do with optimality in experimental design rather than any label we may place on the source of the data). This seeming motivation demonstrated by Easterbrook to embrace the label of empirical is in sharp contrast to the denigration of the empirical by Tobis in his three part series [2, 3, 4]. As I noted on his site, the narrative Tobis is trying to create with those posts has already been pre-messed with by Easterbrook; his readers just pointed out the obvious weaknesses too. One good thing about blogging is the critical and timely feedback.

The confusions of these two climate warriors are an interesting point of departure. I think they are both saying more than blah blah blah, so it’s worth trying to clarify this issue. The figure below is based on a technical report from Los Alamos [5], which is a good overview and description of the concepts and definitions for model verification and validation as they have developed in the computational physics community over the past decade or so. I think this emerging body of work on model V&V places the relative parts, experiment and simulation, in a sound framework for decision making and reasoning about what models mean.


Figure 1: Verification and Validation Process (based largely on [5])

The process starts at the top of the flowchart with a “Reality of Interest”, from which a conceptual model is developed. At this point the path splits into two main branches, one based on “Physical Modeling” and the other based on “Mathematical Modeling”. Something I don’t think many people realize is that there is a significant tradition of modeling in science that isn’t based on equations. It is no coincidence that an aeronautical engineer might talk of testing ideas with a wind-tunnel model or a CFD model. Both models are simplifications of the reality of interest, which, for that engineer, is usually a full-scale vehicle in free flight.

Figure 2 is just a look at the V&V process through my Design of Experiments (DoE) colored glasses.


Figure 2: Distorted Verification and Validation Process

My distorted view of the V&V process is shown to emphasize that there’s plenty of room for experimentalists to have fun (maybe even a job [3]) in this, admittedly model-centric, sandbox. However, the transferability of the basic experimental design skills between “Validation Experiments” and “Computational Experiments” says nothing about what category of science one is practicing. The method of developing models may very well be empirical (and I think Professor Easterbrook and I would agree it is, and maybe even should be), but that changes nothing about the source of the data which is used for “Model Validation.”

The computational experiments highlighted in Figure 2 are for correctness checking, but those aren’t the sorts of computational experiments Easterbrook claimed made climate science an experimental science. Where do sensitivity studies and model-based diagnostics fit on the flowchart? I think sensitivity studies fit well in the activity called “Pre-test Calculations”, which, one would hope, inform the design of experimental campaigns. Diagnostics are more complicated.
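Before turning to diagnostics, here is a minimal sketch of what a pre-test sensitivity calculation might look like. Everything in it is an assumption made for illustration: `toy_model` is a made-up stand-in for an expensive simulation, and the parameter names and nominal values are hypothetical. It uses central finite differences, which is only one of several ways to run a sensitivity study.

```python
def toy_model(params):
    """Hypothetical stand-in for a simulation: returns one scalar observable."""
    a, b = params["a"], params["b"]
    return a * a + 3.0 * b


def sensitivities(model, nominal, rel_step=1e-3):
    """Central-difference estimate of d(output)/d(parameter) at the nominal point."""
    result = {}
    for name, value in nominal.items():
        # Step size scaled to the parameter; fall back to rel_step at zero.
        h = rel_step * abs(value) if value != 0 else rel_step
        up = dict(nominal)
        up[name] = value + h
        down = dict(nominal)
        down[name] = value - h
        result[name] = (model(up) - model(down)) / (2.0 * h)
    return result


# Hypothetical nominal operating point for the planned test.
nominal = {"a": 2.0, "b": 1.0}
sens = sensitivities(toy_model, nominal)

# Rank parameters by the magnitude of their (unscaled) sensitivity.
ranked = sorted(sens, key=lambda name: abs(sens[name]), reverse=True)
print(sens, ranked)
```

The ranking here uses raw derivatives, so it depends on the units of each parameter; a real campaign would more likely scale each sensitivity by the expected range of its parameter before deciding which factors deserve the most test points.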

Heald and Wharton have a good explanation for the use of the term “diagnostic” in their book on microwave-based plasma diagnostics: “The term ‘diagnostics,’ of course, comes from the medical profession. The word was first borrowed by scientists engaged in testing nuclear explosions about 15 years ago [c. 1950] to describe measurements in which they deduced the progress of various physical processes from the observable external symptoms” [6]. With a diagnostic we are using the model to help us generate our “Experimental Data”, so that would happen within the activity of “Experimentation” on this flowchart. This use of models as diagnostic tools is applied to data obtained from either experiment (e.g. laboratory plasma diagnostics) or observations (e.g. astronomy, climate science), so it says nothing about whether a particular science is observational or experimental. Classifying scientific activities as experimental or observational is of passing interest, but I think far too much emphasis is placed on this question for the purpose of winning rhetorical “points.”

The more interesting issue from a V&V perspective is introducing a new connection in the flowchart that shows how a dependency between model and experimental data could exist (Figure 3). Most of the time the diagnostic model and the model being validated are different. However, the case where they are the same is an interesting and practically relevant one that, as far as I know, is not addressed in the current V&V literature (please share links if you know of any).


Figure 3: V&V process including model-based diagnostic

It should be noted that even though the same model may be used to make predictions and perform diagnostics, it will usually be run in a different way for those two uses. The significant changes between Figure 1 and Figure 3 are the addition of an “Experimental Diagnostic” box and the change to the mathematical cartoon in the “Validation Experiment” box. The change to the cartoon is to indicate that we can’t measure what we want directly (u), so we have to use a diagnostic model to estimate it based on the things we can measure (b). An example of when the model-based diagnostic is relatively independent of the model being validated might be using a laser-based diagnostic for fluid flow: the equations describing propagation of the laser through the fluid are not the same as those describing the flow. An example of when the two codes might be connected would be if you were trying to use ultrasound to diagnose a flow. The diagnostic model and the predictive model could both be Navier-Stokes with turbulence closures, which is the very model whose validity the investigation aims to establish. I’d be interested in criticisms of how I explained this / charted this out.
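To make the u and b of the cartoon concrete, here is a minimal sketch of a model-based diagnostic. The forward relation in `diagnostic_forward` is entirely made up (a hypothetical monotonic map from u to b), and bisection is just one convenient way to invert it; neither comes from the V&V report.

```python
def diagnostic_forward(u):
    """Hypothetical diagnostic model: maps the quantity of interest u
    to the quantity b that the instrument can actually measure."""
    return u + 0.1 * u ** 3


def estimate_u(b_measured, lo=-10.0, hi=10.0, tol=1e-10):
    """Estimate u from a measured b by bisection on the monotonic forward model."""
    if not (diagnostic_forward(lo) <= b_measured <= diagnostic_forward(hi)):
        raise ValueError("measurement outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diagnostic_forward(mid) < b_measured:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic "measurement" generated from a known u, so the inversion can be checked.
u_true = 2.0
b = diagnostic_forward(u_true)
u_est = estimate_u(b)
print(u_est)
```

The point of the sketch is that the “Experimental Data” for u is only as good as `diagnostic_forward`. If that function shared its physics with the model being validated, any error in the shared physics would show up on both sides of the comparison.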


Attempt at Answering Model Questions

I’m not in the target population that Professor Easterbrook is studying, but here’s my attempt at answering his questions about model validation [7].

  1. “If I understand correctly–a model is ’valid’ (is that a formal term?) if the code is written to correctly represent the best theoretical science at the time...”

    I think you are using an STS-flavored definition for “valid.” The IEEE/AIAA/ASME/US-DoE/US-DoD definition differs. “Valid” means the observables you get out of your simulations are “close enough” to observables in the wild (experimental results). The folks from DoE tend to argue for a broader definition of valid than the DoD folks. They’d like to include as “validation” the activities of a scientist comparing simulation results and experimental results without reference to an intended use.

  2. “– so then what do the results tell you? What are you modeling for–or what are the possible results or output of the model?”

    Doing a simulation (running the implementation of a model) makes explicit the knowledge implicit in your modeling choices. The model is just the governing equations; you have to run a simulation to find solutions to those governing equations.

  3. “If the model tells you something you weren’t expecting, does that mean it’s invalid? When would you get a result or output that conflicts with theory and then assess whether the theory needs to be reconsidered?”

    This question doesn’t make sense to me. How could you get a model output that conflicted with theory? The model is based on theory. Maybe this question is about how simplifying assumptions could lead to spurious results? For example, if a simulation result shows failure to conserve mass/momentum/energy in a specific calculation possibly due to a modeling assumption (more likely due to a more mundane error), I don’t think anyone but a perpetual-motion machine nutter would seriously reconsider the conservation laws.

  4. “Then is it the theory and not the model that is the best tool for understanding what will happen in the future? Is the best we can say about what will happen that we have a theory that adheres to what we know about the field and that makes sense based on that knowledge?”

    This one doesn’t make sense to me either. You have a “theory,” but you can’t formulate a “model” of it and run a simulation, or just a pencil and paper calculation? I don’t think I’m understanding how you are using those words.

  5. “What then is the protection or assurance that the theory is accurate? How can one ‘check’ predictions without simply waiting to see if they come true or not come true?”

    There’s no magic; the protection from fooling ourselves is the same as it has always been, only the names of the problems change.
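The “close enough” in the answer to the first question can be sketched as a simple comparison. This is a minimal illustration under stated assumptions: the function name `is_valid_for_use`, the choice of worst-case relative error as the metric, and all of the numbers are hypothetical, and real validation metrics would also account for experimental and numerical uncertainty.

```python
def is_valid_for_use(simulated, measured, tolerance):
    """Return (verdict, worst_error): verdict is True when the worst relative
    difference between simulated and measured observables is within tolerance."""
    worst = max(abs(s - m) / abs(m) for s, m in zip(simulated, measured))
    return worst <= tolerance, worst


# Made-up observables, for illustration only.
simulated = [1.02, 1.95, 3.10]
measured = [1.00, 2.00, 3.00]

# The same comparison gives different verdicts for different intended uses.
ok_loose, worst = is_valid_for_use(simulated, measured, tolerance=0.05)
ok_tight, _ = is_valid_for_use(simulated, measured, tolerance=0.01)
print(ok_loose, ok_tight, worst)
```

The tolerance is an input rather than something the comparison produces, which is one way to see why the definition tied to an intended use and the broader one can disagree about the same set of results.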

Attempt at Understanding Blah Blah Blah

  • “The trouble comes when empiricism is combined with a hypothesis that the climate is stationary, which is implicit in how many of their analyses work.” [8]

    The irony of this statement is extraordinary in light of all the criticisms by the auditors and others of statistical methods in climate science. It would be a valid criticism, if it were supported.

  • “The empiricist view has never entirely faded from climatology, as, I think, we see from Curry. But it’s essentially useless in examining climate change. Under its precepts, the only thing that is predictable is stasis. Once things start changing, empirical science closes the books and goes home. At that point you need to bring some physics into your reasoning.” [2]

    So we’ve gone from what could be reasonable criticism of unfounded assumptions of stationarity to empiricism being unable to explain or understand dynamics. I guess the guys working on embedding dimension stuff, or analogy based predictions would be interested to know that.

  • “See, empiricism lacks consilience. When the science moves in a particular direction, they have nothing to offer. They can only read their tea leaves. Empiricists live in a world which is all correlation, and no causation.” [3]

    Let’s try some definitions.

    empiricism: knowledge through observation
    consilience: unity of knowledge, non-contradiction

    How can the observations contradict each other? Maybe a particular explanation for a set of observations is not consilient with another explanation for a different set of observations. This seems to be something that would get straightened out in short order though: it’s on this frontier that scientific work proceeds. I’m not sure how empiricism is “all correlation.” This is just a bald assertion with no support.

  • “While empiricism is an insufficient model for science, while not everything reduces to statistics, empiricism offers cover for a certain kind of pseudo-scientific denialism. [...] This is Watts Up technique asea; the measurements are uncertain; therefore they might as well not exist; therefore there is no cause for concern!” [4]

    Tobis: Empiricism is an insufficient model for science. Feynman: The test of all knowledge is experiment. Tobis: Not everything reduces to statistics. Jaynes: Probability theory is the logic of science. To be fair, Feynman does go on to say that you need imagination to think up things to test in your experiments, but I’m not sure that isn’t included in empiricism. Maybe it isn’t included in the empiricism Tobis is talking about.

    So that’s what all this is about? You’re upset at Watts making a fallacious argument about uncertainty? What does empiricism have to do with this? It would be simple enough to just point out that uncertainty doesn’t mean ignorance.

Not quite blah blah blah, but the argument is still hardly thought out and poorly supported.


[1]   Easterbrook, S., “Climate Science is an Experimental Science,” February 2010.

[2]   Tobis, M., “The Empiricist Fallacy,” November 2010.

[3]   Tobis, M., “Empiricism as a Job,” November 2010.

[4]   Tobis, M., “Pseudo-Empiricism and Denialism,” November 2010.

[5]   Thacker, B. H., Doebling, S. W., Hemez, F. M., Anderson, M. C., Pepin, J. E., and Rodriguez, E. A., “Concepts of Model Verification and Validation,” Tech. Rep. LA-14167-MS, Los Alamos National Laboratory, Oct 2004.

[6]   Heald, M. and Wharton, C., Plasma Diagnostics with Microwaves, Wiley series in plasma physics, Wiley, New York, 1965.

[7]   Easterbrook, S., “Validating Climate Models,” November 2010.

[8]   Tobis, M., “Empiricism,” November 2010.

Thanks to George Crews and Dan Hughes for their critical feedback on portions of this.

[Update: George left a comment with suggestions on changing the flowchart. Here's my take on his suggested changes.

A slightly modified version of George's chart. I think it makes more sense to have the "No" branch of the validation decision point back at "Abstraction", which parallels the "No" branch of the verification decision pointing at "Implementation". I also switched around "Experimental Data" and "Experimental Diagnostic." Notably absent is any loop for "Calibration"; this would properly be a separate loop with output feeding in to "Computer Model."]


  1. Hi Joshua,

    A picture is worth a thousand words! The figures show that modeling is not about finding truth about nature, but the more modest goal of finding "acceptable agreement" between the model and experiment.

    In Figure 1, just as there is a feedback loop from Model Validation back to the Conceptual Model, there is a feedback loop from Model Verification back to the Computer Model. And, again like the Model Validation feedback loop, there will be a decision diamond. Is the computer model acceptably bug free?

    So if this other feedback loop were included, the figure would clearly illustrate that there are two decisions. Is the model verified, and is the model validated? IMHO, these are subjective decisions based on criteria external to the process illustrated in the figure. Is the code bug free enough and is the model accurate enough to risk using for its intended purpose?

    I think it safe to say that the risk assessments of many people for the current GCMs are telling them -- for the purpose of an activist climate policy -- yes. Others say -- no. That there is a split along political lines is not surprising. Policy risk assessment criteria tend to split along party lines.

    (But it is actually an advantage that the modeling process is independent of the decision criteria. We may be able to establish a consensus modeling process. Regardless of political outlook.)

    Figure 3 and its Experimental Diagnostic box help me better understand what Easterbrook may be referring to when describing the GCMs as experimental tools. Thanks.

    With that box in mind, rather than the term "Computational Experiments" in the Model Verification box, I would use the term "Computational Diagnostics." When I debug complex code, I take an "experimental" approach in that I try and predict what a change in the code or input will produce, but my goal is to make a diagnosis about a bug.

    I certainly hope that Michael and Steve comment on this post.


  2. Thanks for the feedback George; I added a new chart reflecting those suggestions. Of course there would also be a decision point on the experimental side to decide how much testing is sufficient; maybe I'll flesh that out a bit more in a future post.

    gmcrews said: But it is actually an advantage that the modeling process is independent of the decision criteria. We may be able to establish a consensus modeling process. Regardless of political outlook.

    Good point; that's why I've focused more on verification in my comments rather than validation; more likely to be able to have a fruitful discussion without getting too political.

    I like the term "Computational Diagnostics" to describe verification activities. The "Computational Experiments" really happen off the "Yes" fork of the verification decision.

  3. George, I think you'll find this interesting (I've not heard of this effort before): CASoS Engineering Applications: Climate Security Impacts; they have an interesting-sounding tech report up (I've not had time to read it yet): Uncertainty Quantification and Validation of Combined Hydrological and Macroeconomic Analyses

  4. Joshua, I hadn't noticed this blog before. It looks to be immensely interesting.

    My "empiricism" gripes don't seem directly relevant to the rest of this piece, so I'm not sure how easily I can clarify my point. I'm pretty sure I haven't seen Steve make this point because it's not really on his turf.

    My complaint is with people who judge the "truth" or "falsehood" of the "global warming theory" based on hairsplitting of the observations rather than on physical understanding of the principles.

    All sorts of statistical hairsplitting occurs, mostly focused on the one dimensional record of global mean surface temperature vs time.

    This technique trivializes the system and the knowledge of the system, bashing the vast majority of the information out of the theory, the observations and the predictions. This would be quite perfectly silly if it weren't consequential. It also would not exist were it not consequential.

    It's related to what I would consider a pre-scientific thread in climatological culture, one which has been singularly reluctant to go along with climate change arguments by comparison with the other major subgroups. Their empiricism is not so one-dimensional but it remains pure; a discovery of correlations, especially lagged correlations, is all there is to it. It's statistics in the sense that a baseball fan uses the word.

    These complaints have, really, nothing to do with your critiques of the modeling enterprise, which I look forward to further investigating.

  5. I guess I'm not familiar enough with the history. Are you talking about the folks trying to do analogy-based forecasting, or some sort of dimension embedding? Can you point me at some exemplars of this sub-group you're talking about (what I might call 'unimaginative empiricists')? The statistical hair-splitters playing with GMST I'm less interested in.

    I think your complaints are related to this topic, but it may become obvious that they're not if you clarify a bit.

    Thanks for commenting.

  6. Hi Joshua,

    From reference [8], Tobis states: The basic idea of empiricism ... is that the data "speak for themselves" and no context is necessary. ... The empiricist view has never entirely faded from climatology .... But it's essentially useless in examining climate change. Under its precepts, the only thing that is predictable is stasis. Once things start changing, empirical science closes the books and goes home. At that point you need to bring some physics into your reasoning.

    Of course the data never speaks for itself. Theory is an integral part of the scientific method. However, his use of the word "empiricism" is entirely incompatible with its use in the Wikipedia which states: Empiricism then, in the philosophy of science, emphasizes those aspects of scientific knowledge that are closely related to evidence, especially as discovered in experiments. It is a fundamental part of the scientific method that all hypotheses and theories must be tested against observations of the natural world, rather than resting solely on a priori reasoning, intuition, or revelation. Hence, science is considered to be methodologically empirical in nature.

    Tobis even puts "empiricism" in scare quotes in his comment above. So I understand and agree with Tobis that his gripes with empiricism are not meant to denigrate another fundamental aspect of the scientific method -- empiricism. IMHO, he is using the word empiricism because of a lack of a better word to identify his gripes with.

    But then what *is* he talking about? I'm not sure. Perhaps it is the reluctance people have to change their beliefs. Tobis mentions "stasis." People will be reluctant to adopt a new scientific paradigm if it conflicts with their strongly held views on the nature of reality. They will go to great lengths to try and interpret experimental data in a manner compatible with their established beliefs.

    A canonical historical example would be the reluctance society had in replacing the Ptolemaic view of the universe with the Copernican system. Tobis may be griping that more people need to commit to the view that anthropogenic CO2 is at the center of the climate universe. So to speak.

    However, if this is anywhere near what Tobis was getting at, I would remind him that since Copernican orbits were all perfect circles, the geocentric model's deferents and epicycles produced predictions at least as accurate as, if not more accurate than, those of Copernicus. Even Kepler's ellipses and the resulting dramatic improvement in prediction accuracy were not enough. It was not until Galileo's telescope observations of the phases of Venus that the Ptolemaic view became untenable.

    Note what it took to make the paradigm shift. Until such a time as an observation is made that is *not* compatible with natural climate variability, reluctance to adopt a more anthropogenic CO2 view will remain tenable.

  7. I am not talking about anybody you need to take seriously from the point of view of mathematical science, if that is your question. That's why I'm surprised to see my complaint at a place like this one.

    In retrospect, I am sure I should not have chosen the word "empiricism" for the error. Perhaps "heuristicism" would have been better. There is a community of people that have been doing "climate forecasts" (one to six month outlooks) for some considerable time. This is based entirely on heuristics. Bill Gray's hurricane forecasts are a good example.

    When climate change accelerates, the heuristics increasingly give bad guidance. The heuristics crowd has no way of anticipating any changes and relies on purely observational techniques. It is difficult to tease out anything from observations without a physical model and impossible without a statistical model other than the heuristics that they already have. This community seems to be the segment of climate science (if you grant them that much) that is most likely to find fault with the WG I viewpoint.

    This fits in with what gmcrews says: "Until such a time as an observation is made that is *not* compatible with natural climate variability, reluctance to adopt a more anthropogenic CO2 view will remain tenable."

    I call this "rolling a 14", an event so far out of line with previous experience that the idea that what we are seeing is within the bounds of normal becomes transparently wrong. If we get a couple more years like 2010 and early 2011 have turned out, that may be enough to convince most scientists that the idea of a quasistationary climate is no longer useful. What it will take to convince the world at large is hard to constrain.

  8. Hi Michael,

    Thanks for the insight. I don't know what it will take to convince people either. But for myself, I need to know what the probabilities are of "rolling a 14." I need numbers between 0 and 1. Bayesian probabilities. Otherwise, I can't have rational climate policy beliefs.

    To get such numbers, and all other numbers of their ilk, I see no alternative to the GCMs being consensus SQA verified and validated (see above figures) for the following usages: What are the boundaries of natural climate variability? What are the boundaries of unnatural (i.e., anthropogenic CO2) climate variability?

    BTW, I understand now and share your gripe with the heuristics crowd. Their heuristics will be put to the same usages as above, so I make the same V&V requests to them. Using your analogy, I don't care if heuristics can predict rolling a 2 through 12. How about predicting a 14? How about the boundaries?

    However, my own gripe also extends to the GCM crowd.