There are two orthogonal ideas that seem to get conflated in discussions about climate modeling. One is the idea that you’re not doing science if you can’t do a controlled experiment, but of course we have observational sciences like astronomy. The other is that all this new-fangled computer-based simulation is untrustworthy, usually because “it ain’t the way my grandaddy did science.” Both are rather silly ideas. We can still weigh the evidence for competing models based on observation, and we can still find protection from fooling ourselves even when those models are complex.
What does it mean to be an experimental as opposed to an observational science? Do sensitivity studies, and observational diagnostics using sophisticated simulations count as experiments? Easterbrook claims that because climate scientists do these two things with their models that climate science is an experimental science . It seems like there is a motivation to claim the mantle of experimental, because it may carry more rhetorical credibility than the merely observational (the critic Easterbrook is addressing certainly thinks so). This is probably because the statements we can make about causality and the strength of the inferences we can draw are usually greater when we can run controlled experiments than when we are stuck with whatever natural experiments fortune provisions for us (and there are sound mathematical reasons for this, having to do with optimality in experimental design rather than any label we may place on the source of the data). This seeming motivation demonstrated by Easterbrook to embrace the label of empirical is in sharp contrast to the denigration of the empirical by Tobis in his three part series [2, 3, 4]. As I noted on his site, the narrative Tobis is trying to create with those posts has already been pre-messed with by Easterbrook, his readers just pointed out the obvious weaknesses too. One good thing about blogging is the critical and timely feedback.
The confusions of these two climate warriors are an interesting point of departure. I think they are both saying more than blah blah blah, so it’s worth trying to clarify this issue. The figure below is based on a technical report from Sandia , which is a good overview and description of the concepts and definitions for model verification and validation as it has developed in the computational physics community over the past decade or so. I think this emerging body of work on model V&V places the relative parts, experiment and simulation, in a sound framework for decision making and reasoning about what models mean.
The process starts at the top of the flowchart with a “Reality of Interest”, from which a conceptual model is developed. At this point the path splits into two main branches. One based on “Physical Modeling” and the other based on “Mathematical Modeling”. Something I don’t think many people realize is that there is a significant tradition of modeling in science that isn’t based on equations. It is no coincidence that an aeronautical engineer might talk of testing ideas with a wind-tunnel model or a CFD model. Both models are simplifications of the reality of interest, which, for that engineer, is usually a full-scale vehicle in free flight.
Figure 2 is just a look at the V&V process through my Design of Experiments (DoE) colored glasses.
My distorted view of the V&V process is shown to emphasize that there’s plenty of room for experimentalists to have fun (maybe even a job ) in this, admittedly model-centric, sandbox. However, the transferability of the basic experimental design skills between “Validation Experiments” and “Computational Experiments” says nothing about what category of science one is practicing. The method of developing models may very well be empirical (and I think Professor Easterbrook and I would agree it is, and maybe even should be), but that changes nothing about the source of the data which is used for “Model Validation.”
The computational experiments highlighted in Figure 2 are for correctness checking, but those aren’t the sorts of computational experiments Easterbrook claimed made climate science an experimental science. Where do sensitivity studies and model-based diagnostics fit on the flowchart? I think sensitivity studies fit well in the activity called “Pre-test Calculations”, which, one would hope, inform the design of experimental campaigns. Diagnostics are more complicated.
Heald and Wharton have a good explanation for the use of the term “diagnostic” in their book on microwave-based plasma diagnostics: “The term ‘diagnostics,’ of course, comes from the medical profession. The word was first borrowed by scientists engaged in testing nuclear explosions about 15 years ago [c. 1950] to describe measurements in which they deduced the progress of various physical processes from the observable external symptoms” . With a diagnostic we are using the model to help us generate our “Experimental Data”, so that would happen within the activity of “Experimentation” on this flowchart. This use of models as diagnostic tools is applied to data obtained from either experiment (e.g. laboratory plasma diagnostics) or observations (e.g. astronomy, climate science), so it says nothing about whether a particular science is observational or experimental. Classifying scientific activities as experimental or observational is of passing interest, but I think far too much emphasis is placed on this question for the purpose of winning rhetorical “points.”
The more interesting issue from a V&V perspective is introducing a new connection in the flowchart that shows how a dependency between model and experimental data could exist (Figure 3). Most of the time the diagnostic model, and the model being validated are different. However, this case where they are the same is an interesting and practically relevant one that is not addressed in the current V&V literature that I know of (please share links if you “know of”).
It should be noted that even though the same model may be used to make predictions and perform diagnostics, it will usually be run in a different way for those two uses. The significant changes between Figure 1 and Figure 3 are the addition of a “Experimental Diagnostic” box and the change to the mathematical cartoon in the “Validation Experiment” box. The change to the cartoon is to indicate that we can’t measure what we want directly (u), so we have to use a diagnostic model to estimate it based on the things we can measure (b). An example of when the model-based diagnostic is relatively independent of the model being validated might be using laser-based diagnostic for fluid flow. The equations describing propagation of the laser through the fluid are not the same as those describing the flow. An example of when the two codes might be connected would be if you were trying to use ultrasound to diagnose a flow. The diagnostic model and the predictive model could both be Navier-Stokes with turbulence closures. Establishing the validity of which is the aim of the investigation. I’d be interested in criticisms of how I explained this / charted this out.
I’m not in the target population that professor Easterbrook is studying, but here’s my attempt at answering his questions about model validation.
- “If I understand correctly–a model is ’valid’ (is that a formal term?) if the code is written to correctly represent the best theoretical science at the time...”
I think you are using an STS flavored definition for “valid.” The IEEE/AIAA/ASME/US-DoE/US-DoD definition differs. “Valid” means observables you get out of your simulations are “close enough” to observables in the wild (experimental results). The folks from DoE tend to argue for a broader definition of valid than the DoD folks. They’d like to include as “validation” activities of a scientist comparing simulation results and experimental results without reference to an intended use.
- “– so then what do the results tell you? What are you modeling for–or what are the possible results or output of the model?”
Doing a simulation (running the implementation of a model) makes explicit the knowledge implicit in your modeling choices. The model is just the governing equations, you have to run a simulation to find solutions to those governing equations.
- “If the model tells you something you weren’t expecting, does that mean it’s invalid? When would you get a result or output that conflicts with theory and then assess whether the theory needs to be reconsidered?”
This question doesn’t make sense to me. How could you get a model output that conflicted with theory? The model is based on theory. Maybe this question is about how simplifying assumptions could lead to spurious results? For example, if a simulation result shows failure to conserve mass/momentum/energy in a specific calculation possibly due to a modeling assumption (more likely due to a more mundane error), I don’t think anyone but a perpetual-motion machine nutter would seriously reconsider the conservation laws.
- “Then is it the theory and not the model that is the best tool for understanding what will happen in the future? Is the best we can say about what will happen that we have a theory that adheres to what we know about the field and that makes sense based on that knowledge?”
This one doesn’t make sense to me either. You have a “theory,” but you can’t formulate a “model” of it and run a simulation, or just a pencil and paper calculation? I don’t think I’m understanding how you are using those words.
- “What then is the protection or assurance that the theory is accurate? How can one ‘check’ predictions without simply waiting to see if they come true or not come true?”
There’s no magic; the protection from fooling ourselves is the same as it has always been, only the names of the problems change.
- “The trouble comes when empiricism is combined with a hypothesis that the climate is stationary, which is implicit in how many of their analyses work.” 
The irony of this statement is extraordinary in light of all the criticisms by the auditors and others of statistical methods in climate science. It would be a valid criticism, if it were supported.
- “The empiricist view has never entirely faded from climatology, as, I think, we see from Curry. But it’s essentially useless in examining climate change. Under its precepts, the only thing that is predictable is stasis. Once things start changing, empirical science closes the books and goes home. At that point you need to bring some physics into your reasoning.” 
So we’ve gone from what could be reasonable criticism of unfounded assumptions of stationarity to empiricism being unable to explain or understand dynamics. I guess the guys working on embedding dimension stuff, or analogy based predictions would be interested to know that.
- “See, empiricism lacks consilience. When the science moves in a particular direction, they have nothing to offer. They can only read their tea leaves. Empiricists live in a world which is all correlation, and no causation.” 
Lets try some definitions.
- knowledge through observation
- unity of knowledge, non-contradiction
How can the observations contradict each other? Maybe a particular explanation for a set of observations is not consilient with another explanation for a different set of observations. This seems to be something that would get straightened out in short order though: it’s on this frontier that scientific work proceeds. I’m not sure how empiricism is “all correlation.” This is just a bald assertion with no support.
- “While empiricism is an insufficient model for science, while not everything reduces to statistics, empiricism offers cover for a certain kind of pseudo-scientific denialism. [...] This is Watts Up technique asea; the measurements are uncertain; therefore they might as well not exist; therefore there is no cause for concern!” 
Tobis: Empiricism is an insufficient model for science. Feynman: The test of all knowledge is experiment. Tobis: Not everything reduces to statistics. Jaynes: Probability theory is the logic of science. To be fair, Feynman does go on to say that you need imagination to think up things to test in your experiments, but I’m not sure that isn’t included in empiricism. Maybe it isn’t included in the empiricism Tobis is talking about.
So that’s what all this is about? You’re upset at Watts making a fallacious argument about uncertainty? What does empiricism have to do with this? It would be simple enough to just point out that uncertainty doesn’t mean ignorance.
Not quite blah blah blah, but the argument is still hardly thought out and poorly supported.
 Easterbrook, S., “Climate Science is an Experimental Science,”http://www.easterbrook.ca/steve/?p=1322, February 2010.
 Tobis, M., “The Empiricist Fallacy,” http://initforthegold.blogspot.com/2010/11/empiricist-fallacy.html, November 2010.
 Tobis, M., “Empiricism as a Job,”http://initforthegold.blogspot.com/2010/11/empiricism-as-job.html, November 2010.
 Tobis, M., “Pseudo-Empiricism and Denialism,”http://initforthegold.blogspot.com/2010/11/pseudo-empiricism-and-denialism.html, November 2010.
 Thacker, B. H., Doebling, S. W., Hemez, F. M., Anderson, M. C., Pepin, J. E., and Rodriguez, E. A., “Concepts of Model Verification and Validation,” Tech. Rep. LA-14167-MS, Los Alamos National Laboratory, Oct 2004.
 Easterbrook, S., “Validating Climate Models,”http://www.easterbrook.ca/steve/?p=2032, November 2010.
 Tobis, M., “Empiricism,”http://initforthegold.blogspot.com/2010/11/empiricism.html, November 2010.
[Update: George left a comment with suggestions on changing the flowchart. Here's my take on his suggested changes.Calibration"; this would properly be a separate loop with output feeding in to "Computer Model."