Saturday, February 20, 2010

Validation and Credibility

Outside of certain small technical communities, the importance and purpose of VV&UQ (verification, validation, and uncertainty quantification) for computational physics is little understood. Normally this unawareness of an arcane technical practice would not matter much, but since many modern policy decisions are based, in part if not in whole, on the output of large simulations, it becomes a matter that demands broader understanding. The fundamental reason for the entire effort (and it takes a lot of effort to get right) is to establish the credibility of our simulation outputs so that we can use them to support rational decision making. The first paragraph of the ASC fact sheet on the subject (as applied to nuclear stockpile stewardship) sums things up well (emphasis mine):


The Advanced Simulation and Computing (ASC) Verification & Validation (V&V) program exists to establish a technically rigorous foundation of credibility for the computational science and engineering calculations required by the NNSA Stockpile Stewardship Program. This program emphasizes the development and implementation of science-based verification and validation methods for the support of high-consequence decisions regarding the management of the U.S. nuclear stockpile. The V&V process reduces the risk of incorrect stockpile decisions by establishing that the calculations provide the right answers for the right reasons.
-- Verification and Validation: Credibility in Stockpile Modeling and Simulation
Microscopic Description of the Fission Process is an interesting set of slides dealing with this credibility-building exercise where empirical closure is concerned. The question addressed on slide 3 is, "Why focus on the microscopic nuclear theories?" Here are the reasons given:
  • The nuclear many-body problem is very complex, computationally difficult
  • Much of the progress in the past 50 years has been based on empirical models (most with microscopic degrees of freedom) tuned to experimental data

    • This highly limits our predictive capability
    • And it is difficult to estimate the uncertainties

      • How can we do Quantification of Margins and Uncertainties?
With a fundamental picture of nuclei based on the correct microphysics, we can remove the empiricism inherent today, thereby giving us greater confidence in the science we deliver to the programs
This illustrates the idea that the credibility of validation depends more on the quality of the model's physical and theoretical basis than on the coverage or quantity of experimental data (though validation experiments are still quite necessary). Part of the goal of any continuing validation process (link by way of Dan Hughes' site) is to reach the point where all of the empirical closures (constitutive relations or sub-grid-scale parameterizations in the context of large multi-physics codes) have a sound theoretical and physical basis.

That fact sheet mentioned above also has a good list of the sorts of deliverables that we should expect as consumers of simulation-based decision support:
  • Documented analysis and conclusion of the confidence level of the models as a result of the V&V activities.
  • Repository of test results associated with unit / regression / system tests, verification and validation tests and/or list of test data used.
  • Documented code (feature) and solution (model) verification.
  • Documented V&V environment (constraints, assumptions, and tools).
  • Repository of code versions, fixes, and other patches used during the V&V phase.
The availability and open accessibility of these things is critical to building the credibility of, and consensus around, decision-support products. This is because there is little 'fact-checking' that decision makers can do without access to their own supercomputers and experts for re-running or re-implementing the simulations. Given such high barriers to checking or independent confirmation, an open notebook documenting a formal and rigorous process is crucial.

The entire point of requiring the VV&UQ process is not that we hope to prove a model implementation is correct or true (an impossible task), but that we understand the importance of generating credible results that faithfully represent the reality of interest through sound physical reasoning, and which are shown to be useful for a particular purpose. VV&UQ is the set of best practices for providing unimpeachable, science-based support for decisions that matter.


  1. "How can we do quantification of margins and uncertainties?" A little background reading on the topic: Conceptual and Computational Basis for the Quantification of Margins and Uncertainty
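The core idea behind QMU can be sketched with a toy calculation (the function name and numbers here are illustrative, not from the referenced report): the confidence factor is the ratio of the performance margin to the quantified uncertainty in that margin.

```python
# Toy sketch of the QMU confidence ratio; names and values are
# illustrative only, not drawn from the referenced report.
def confidence_ratio(best_estimate, threshold, uncertainty):
    """Margin M = distance from the best-estimate performance to the
    failure threshold; confidence factor CF = M / U.  CF > 1 suggests
    the margin exceeds the quantified uncertainty."""
    margin = best_estimate - threshold
    return margin / uncertainty

# Example: estimated performance 10.0, required minimum 7.0,
# quantified uncertainty 2.0 -> CF = 3.0 / 2.0 = 1.5
cf = confidence_ratio(10.0, 7.0, 2.0)
print(cf)
```

The hard part in practice is not this ratio but the "U" in the denominator: producing a defensible, quantified uncertainty is what most of the VV&UQ machinery exists to do.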

  2. Dan Hughes has an extensive write-up of VV&UQ as it applies to climate modeling software, here's his list, pretty similar to the one from the ASC slides:
    1. A theory manual in which the details of all models and methods used in the software are described.
    2. A user manual that describes how to use the software.
    3. A computer programmer manual in which the details of the structure and coding are described.
    4. A V&V manual in which the verification and validation activities associated with the software are described.
    5. Other manuals and reports in which the results of analyses with the software in its intended application areas are described.
    6. A software QA plan that describes the procedures which are in place for maintaining the quality status of the software.

  3. The Community Climate System Model web site seems to be on its way toward containing the V&V and software quality assurance artifacts mentioned in the lists above; the weak areas seem to be items 3, 4, 5, and 6 on Dan's list.

  4. Is Spin-Up Validation? Presentation slides about an ice-sheet model: verification, validation, and basal strength in models for the present state of ice sheets; see also the Parallel Ice Sheet Model home page and documentation.

    They correctly identify the problem as one of inverse modeling (see, for instance, my mini rant at the bottom of this post about Bayesian solutions to noisy inverse problems). They use a regularization based on adding up a bunch of norms of the solution state (see Tikhonov regularization, or alternatively total variation regularization).
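The point of Tikhonov regularization is easy to see in a scalar toy problem (everything below is a made-up illustration, not taken from the ice-sheet work): when the forward map is nearly singular, the naive inverse amplifies measurement noise, and a penalty on the solution norm damps that amplification.

```python
# Minimal sketch of Tikhonov (ridge) regularization for a noisy
# scalar inverse problem y = a*x + noise: estimate x from y.
def tikhonov_estimate(a, y, lam):
    """Minimize (a*x - y)**2 + lam*x**2; the closed-form minimizer is
    x = a*y / (a*a + lam).  lam = 0 recovers ordinary least squares;
    a larger lam damps the noise amplification when a is small
    (i.e., when the problem is ill-conditioned)."""
    return a * y / (a * a + lam)

# With a small forward coefficient, the unregularized inverse blows
# up the measurement noise; a modest lam tames it.
a, x_true = 0.1, 2.0
y_noisy = a * x_true + 0.05                 # measurement with noise
print(tikhonov_estimate(a, y_noisy, 0.0))   # noise amplified ten-fold
print(tikhonov_estimate(a, y_noisy, 0.01))  # damped toward zero
```

The design trade-off is the usual one for noisy inverse problems: lam trades bias (pulling the estimate toward zero) for variance (noise amplification), and choosing it well is itself an uncertainty-quantification question.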

  5. The Fire Dynamics Simulator project has a great set of example documents, included are:
    - Configuration Management Plan
    - Technical Reference Guide
    - Users Guide
    - Validation Guide
    - Verification Guide

  6. Dimensions of Credibility in Models and Simulations [pdf slides]
    From slide 29:
    Why doesn't/can't the Software Engineering Req'ts do the job?
    SW engineering does not address many critical M&S issues:
    - development of models
    - validation against experimental or flight data
    - uncertainty quantification
    - operations and maintenance of M&S

    The 'dimensions of credibility' identified are Development, Operations and Supporting Evidence (with sub-categories below those).

  7. In Verification and Validation Benchmarks, Oberkampf and Trucano provide some commentary that is nearly as entertaining as Roache's:
    From a historical perspective, we are in the early days of changing from an engineering culture where hardware is built, tested, and then redesigned, if failure occurred, to a culture that is more and more reliant on computational simulation. To have justified confidence in this evolving culture, we must make major improvements in the transparency and maturity of the computer codes used, the clarity of the physics included and excluded in the modeling, and the comprehensiveness of the uncertainty assessment performed. Stated more bluntly, we need to move from a culture of glossy marketing and arrogance to a culture that forthrightly addresses the limitations, weaknesses, and uncertainty of our simulations.

    The growing pains that Lindzen identifies for climate science are not unique to that field.

  8. This is interesting, credibility in the face of uncertainty is a common engineering theme:
    Structural engineering is the art of assembling materials whose properties we do not fully understand into arrangements we cannot fully analyze to support loads we cannot fully predict -- and to do so in a convincing enough fashion so that the public has complete confidence in the resultant structures.
    The Essential Engineer: Why Science Alone Will Not Solve Our Global Problems, By Henry Petroski

  9. I posted a link to this paper on applying CFD-style V&V methods to ice-sheet flow models on Easterbrook's site and on Curry's site. This snippet from that paper struck me as particularly well-worded:
    In this sense, verification is the process of doing experiments on the numerical schemes themselves, and this is different from the process of benchmarking the codes to test cases that represent a certain fidelity of reality.
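Those "experiments on the numerical schemes themselves" often take the form of a grid-refinement study: compute the error against a known (manufactured) exact solution on successively finer grids and check that the observed order of accuracy matches the scheme's formal order. A minimal sketch, using a central difference whose formal order is two:

```python
import math

# Code-verification sketch: measure the observed order of accuracy of
# a central-difference derivative against a manufactured exact solution.
def central_diff_error(h, x=1.0):
    u = math.sin                        # manufactured exact solution
    approx = (u(x + h) - u(x - h)) / (2.0 * h)
    return abs(approx - math.cos(x))    # exact derivative is cos(x)

e_coarse = central_diff_error(0.1)
e_fine = central_diff_error(0.05)       # refine the grid by r = 2
observed_order = math.log(e_coarse / e_fine) / math.log(2.0)
print(observed_order)                   # should be close to 2
```

If the observed order falls short of the formal order, something is wrong in the implementation of the scheme, independent of whether the model it discretizes is any good; that is exactly the verification-versus-validation distinction the quoted snippet draws.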