Saturday, June 23, 2012

Notre Dame V&V Workshop Notes

Last October I had the opportunity to attend a V&V workshop at Notre Dame. In a previous post I said I'd put up my notes once the slides were available. This post contains my notes from the workshop. Most of the presenters' slides are available through links in the program.

There are a couple of highlights from the workshop that I'll mention before dumping the chronological notes.

James Kamm gave a really great presentation on a variety of exact solutions for the 1-D Euler equations. He covered the well-known shock tube solutions that you'd find in a good text on Riemann solvers, plus a whole lot more. Thomas Zang presented work on a NASA standard for verification and validation that grew out of the fatal Columbia mishap. The focus is not so much prescribing what a user of modeling and simulation must do to accomplish V&V, but requiring that whatever is done is clearly documented. If nothing is done, then the documentation just requires a clear statement that nothing was done for that aspect of verification, validation, or uncertainty quantification. I like this approach because it's impossible for a standards writer to know every problem well enough to prescribe the right approach, but requiring someone to come out and put in writing "nothing was done" often means they'll go do at least something that's appropriate for their particular problem.

I think that in the area of validation I'm philosophically closest to Bob Moser, who seems to be a good Bayesian. Bill Oberkampf (who, along with Chris Roy, recently wrote a V&V book) did some pretty unconvincing hand-waving to avoid biting the bullet and taking a Bayesian approach to validation, which he (and plenty of other folks at the workshop) views as too subjective. I had a more recent chance to talk with Chris Roy about their proposed area validation metric (which is in some ASME standards), and the ad hoc, subjective nature of the multiplier for their distribution location shifts seems a lot more treacherous to me than specifying a prior. The fact that they use frequentist distributional arguments to justify a non-distributional fudge factor (which changes based on how the analyst feels about the consequences of the decision; sometimes it's 2, but for really important decisions maybe you should use 3) doesn't help them make the case that they are successfully avoiding "too much subjectivity". Of course, subjectivity is unavoidable in decision making. There are two options: the subjective parts of decision support can be explicitly addressed in a coherent fashion, or they can be pretended away by an expanding multitude of ad-hoceries.

I appreciated the way Patrick Roache wrapped up the workshop, “decisions will continue to be made on the basis of expert opinion and circumstantial evidence, but Bill [Oberkampf] and I just don’t think that deserves the dignity of the term validation.” In product development we’ll often be faced with acting to accept risk based on un-validated predictions. In fact, that could be one operational definition of experimentation. Since subjectivity is inescapable, I resort to pragmatism. What is useful? It is not useful to say “validated models are good” or “unvalidated models are bad”. It is more useful to recognize validation activities as signals to the decision maker about how much risk they are accepting when they act on the basis of simulations and precious little else.


Day One

These are my notes and commentary on Day 1 of the V&V workshop. I don’t know shorthand so there could very well be attribution and content errors. If you see something that doesn’t make sense it is probably more my failure as a scribe than error on the part of the speakers mentioned.

William Oberkampf moderated the talks and discussion.

The workshop started with an opening keynote by Robert Moser on the work being done at PECOS, funded by the Department of Energy. The main challenge problem they are working on is high-fidelity modeling of blunt-body atmospheric re-entry, which has all of the challenges of hypersonics: radiating, ionizing, dissociating gases, plus interesting surface chemistry and surface dynamics due to the regressing ablator. One of the interesting projects he mentioned is in the area of providing a common framework for applying the method of manufactured solutions (MMS) to all of the various codes they are using. The library is released under the LGPL, and is called the Manufactured Analytical Solutions Library. They have a full-time staffer working on supporting the development of this library.
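
To make the MMS idea concrete, here's a minimal sketch of the workflow (my own illustration, not the PECOS library's API): pick a smooth manufactured solution, derive the forcing term symbolically, feed it to the solver, and check that the observed order of accuracy matches the scheme's formal order. A 1-D Poisson problem stands in for a real code's equations.

```python
# Minimal sketch of the method of manufactured solutions (MMS) workflow,
# using a 1-D Poisson problem -u'' = f as a stand-in for a real code's PDE.
import numpy as np
import sympy as sp

# 1. Choose a smooth manufactured solution and derive its source term symbolically.
x = sp.symbols('x')
u_mms = sp.sin(sp.pi * x) + sp.Rational(1, 10) * sp.cos(3 * sp.pi * x)
f_mms = sp.simplify(-sp.diff(u_mms, x, 2))          # forcing that makes u_mms the exact solution
u_exact = sp.lambdify(x, u_mms, 'numpy')
f_forcing = sp.lambdify(x, f_mms, 'numpy')

def solve_poisson(n):
    """Second-order finite-difference solve of -u'' = f with Dirichlet BCs taken from u_mms."""
    xs = np.linspace(0.0, 1.0, n + 1)
    h = xs[1] - xs[0]
    A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h**2
    b = f_forcing(xs[1:-1])
    b[0] += u_exact(xs[0]) / h**2                    # fold the boundary values into the RHS
    b[-1] += u_exact(xs[-1]) / h**2
    return xs[1:-1], np.linalg.solve(A, b)

# 2. Grid-refinement study: the error should shrink at the scheme's formal order (2 here).
errors, hs = [], []
for n in (16, 32, 64, 128):
    xs, u = solve_poisson(n)
    errors.append(np.max(np.abs(u - u_exact(xs))))
    hs.append(1.0 / n)

for i in range(1, len(errors)):
    p_obs = np.log(errors[i - 1] / errors[i]) / np.log(hs[i - 1] / hs[i])
    print(f"h={hs[i]:.4f}  error={errors[i]:.3e}  observed order ~ {p_obs:.2f}")
```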

Uncertainty quantification (UQ) was another big topic. They are taking a very reasonable-sounding Bayesian approach. He mentioned the difficulty of finding high-quality experimental data for validation purposes. Spatial resolution of flow diagnostics was a significant issue for finding data in the small y+ region (close to the wall) to provide useful information about the turbulence model parameters (calibration). For their main code, they have a roughly 300-dimensional input parameter space, but they've been able to do some sensitivity studies for their quantities of interest (QoIs) and find the 30 factors that matter most.

He also mentioned the difficulty of trying to verify legacy codes. He pointed out the continuing need to design codes from the beginning to allow verification and validation (for more on this see Roache [1]). Another speaker, Thomas Zang, mentioned the difficulty of verifying commercial codes with poor or non-existent documentation.

Mauricio Santillana presented some work looking at the problems of atmospheric chemistry and regional downscaling in global climate models. As he pointed out several times, “simulation results heavily depend on time and space resolutions.” One of the problems he highlights is that the validation approach commonly applied for global models supporting policy (like the Kyoto Protocol) implicitly assumes that errors introduced by discretization are small. This is not the case for many functionals, especially chemistry-sensitive ones.

Tariq Aslam of LANL presented some “colorful fluid dynamics” simulations of a spherical detonation, and discussed some of the problems with achieving high-order convergence rates with shock-capturing rather than shock-fitting schemes.

Hany Abdel-Khalik presented some interesting ideas from the nuclear reactor engineering community about using reduced-order modeling approaches. I found the ideas about finding the “active subspace” of the parameter space by sampling the sensitivity derivatives very interesting. This is related to the sensitivity analysis that Bob Moser talked about, going from a 300-dimensional space to a 30-dimensional one (still not trivial to characterize though).
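
Here's a rough sketch of the active-subspace idea as I understood it (a toy example of my own, not Abdel-Khalik's code): sample the gradient of a quantity of interest over the input space, build the gradient covariance matrix, and look for a sharp drop in its eigenvalue spectrum. The leading eigenvectors span the directions that actually matter.

```python
# Rough sketch of the "active subspace" idea: sample gradients of a scalar QoI
# over the input space, form the gradient covariance, and look for a sharp drop
# in its eigenvalues. The toy QoI below is made up purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim = 300                                    # nominal input dimensionality
W = rng.normal(size=(dim, 3))                # hidden structure: the QoI really depends on 3 directions

def qoi(x):
    return np.sum(np.tanh(W.T @ x))

def grad_qoi(x):
    z = W.T @ x
    return W @ (1.0 - np.tanh(z) ** 2)       # analytic gradient of the toy QoI

# Monte Carlo estimate of C = E[ grad f  grad f^T ]
n_samples = 500
C = np.zeros((dim, dim))
for _ in range(n_samples):
    g = grad_qoi(rng.uniform(-1.0, 1.0, size=dim))
    C += np.outer(g, g) / n_samples

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]            # sort eigenpairs in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("leading eigenvalues:", np.round(eigvals[:6], 3))

# A sharp drop after the k-th eigenvalue suggests a k-dimensional active subspace;
# the corresponding eigenvectors define reduced coordinates y = A^T x for UQ/calibration.
active_dirs = eigvecs[:, :3]
```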

Thomas Zang presented some perspectives on applying the new NASA V&V standard (NASA-STD-7009) to development projects: the Ares I-X, the Orion crew module, and the Mars Reconnaissance Orbiter out of JPL. He mentioned that the requirement to document user capability was resisted, because people took offense that their technical competence was being attacked. He noted that the additional costs of applying the standard to the Orion development were estimated by the contractor to be on the order of a few percent. The folks at JPL developed a “credibility assessment checklist” that sounds interesting, and possibly useful for folks doing product development. The standard does not specify a particular verification, validation, or uncertainty quantification approach. It just requires that you document what you did (you can say, “I didn’t do any verification activities”, or “I am ignorant of the uncertainty”). One of the challenges noted in discussion was getting a common definition of “verification” across modeling domains, some of which are PDE solvers where the activities are well defined, and others of which use simpler models, where grid convergence doesn’t apply.

During the wrap-up discussion the talk focused on interpolation vs. extrapolation. William Oberkampf said something insightful, along the lines of, “when you’re making a prediction you’re going out on a limb, what is the limb you’re out on? It’s that the model is physics based. In highly calibrated models that limb gets thinner and thinner.” The fuzziness of the distinction between interpolation and extrapolation came up. Patrick Roache noted that in high-dimensional (20 or so) parameter spaces usually nothing is densely sampled so there’s less of a “guarantee” with interpolation. The talk of extrapolating in parameter space turned to extrapolating in a modeling hierarchy for product development. The same “limb” ties together items in the hierarchy. I added a paper to my reading list that applies this idea of a validation hierarchy in the context of a hypersonic cruise missile [2].

One of the continuing themes of the morning was, “what entitles you to make a prediction?” The reason for our expectation of predictive capability is that the model captures some physical mechanism that is the same for our domain of observations and the domain of application (where we make predictions). Of course, there’s no protection against reasonable expectations being dashed by futures yet unseen: waves steepen, boundary layers transition and various consequences ensue.

The afternoon session started with a keynote by Werner Dahm, recently the Chief Scientist of the Air Force, and now with Arizona State University. He spoke on the need for applying VV&UQ techniques to “complex adaptive systems” to establish “certifiable trust” in their operations. He drew a distinction between natural systems with emergent behavior, like schools of fish or flocks of birds, and complex engineered systems that are large, multi-agent systems with the potential for emergent behavior. The constraint on developing and fielding these types of systems with significant autonomy and complexity is our inability to certify that they won’t go off and do things we don’t intend (or “go rogue” as one person put it). The fix suggested parallels Roache’s advice to build PDE codes to be verifiable: design complex adaptive systems with some federated hierarchy so that it is tractable to “certify” them (whatever that may turn out to mean).

James Kamm presented a good review of some of the lesser-known verification problems for one-dimensional compressible inviscid flow (the Euler equations). He presented lots of variations on the shock tube (see Toro’s book for a good intro), as well as some interesting results derived by applying Lie group theory. He was very excited about “breaking codes” with some of these devious analytical solutions (the Riemann problem where you have two expansions that cause a vacuum is a pretty tough one). He noted that many of the exact solutions are available at the website of one of his colleagues: http://cococubed.asu.edu/code_pages/vv.shtml.
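
As a small taste of why the vacuum case is devious, here's a sketch (mine, not Kamm's) of the pressure-positivity check from Toro's book, which tells you when two receding states generate a vacuum rather than an ordinary two-rarefaction solution:

```python
# Vacuum check for the 1-D Euler Riemann problem (ideal gas), following the
# pressure-positivity condition in Toro's book: two rarefactions generate a
# vacuum region when the left and right states recede from each other fast enough.
GAMMA = 1.4

def sound_speed(rho, p):
    return (GAMMA * p / rho) ** 0.5

def generates_vacuum(rho_l, u_l, p_l, rho_r, u_r, p_r):
    """True if the receding states are strong enough that a vacuum region forms."""
    a_l = sound_speed(rho_l, p_l)
    a_r = sound_speed(rho_r, p_r)
    # Vacuum forms iff 2*(a_l + a_r)/(gamma - 1) <= u_r - u_l
    return 2.0 * (a_l + a_r) / (GAMMA - 1.0) <= (u_r - u_l)

# The classic "123 problem" recedes but stays just shy of vacuum; push harder and it forms.
print(generates_vacuum(1.0, -2.0, 0.4, 1.0, 2.0, 0.4))   # False: 123 problem, no vacuum
print(generates_vacuum(1.0, -6.0, 0.4, 1.0, 6.0, 0.4))   # True: a vacuum region forms
```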

Patrick Knupp gave a talk about how he’s gone about actually applying these V&V techniques and the “predictive capability maturity model” to a large code development project. He uses a hierarchical approach to organizing the documentation (while the code development activities proceed with little to no perturbation from the V&V efforts).

Louis Eca talked about crafting manufactured solutions for URANS codes that have certain characteristics that mimic the mean flow of high Reynolds number turbulence near the wall. The motivation for making more physically realistic manufactured solutions is to test grid anisotropy as it would be in a physical solution, and to avoid breaking turbulence models that have certain domain assumptions built-in on the parameters (e.g. non-negativity). Part of the motivation was to also craft something that would be easy to use for a code developer (he made significant efforts to address many of the common excuses that are given by developers for not using MMS verification techniques).

Krishna Kamojjala gave an interesting talk on verifying an ALE/particle solid mechanics code that is used for impact dynamics (such as found in the operation of an oil well perforator). He developed a very interesting high-shear solution called “generalized vortex MMS”, that caused really large deformations in the grid (which are characteristic of solutions for shaped charge impacts). A method developed by Patrick Knupp for verification that uses the equation coefficients rather than a forcing term also came up in discussions.

The day ended with a lively discussion of “What is DNS?” The definition floated as a discussion starter was “fully resolved numerical solution of a robust continuum model that has been verified and validated.” Various problems with the definition were pointed out. The consensus was that DNS is not special: it is subject to the same verification and validation considerations as simulations which require empirical closures. The use of DNS solutions as a source of “validation” data was mentioned. This was generally scoffed at, but a good point was made that DNS could be viewed in a way similar to a “data reduction model,” in which case it would probably be a useful source of validation data (because it “compresses” the information from so much of our empirical observations in its domain of applicability). In response to the question, “is a model compared to DNS valid?”, I said, “if the decision maker accepts DNS as evidence and acts on it, then it’s valid.” This earned hearty guffaws from the crowd. The political or subjective aspects of validation are inescapable, but technical people don’t like to discuss them. So for now, if anyone asks me “what is validation?”, I think my answer will be “it’s a warm feeling deep in the cockles of one’s heart.” If pressed further, the legal definition of pornography offers a useful exemplar: “I’ll know it when I see it.”

Day Two

The morning started off with a keynote given by William Rider that had a great introduction to the history of the art of scientific computing (hydrodynamic calculations to support the Manhattan Project). He talked about the culture in different communities concerning how a simulation result is said to be “good” or “bad.” In broad terms, he said the engineers tend to have a much more evidence-based, “show me the evidence” sort of approach than the physicists, who tend to rely more on qualitative expert judgment (the “eye-ball norm”). He mentioned that in many communities the results from a legacy code are often the baseline for “good.” A good point he made was that expectations for quality depend on who is doing the asking, whether it is code developers, or analysts, or decision makers. The latter part of the talk focused on editorial policies in various journals. He noted that though the engineering journals have embraced the ideas of calculation verification more readily in their policy statements, the practice falls short of the rhetoric in most cases. Plenty of simulation results get published without even cursory attention paid to code or calculation verification. Some of the discussion pointed out that the editorial policies are empty legalism: if the culture of researchers and developers is to not do verification, then it won’t be done. I think that forcing V&V through rules is doomed to failure. Selling V&V as a useful set of tools for code developers, analysts and decision makers might actually work.

Following the keynote and discussion Zachary Zikoski presented some work on 1-D calculations that use an adaptive wavelet scheme, Wavelet Adaptive Multi-resolution Representation (WAMR) to do automated calculation verification. A sort of “grid convergence” can be performed by dialing down the error tolerance governing the adaptive grid.
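
Here's a sketch of what a “convergence by tolerance” study might look like (my own illustration, not WAMR): sweep the error tolerance that drives the adaptation and check that the achieved error tracks it. A simple 1-D adaptive piecewise-linear interpolant stands in for the wavelet-adaptive solver.

```python
# Sketch of "convergence by tolerance" for an adaptive method: instead of a fixed
# grid-refinement sequence, sweep the tolerance driving the adaptation and check
# that the achieved error tracks it. A 1-D adaptive piecewise-linear interpolant
# stands in for the wavelet-adaptive solver.
import numpy as np

def f(x):
    return np.tanh(40.0 * (x - 0.3))         # sharp internal layer to force local refinement

def adapt_grid(tol, a=0.0, b=1.0):
    """Bisect intervals until a local linear-interpolation error estimate is below tol."""
    xs = [a, b]
    i = 0
    while i < len(xs) - 1:
        xl, xr = xs[i], xs[i + 1]
        # estimate the interpolation error at the two quarter points of the interval
        pts = np.array([0.75 * xl + 0.25 * xr, 0.25 * xl + 0.75 * xr])
        lin = f(xl) + (pts - xl) * (f(xr) - f(xl)) / (xr - xl)
        est = np.max(np.abs(f(pts) - lin))
        if est > tol and (xr - xl) > 1e-6:
            xs.insert(i + 1, 0.5 * (xl + xr))   # refine and re-check the children
        else:
            i += 1
    return np.array(xs)

x_check = np.linspace(0.0, 1.0, 10001)
for tol in (1e-1, 1e-2, 1e-3, 1e-4):
    xs = adapt_grid(tol)
    err = np.max(np.abs(np.interp(x_check, xs, f(xs)) - f(x_check)))
    print(f"tol={tol:.0e}  points={len(xs):5d}  achieved max error={err:.2e}")
```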

A.T. Conlisk presented some work on using a “plausible alternative model” for validation purposes. He used two independent models, a molecular dynamics solver and a continuum solver, to arrive at consistent answers. One of the problems highlighted in discussion was verification of the molecular dynamics solver: the many-body problem it simulates exhibits chaotic trajectories, so convergence under time-step refinement is difficult to assess.

James Glimm presented some thoughts on verification of LES for a high-Mach-number turbulent, chemically reacting flow (a hydrogen-fueled scramjet). The work that he mentioned in passing on converging solutions to Rayleigh-Taylor and Richtmyer-Meshkov instabilities is some of the most careful and convincing simulation and experimental analysis I’ve ever read (see some of the related UQ stuff in [3]). I think the ideas of convergence in distribution for ensembles are important to verification involving turbulent (or chaotic) simulation results. One of the reasons I find Glimm’s work on these instability problems so useful is that they were able to explain discrepancies in the experimental results by careful attention to the long-wavelength perturbations existing in the initial conditions (and they also did an extrapolation based on linear perturbation theory to get the “real” ICs at time zero). This explained previously unexplained experimental results, and matched those results convincingly.
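
A toy illustration of what convergence in distribution means for chaotic simulations (mine, not Glimm's): individual trajectories never converge under time-step refinement, but the distribution of a time-averaged quantity over an ensemble of perturbed initial conditions can be compared across resolutions. The Lorenz-63 system and the two-sample KS test stand in for a real flow solver and a proper distributional metric.

```python
# Sketch of "convergence in distribution" for chaotic simulations: compare the
# distribution of a time-averaged QoI over an ensemble of perturbed initial
# conditions at two time steps, rather than comparing trajectories pointwise.
import numpy as np
from scipy.stats import ks_2samp

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def qoi(ic, dt, t_end=15.0):
    """Time-averaged z over the trajectory (after a spin-up), via fixed-step RK4."""
    s = np.array(ic, dtype=float)
    zs = []
    for k in range(int(t_end / dt)):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if k * dt > 5.0:                      # discard spin-up transient
            zs.append(s[2])
    return np.mean(zs)

rng = np.random.default_rng(1)
ics = [np.array([1.0, 1.0, 1.0]) + 1e-3 * rng.normal(size=3) for _ in range(30)]

coarse = np.array([qoi(ic, dt=0.01) for ic in ics])
fine = np.array([qoi(ic, dt=0.005) for ic in ics])

# Pointwise comparison is meaningless here (chaos), but the ensembles should agree in distribution.
stat, p_value = ks_2samp(coarse, fine)
print(f"ensemble means: {coarse.mean():.2f} vs {fine.mean():.2f}, KS statistic {stat:.3f} (p={p_value:.2f})")
```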

Before the lunch break the discussion, moderated by Gretar Tryggvason, focused on the question of “when is it justified to carry the ‘burden’ of the V&V apparatus?” Is there a trade-off between being innovative and being accurate? It was noted in discussion that how much rigor is required to demonstrate correctness depends on the purpose to which the results will be put. Bill Rider threw the general circulation model (climate model) turd on the table, and the room fell awkwardly silent. That topic is almost too political for polite discussion. “We all agree that the science is settled, right? [devious grin]” Bob Moser made a good point about turbulence model parametrizations that depend on a length scale (often tied to a grid size), so that grid refinement is sometimes ill-defined. This is probably a clear model deficiency that should be fixed, but it’s often used as an excuse to avoid verification activities. The climate modeling problem is just a very public example of the very antagonistic environment that can develop around establishing the credibility of simulation results. Code developers and analysts are no different from everyone else: lots of people love to take offense over perceived slights to their honor. Asking for evidence of verification can easily be taken as an insult or a politically motivated attack.

After the lunch break Helen Reed gave a talk on laminar-to-turbulent transition for low-speed flows. This is an area that has benefited greatly from very tight collaboration between experimental and computational work. Simple geometries and low Reynolds numbers which are accessible to DNS show very good agreement with experimental results when the experiments are accurately characterized, and special care is taken that modeling assumptions like periodic boundary conditions aren’t grossly violated in the experiment.

Tom Shih presented some information on establishing single-grid error estimators using a discrete-error-transport equation. This is done by modeling the residual with two solutions at different order. There were significant difficulties encountered in extending the approach to unsteady problems.

Barna Szabo presented some technical requirements for finite element analysis software to support V&V activities. Included in his requirements was a modeling hierarchy so that modeling assumptions could be rigorously evaluated. The idea is to show that quantities of interest are converged to some error tolerance, and that they are insensitive to the modeling assumptions (e.g. 2D vs. 3D).

Christopher Roy presented some work on using residual-based error estimators. The motivation for this approach is to require fewer grids than Richardson extrapolation-based approaches. The discussion centered on dealing with noise in the simulation and how this pushes up the requirement for the number of grids anyway. Maybe more than the leading error term should be included in the modified equation (for the single-grid approaches).
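
For reference, here's the standard three-grid observed-order and Richardson-extrapolation machinery that the residual-based estimators are trying to beat on grid count (a sketch of the textbook procedure behind Roache's grid convergence index; trapezoidal quadrature of a known integral stands in for a real solver):

```python
# Sketch of the standard three-grid observed-order / Richardson-extrapolation check
# (the machinery behind Roache's grid convergence index). The "simulation" here is
# just trapezoidal quadrature of a known integrand, standing in for a real QoI.
import numpy as np

def simulate(n):
    """Stand-in solver: composite trapezoid rule for the integral of exp(x) on [0, 1]."""
    x = np.linspace(0.0, 1.0, n + 1)
    y = np.exp(x)
    h = x[1] - x[0]
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

r = 2.0                                                   # grid refinement ratio
f3, f2, f1 = simulate(25), simulate(50), simulate(100)    # coarse, medium, fine QoI values

# Observed order of accuracy from the three grids
p = np.log((f3 - f2) / (f2 - f1)) / np.log(r)

# Richardson extrapolation using the two finest grids
f_extrap = f1 + (f1 - f2) / (r**p - 1.0)

# Grid convergence index on the fine grid (factor of safety 1.25 for a three-grid study)
gci_fine = 1.25 * abs((f2 - f1) / f1) / (r**p - 1.0)

print(f"observed order p ~ {p:.2f}")
print(f"Richardson-extrapolated QoI ~ {f_extrap:.8f} (exact: {np.e - 1.0:.8f})")
print(f"GCI (fine grid) ~ {100 * gci_fine:.4f} %")
```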

Dinshaw Balsara gave an exciting talk on turbulent star formation in high Mach number flows. Some things he said, that made me smile: “I believe operator splitting is a bad idea for stiff source terms, I’ll be using V&V methods to show why,” and “[this is a] high-order Lax-Wendroff type approach implemented in a sophisticated and stylish fashion.” This man loved his numerics, and it was fun to listen.

Karel Matous was nearly as entertaining in describing his work on “closing the loop” between his multi-scale particle / turbulent combustion and fracture mechanics work and high resolution tomographic measurements of the material he is modeling. I think I heard these right: “unless we solve the particle zoo, we always need to calibrate,” “stochastic fog propagates through the whole system” (in reference to uncertainty quantification).

The wrap-up discussion was moderated by Chris Roy. The discussion was on “practical solution verification approaches for complex scientific computing applications.” This discussion was basically an extension of the discussion around his talk about residual based error estimators. One interesting thing that was mentioned was to tell analysts to coarsen their grids to estimate error rather than telling them to refine the grids. They are probably already running the finest grid they can afford. An additional practical consideration was that maybe it was the software developers who should shoulder the burden of implementing grid coarsening / error estimation so that the analyst could do it easily / automatically.

Day Three

The keynote was given by Charles Taylor of HeartFlow. He presented their work on patient-specific fluid-structure interaction (FSI) modeling of coronary blood flow. The motivation for the work is to provide physicians with a non-invasive, low-risk way of diagnosing coronary disease, and also to provide a way to do rational planning of corrective surgery if that becomes necessary. One of the key factors in performing this sort of patient-specific modeling is the rapid generation of computational grids from CT scans of the patient. The regulatory oversight folks actually require grid refinement studies as part of getting the method approved for use in practice. The diagnostic quantity of interest is a pressure ratio upstream and downstream of constrictions in the arteries, very similar to a turbo-machinery component figure of merit. One of the interesting aspects of the modeling approach is the lumped-element linear models used to specify impedance boundary conditions: the impedance of the downstream system is approximated by one-dimensional linear wave propagation calculations.
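
Their actual outlet impedances come from one-dimensional wave propagation calculations, but a simpler two-element (RC) Windkessel sketch gives the flavor of a lumped-element outlet boundary condition. This is my own illustration, not HeartFlow's method, and all parameter values below are made up.

```python
# Sketch of a lumped-element ("Windkessel") outlet model of the kind used as an
# impedance boundary condition: the downstream vasculature is reduced to a single
# resistance and compliance, C dP/dt = Q(t) - (P - P_venous)/R.
import numpy as np

R = 1.3e10        # distal resistance   [Pa*s/m^3]  (illustrative value)
C = 8.0e-11       # vessel compliance   [m^3/Pa]    (illustrative value)
P_venous = 670.0  # downstream (venous) pressure [Pa]
T = 1.0           # cardiac period [s]

def inflow(t):
    """Toy pulsatile flow waveform delivered by the upstream 3-D model [m^3/s]."""
    return 1.0e-6 * (1.0 + 0.8 * np.sin(2.0 * np.pi * t / T))

# Integrate C dP/dt = Q - (P - P_venous)/R with forward Euler over several cardiac cycles.
dt = 1.0e-3
P = P_venous
pressures = []
for n in range(int(10 * T / dt)):
    t = n * dt
    P += dt / C * (inflow(t) - (P - P_venous) / R)
    pressures.append(P)

mean_P = np.mean(pressures[-int(T / dt):])   # mean outlet pressure over the last cycle
print(f"outlet pressure settles near {mean_P:.0f} Pa "
      f"(mean flow * R + venous = {1.0e-6 * R + P_venous:.0f} Pa)")
```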

Miguel Aguilo gave a talk about using a Bayesian maximum a posteriori (MAP) estimation procedure for inverse problems. He discussed some linearizations made around the MAP estimate that were used to get the parameter covariance information more cheaply than naive Markov chain Monte Carlo (MCMC) methods. They also used Gaussian radial basis functions (RBFs) to reduce the dimensionality of the unknown spatial material parameter space.
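
Here's a small sketch of the MAP-plus-linearization idea on a toy nonlinear inverse problem (my own illustration, not Aguilo's formulation): optimize the negative log-posterior to find the mode, then approximate the posterior covariance with the inverse Hessian there instead of running a full MCMC chain.

```python
# Sketch of MAP estimation with a Laplace approximation: find the posterior mode by
# optimization, then approximate the posterior covariance by the inverse Hessian of
# the negative log-posterior at that mode (much cheaper than full MCMC).
# Toy nonlinear model: y = theta0 * exp(-theta1 * t) + noise. Everything is illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
t = np.linspace(0.0, 3.0, 25)
theta_true = np.array([2.0, 0.7])
sigma_noise = 0.05
y_obs = theta_true[0] * np.exp(-theta_true[1] * t) + sigma_noise * rng.normal(size=t.size)

prior_mean = np.array([1.0, 1.0])
prior_std = np.array([2.0, 2.0])

def neg_log_posterior(theta):
    resid = y_obs - theta[0] * np.exp(-theta[1] * t)
    misfit = 0.5 * np.sum((resid / sigma_noise) ** 2)              # Gaussian likelihood
    prior = 0.5 * np.sum(((theta - prior_mean) / prior_std) ** 2)  # Gaussian prior
    return misfit + prior

res = minimize(neg_log_posterior, x0=prior_mean, method="BFGS")
theta_map = res.x

def hessian_fd(f, x, eps=1e-4):
    """Central finite-difference Hessian of a scalar function."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xpp = x.copy(); xpp[i] += eps; xpp[j] += eps
            xpm = x.copy(); xpm[i] += eps; xpm[j] -= eps
            xmp = x.copy(); xmp[i] -= eps; xmp[j] += eps
            xmm = x.copy(); xmm[i] -= eps; xmm[j] -= eps
            H[i, j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4.0 * eps**2)
    return H

# Laplace approximation: posterior covariance ~ inverse Hessian at the MAP point.
cov_laplace = np.linalg.inv(hessian_fd(neg_log_posterior, theta_map))
print("MAP estimate:", np.round(theta_map, 3))
print("approx. posterior std devs:", np.round(np.sqrt(np.diag(cov_laplace)), 4))
```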

Vicente Romero presented a validation framework that is an alternative to either of those presented in the ASME V&V 20 standard or in Roy and Oberkampf 2011 [4]. He said one of his difficulties was in establishing a non-subjective standard for validation (“difficult to get non-arbitrary, non-subjective accuracy requirement”; I’d go further and say “impossible to do so”). His attempt at a “weak standard” was “reality lying within the predictions is what a designer or decision maker wants.” This is contrary to my experience with decision makers. Usually they are more interested in what their risk exposure to catastrophe is, not whether the mean response ends up inside a confidence interval. This also highlights another reason why he had such difficulty: if you make enough predictions, eventually “reality” shows up outside the confidence intervals. All a frequentist confidence interval says is “the answer is either inside this interval or it’s not.” That tautology is hardly useful to decision makers. The lone outspoken Bayesian at the conference, Bob Moser, unfortunately wasn’t at the last day’s sessions. One good point he brought up was the idea of balancing the model builder’s risk and the model user’s risk. This area deserves more focused attention.

Patrick Roache talked about terminology. He said that a better term for what is sometimes referred to as “model form error” is “total validation uncertainty.” This is because a large validation uncertainty could be due to the noisiness of the experimental data, and calling it “model form error” seems to penalize a model for something that has nothing to do with the model or its form and everything to do with a noisy experiment.

Some quotes of note from various people during the wrap-up discussion.

  • Technical people often “stick to verification because validation is so subjective”
  • “uncertainty quantification will swallow verification and validation”
  • “validation is an application specific thing”
  • “you can have a useful model without validation”
  • “sometimes all you can do is speculate”
  • “the application specific view of validation is an unmanageable problem”

References

[1]   Roache, P.J., “Building PDE Codes to be Verifiable and Validatable,” Computing in Science & Engineering, vol. 6, no. 5, Sept.–Oct. 2004.

[2]   Oberkampf, W.L., Trucano, T.G., “Validation Methodology in Computational Fluid Dynamics,” AIAA Paper 2000-2549, Fluids 2000 Conference, Denver, CO, 19–22 June 2000.

[3]   Yu, Y., Zhao, M., Lee, T., Pestieau, N., Bo, W., Glimm, J., Grove, J.W., “Uncertainty quantification for chaotic computational fluid dynamics,” Journal of Computational Physics, vol. 217, no. 1, pp. 200–216, 2006, doi:10.1016/j.jcp.2006.03.030.

[4]   Roy, C.J., Oberkampf, W.L., “A Comprehensive Framework for Verification, Validation, and Uncertainty Quantification in Scientific Computing,” Computer Methods in Applied Mechanics and Engineering, vol. 200 (2011), pp. 2131–2144.
