tag:blogger.com,1999:blog-5822805028291837738.post4624316546653021336..comments2019-12-12T08:14:06.980-05:00Comments on Various Consequences: Parameterization, Calibration and ValidationJoshua Stultshttp://www.blogger.com/profile/03506970399027046387noreply@blogger.comBlogger13125tag:blogger.com,1999:blog-5822805028291837738.post-67987856032624291512012-08-12T09:47:06.984-04:002012-08-12T09:47:06.984-04:00<a href="http://gmcrews.blogspot.com/" rel="nofollow">George Crews</a> was kind enough to link this post from a discussion on <a href="http://wmbriggs.com/blog/?p=2067&cpage=1#comment-15941" rel="nofollow">W.M. Briggs' site</a>. This more <a href="http://www.variousconsequences.com/2012/08/validating-prediction-of-unobserved.html" rel="nofollow">recent post here</a> touches on many of the same concepts.jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-8990970656400725252010-11-08T08:12:48.293-05:002010-11-08T08:12:48.293-05:00Will,
<br /><br />I think you'd be interested in <a href="http://jmlr.csail.mit.edu/papers/volume11/guyon10a/guyon10a.pdf" rel="nofollow">Model Selection: Beyond the Bayesian/Frequentist Divide</a>. Section four addresses how cross-validation is really just adding another level to the inference (a hyper-parameter if you are Bayesian). There's still no general solution to the problem of <i>predicting performance</i> or <i>generalization risk</i>, which is the technical way of saying "calibration is not validation."jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-57130943009834071272010-04-19T16:48:02.831-04:002010-04-19T16:48:02.831-04:00<i>I am still scratching my head over Reichler and Kim's comment regarding the data. Specifically "Present climate, however, is not an independent data set since it has already been used for the model development (Williamson 1995)".<br /><br />I can see how cross-validation might pose a problem, but I have to ask: Why not redo a model (or at least redo one of the models) without incorporating new data?</i><br />Often the data is not used explicitly in a parameter fitting exercise, but it can implicitly guide the choices of and simplifications to the governing equations (climate models are a couple of simplifications removed from the fully general conservation laws, and there are lots of processes going on besides just air and water flow). 
So it's hard to tease out exactly what choices are influenced by knowledge of what data.<br /><br />I think there's lots of promise in going towards Bayes methods because that tends to make the influence of the current state of knowledge on our choices more explicit.<br /><br /><i>I would be very interested to know if anyone has attempted a computational approach to feature selection.</i><br />I'd be interested to know that too; please share any links if you find them. <br /><br />Glad you like the site; I've been a little slow lately in adding more toy problem results (real life intervenes every now and then on the blogging), but hope to get back to doing it more regularly.jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-29776054550866644742010-04-19T14:48:13.340-04:002010-04-19T14:48:13.340-04:00Should be 'physics' not 'phsyics'. :)Willnoreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-63841045156132782582010-04-19T14:40:00.985-04:002010-04-19T14:40:00.985-04:00Thank you for the reply, and excellent links, jstults. Lucia's description is very well written and indeed sounds like a common recipe I've used many times myself. <br /><br />I am still scratching my head over Reichler and Kim's comment regarding the data. Specifically "Present climate, however, is not an independent data set since it has already been used for the model development (Williamson 1995)".<br /><br />I can see how cross-validation might pose a problem, but I have to ask: Why not redo a model (or at least redo one of the models) without incorporating new data? I'm not familiar with computational phsyics, so please forgive my question if it sounds stupid. :) <br /><br />There also seems to be some question as to what parameters/features need to be included in the parameterization models. 
I would be very interested to know if anyone has attempted a computational approach to feature selection. It would seem logical given that we seem to be wading into Bayes territory anyway. :) <br /><br />Great Blog BTW. I'm adding it to my list of favorites!Willnoreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-72998674901021591372010-04-18T16:36:42.760-04:002010-04-18T16:36:42.760-04:00Hi Will, thanks for your comment. You are right, climate modeling has similarities with lots of different fields (hence the ability of interested amateurs to understand the lit fairly easily).<br /><br />That article I linked at Lucia's right before your comment describes why 'hold out' data like what you suggest is never really independent (which is also something recognized in <a href="http://j-stults.blogspot.com/2009/12/bayesian-climate-model-averaging.html" rel="nofollow">the climate modeling lit</a>). That's why computational physics communities (with climate modelers as a notable exception) have settled on a best practice of skillful prediction as a measure of validation, rather than a cross-validation sort of approach.jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-44004565966659563352010-04-18T10:58:24.707-04:002010-04-18T10:58:24.707-04:00I've spent some time (>10 years) designing and building pattern classifiers. I see some similarities between my field and the climate-model field in that we often have an incomplete set of noisy data, we only generally know what the output should look like, and are not exactly sure how to get from one to the other. 
Both fields employ a lot of math, statistics, and intuition to get the approximation of a function that will get us from point A to point B.<br /><br />In many areas of classification research there are standard ground-truth data sets that help provide a common measure of how classifiers perform in 'the real world'.<br /><br />If the climate model approximations were built using data up to 1980, and data from 1980 onward was reserved for validation purposes only, couldn't these various models be tested objectively for their accuracy? Or am I missing something specific to climate models?Willnoreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-78853805280900898512010-04-08T12:31:41.304-04:002010-04-08T12:31:41.304-04:00Lucia has <a href="http://rankexploits.com/musings/2008/validation-lumpy-dont-need-no-stinkin-validation/" rel="nofollow">a good post</a> explaining this calibration / validation distinction with her simple one-parameter model of the climate.jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-83770581478329400022010-03-29T18:59:51.851-04:002010-03-29T18:59:51.851-04:00<a href="http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20100009531_2010011043.pdf" rel="nofollow">A Comprehensive Validation Methodology for Sparse Experimental Data</a><br /><b>Abstract:</b> <i>A comprehensive program of verification and validation has been undertaken to assess the applicability of models to space radiation shielding applications and to track progress as models are developed over time. The models are placed under configuration control, and automated validation tests are used so that comparisons can readily be made as models are improved. 
Though direct comparisons between theoretical results and experimental data are desired for validation purposes, such comparisons are not always possible due to lack of data. In this work, two uncertainty metrics are introduced that are suitable for validating theoretical models against sparse experimental databases. The nuclear physics models, NUCFRG2 and QMSFRG, are compared to an experimental database consisting of over 3600 experimental cross sections to demonstrate the applicability of the metrics. A cumulative uncertainty metric is applied to the question of overall model accuracy, while a metric based on the median uncertainty is used to analyze the models from the perspective of model development by analyzing subsets of the model parameter space.</i><br /><br />On the intersection between V&V and sound software carpentry:<br /><i>The degree of confidence in a model is not only an issue of accuracy, but also of the rigor and completeness of the assessment itself. An essential aspect of a comprehensive validation effort is the development of configuration-controlled verification and validation (V&V) test cases. Configuration control (also called configuration management) is a process in which consistency is established for a product (i.e. a model or a software suite), and any changes made to the product are tracked. The effects of the changes on the product are documented, and therefore, problems caused by changes to the product can be backtracked. The models reviewed in this paper have been placed under configuration control with the criteria that the V&V test cases will be run when significant changes are made to those models. This approach allows accuracy to be tracked across the relevant range of applications and avoid situations where model changes intended for a specific calculation or application actually decrease the overall accuracy. 
In addition, it will enable more complete accuracy assessments to measure progress against goals as the models and codes are updated and as new data become available for validation. It also helps ensure model results are repeatable.</i>jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-64463700720304887052010-03-12T12:34:39.755-05:002010-03-12T12:34:39.755-05:00Dan <a href="http://www.easterbrook.ca/steve/?p=1388#comment-1896" rel="nofollow">makes a good point</a> about the difference between parameters that are a property of the <i>fluid</i> and those that are a property of the <i>flow</i>.<br /><br />Here's another that's focused on PDEs, <a href="http://www.google.com/url?sa=t&source=web&ct=res&cd=1&ved=0CA8QFjAA&url=http%3A%2F%2Fwww.fdm.uni-freiburg.de%2Fpublications-preprints%2Fpublications%2Fpapers%2FIJBC_mueller.pdf&ei=6nmaS5bxCYH98AaCuLWYDg&usg=AFQjCNFY85SDJynvdpa0xoyPgD-mXWveFg&sig2=Wt1P9PXpbEtqWkCJ4k8MuQ" rel="nofollow">PARAMETER IDENTIFICATION TECHNIQUES FOR PARTIAL DIFFERENTIAL EQUATIONS</a>: <i>Many physical systems exhibiting nonlinear spatiotemporal dynamics can be modeled by partial differential equations. Although information about the physical properties for many of these systems is available, normally not all dynamical parameters are known and, therefore, have to be estimated from experimental data. We analyze two prominent approaches to solve this problem and describe advantages and disadvantages of both methods. Specifically, we focus on the dependence of the quality of the parameter estimates with respect to noise and temporal and spatial resolution of the measurements. 
Keywords: Parameter estimation; partial differential equations; spatio-temporal systems; nonlinear dynamics; complex Ginzburg–Landau equation.</i>jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-20858293962461803182010-03-11T20:44:58.736-05:002010-03-11T20:44:58.736-05:00Here's an interesting paper, <a href="http://www.google.com/url?sa=X&q=http://citeseerx.ist.psu.edu/viewdoc/download%3Bjsessionid%3DFFC7D8BA7A4B3BEE8D9EA6189D011E6F%3Fdoi%3D10.1.1.111.2320%26rep%3Drep1%26type%3Dpdf&ct=ga&cd=AGdwwvD0Hcw&usg=AFQjCNGirnGcumGZD-GutW0D0rlVxNU36w" rel="nofollow">Parameter Estimation for Differential Equations: A Generalized Smoothing Approach</a><i><br /><b>Summary.</b> We propose a new method for estimating parameters in non-linear differential equations. These models represent change in a system by linking the behavior of a derivative of a process to the behavior of the process itself. Current methods for estimating parameters in differential equations from noisy data are computationally intensive and often poorly suited to statistical techniques such as inference and interval estimation. This paper describes a new method that uses noisy data to estimate the parameters defining a system of nonlinear differential equations. The approach is based on a modification of data smoothing methods along with a generalization of profiled estimation. We derive interval estimates and show that these have good coverage properties on data simulated from chemical engineering and neurobiology. The method is demonstrated using real-world data from chemistry and from the progress of the auto-immune disease lupus. 
<br /><b>Keywords:</b> Differential equations, profiled estimation, estimating equations, Gauss-Newton methods, functional data analysis</i><br /><br />The focus is on ODEs, but many of the considerations carry over to PDEs:<br /><i>The insolvability of most ODE’s has meant that statistical science has had little impact on the fitting of such models to data. Current methods for estimating ODE’s from noisy data are often slow, uncertain to provide satisfactory results, and do not lend themselves well to collateral analyses such as interval estimation and inference. Moreover, when only a subset of variables in a system are actually measured, the remainder are effectively functional latent variables, a feature that adds further challenges to data analysis. Finally, although one would hope that the total number of measured values, along with its distribution over measured values, would have a healthy ratio to the dimension of the parameter vector θ, such is often not the case. Measurements in biology, medicine and physiology, for example, may require invasive or destructive procedures that can strictly control the number of measurements that can realistically be obtained. These problems can often be offset, however, by a high level of measurement precision.</i>jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-39600799324506835082010-03-11T10:08:17.383-05:002010-03-11T10:08:17.383-05:00To be fair, they (in the CAPT process for example) are actually looking at more than just a globally averaged state; they are looking at spatial and temporal error patterns too, but I think the basic criticism still applies.jstultshttps://www.blogger.com/profile/03506970399027046387noreply@blogger.comtag:blogger.com,1999:blog-5822805028291837738.post-72623338715244466252010-03-11T09:56:31.386-05:002010-03-11T09:56:31.386-05:00Josh,
<br /><br />I added a comment to this thread:<br /><br />http://www.easterbrook.ca/steve/?p=1388Anonymousnoreply@blogger.com