Saturday, January 22, 2011

Recurrence, Averaging and Predictability

Motivation and Background

Yet another installment in the Lorenz63 series. This time motivated by a commenter on Climate Etc. Tomas Milanovic claims that time averages are chaotic too in response to the oft repeated claim that the predictability limitations of nonlinear dynamical systems are not a problem in the case of climate prediction. Lorenz would seem to agree, “most climatic elements, and certainly climatic means, are not predictable in the first sense at infinite range, since a non-periodic series cannot be made periodic through averaging [1].” We’re not going to just take his word on it. We’ll see if we can demonstrate this with our toy model.

That’s the motivation, but before we get to toy model results a little background discussion is in order. In this previous entry I illustrated the different types of functionals that you might be interested in depending on whether you are doing weather prediction or climate prediction. I also made the remark, “A climate prediction is trying to provide a predictive distribution of a time-averaged atmospheric state which is (hopefully) independent of time far enough into the future.” It was pointed out to me that this is a testable hypothesis [2], and that the empirical evidence doesn’t seem to support the existence of time-averages (or other functionals) describing the Earth’s climate system that are independent of time [3]. In fact, the above assumption was critiqued by none other than Lorenz in 1968 [4]. In that paper he states,

Questions concerning the existence and uniqueness of long-term statistics fall into the realm of ergodic theory. [...] In the case of nonlinear equations, the uniqueness of long-term statistics is not assured. From the way in which the problem is formulated, the system of equations, expressed in deterministic form, together with a specified set of initial conditions, determines a time-dependent solution extending indefinitely into the future, and therefore determines a set of long-term statistics. The question remains as to whether such statistics are independent of the choice of initial conditions.

He goes on to define a system as transitive if the long-term statistics are independent of initial condition, and intransitive if there are “two or more sets of long-term statistics, each of which has a greater-than-zero probability of resulting from randomly chosen initial conditions.” Since the concept of climate change has no meaning for statistics over infinitely long intervals, he then defines a system as almost intransitive if the statistics at infinity are unique, but the statistics over finite intervals depend (perhaps even sensitively) on initial conditions. In the context of policy relevance we are generally interested in behavior over finite time-intervals.

In fact, from what I’ve been able to find, different large-scale spatial averages (or coherent structures, which you could track by suitable projections or filtering) of state for the climate system face similar limits to predictability as un-averaged states. The predictability just decays at a slower rate. So instead of predictive limitations for weather-like functionals on the order of a few weeks, the more climate-like functionals become unpredictable on slower time-scales. There’s no magic here, things don’t suddenly become predictable a couple decades or a century hence because you take an average. It’s just that averaging or filtering may change the rate that errors for that functional grow (because in spatio-temporal chaos different structures, or state vectors, will have different error growth rates and reach saturation at different times). Again Lorenz puts it well, “the theory which assures us of ultimate decay of atmospheric predictability says nothing about the rate of decay” [1]. Recent work shows that initialization matters for decadal prediction, and that the predictability of various functionals decay at different rates [5]. For instance, sea surface temperature anomalies are predictable at longer forecast horizons than surface temperatures over land. Hind-casts of large spatial averages on decadal time-scales have shown skill in the last two decades of the past century (though they had trouble beating a persistence forecast for much of the rest of the century) [6].

I’ve noticed in on-line discussions about climate science that some people think that the problem of establishing long term statistics for nonlinear systems is a solved one. That is not the case for the complex, nonlinear systems we are generally most interested in (there are results for our toy though [78]). I think this snippet sums things up well,

Atmospheric and oceanic forcings are strongest at global equilibrium scales of 107 m and seasons to millennia. Fluid mixing and dissipation occur at micorscales of 10-3 m and 10-3s, and cloud particulate transformations happen at 10-6 m or smaller. Observed intrinsic variability is spectrally broad band across all intermediate scales. A full representation for all dynamical degrees of freedom in different quantities and scales is uncomputable even with optimistically foreseeable computer technology. No fundamentally reliable reduction of the size of the AOS [atmospheric oceanic simulation] dynamical system (i.e., a statistical mechanics analogous to the transition between molecular kinetics and fluid dynamics) is yet envisioned. [9]

Here McWilliams is making a point similar to that made by Lorenz in [4] about establishing a statistical mechanics for climate. This would be great if it happened, because that would mean that the problem of turbulence would be solved for us engineers too. Right now the best we have (engineers interested in turbulent flows and climate scientists too) is empirically adequate models that are calibrated to work well in specific corners of reality.

Lorenz was responsible for another useful concept concerning predictability, that is predictability of the first and second kind [1]. If you care about the time-accurate evolution of the order of states then you are interested in predictability of the first kind. If, however, you do not care about the order, but only the statistics, then you are concerned with predictability of the second kind. Unfortunately, Lorenz’s concepts of first and second kind predictability have been morphed in to a claim that first kind predictability is about solving initial value problem (IVP)s and second kind predictability is about solving boundary value problem (BVP)s. For example, “Predictability of the second kind focuses on the boundary value problem: how predictable changes in the boundary conditions that affect climate can provide predictive power [5].” This is unsound. If you read Lorenz closely, you’ll see that the important open question he was exploring about whether the climate is transitive, intransitive or almost intransitive has been assumed away by the spurious association of kinds of predictability with kinds of problems [1]. Lorenz never made this mistake, he was always clear that the difference in kinds of predictability depends on the functionals you are interested in, not whether it is appropriate to solve an IVP or a BVP (what reason could you have for expecting meaningful frequency statistics from a solution to a BVP?). Those considerations depend on the sort of system you have. In an intransitive or almost intransitive system even climate-like functionals depend on the initial conditions.

A good early paper on applying information theory concepts to climate predictability is by Leung and North [10], and there is a more recent review article that covers the basic concepts by DelSole and Tippett [11].

Recurrence Plots

Recurrence plots are useful for getting a quick qualitative feel for the type of response exhibited by a time-series [1213]. First we run a little initial condition (IC) ensemble with our toy model. The computer experiment we’ll run to explore this question will consist of perturbations to the initial conditions (I chose the size of the perturbation so the ensemble would blow-up around t = 12). Rather than sampling from a distribution for the members of the ensemble, I chose them according a stochastic collocation (this helps in getting the same results every time too).

Figure 1: Initial Condition Ensemble

One thing that these two plots makes clear is that it doesn’t make much sense to compare individual trajectories with the ensemble mean. The mean is a parameter of a distribution describing a population of which the trajectories are members. While the trajectories are all orbits on the attractor, the mean is not.

Figure 2: Chaotic Recurrence Plots

Comparing the chaotic recurrence plots with the plots below of a periodic series and a stochastic series illustrates the qualitative differences in appearance.

Figure 3: Non-chaotic Recurrence Plots

Clearly, both the ensemble mean and the individual trajectory are chaotic series, sort of “between” periodic and stochastic in their appearance. Ensemble averaging doesn’t make our chaotic series non-chaotic, what about time averaging?

Predictability Decay

How does averaging affect the decay of predictability for the state of the Lorenz63 system, and can we measure this effect? We can track how the predictability of the future state decays given knowledge of the initial state by using the relative entropy. There are other choices for measures such as mutual information [10]. Since we’ve already got our ensemble though, we can just use entropy like we did before. Rather than just a simple moving average, I’ll be calculating an exponentially weighted one using an FFT-based approach, of course (there’s some edge effects we’d need to worry about if this were a serious analysis, but we’ll ignore that for now). The entropy for the ensemble is shown for three different smoothing levels in Figure 4 (the high entropy prior to t = 5 for the smoothed series is spurious because I didn’t pad the series and it’s calculated with the FFT).


Figure 4: Entropy of Exponentially Weighted Smoothed Series

While smoothing does lower the entropy of the ensemble (lower entropy for more smoothing / smaller λ), it still experiences the same sort of “blow-up” as the unsmoothed trajectory. This indicates problems for predictability even for our time-averaged functionals. Guess what? The recurrence plot indicates that our smoothed trajectory is still chaotic!


Figure 5: Smoothed Trajectory Recurrence Plot

This result shouldn't be too surprising, moving averages or smoothing (of whatever type you fancy) are linear operations. It would probably take a pretty clever nonlinear transformation to turn a chaotic series into a non-chaotic one (think about how the series in this case is generated in the first place). I wouldn't expect any combination of linear transformations to accomplish that.


I’ll begin the end with another great point from McWilliams (though I’ve not heard of sub-grid fluctuations referred to as “computational noise,” that term makes me think of round-off error) that should serve to temper our demands of predictive capability from climate models[9]:

Among their other roles, parametrizations regularize the solutions on the grid scale by limiting fine-scale variance (also known as computational noise). This practice makes the choices of discrete algorithms quite influential on the results, and it removes the simulation from the mathematically preferable realm of asymptotic convergence with resolution, in which the results are independent of resolution and all well conceived algorithms yield the same answer.

If I had read this earlier, I wouldn’t have spent so much time searching for something that doesn’t exist.

Regardless of my tortured learning process, what do the toy models tell us? Our ability to predict the future is fundamentally limited. Not really an earth-shattering discovery; it seems a whole lot like common sense. Does this have any implication for how we make decisions? I think it does. Our choices should be robust with respect to these inescapable limitations. In engineering we look for broad optimums that are insensitive to design or requirements uncertainties. The same sort of design thinking applies to strategic decision making or policy design. The fundamental truism for us to remember in trying to make good decisions under the uncertainty caused by practical and theoretical constraints is that limits on predictability do not imply impotence.


[1]   Lorenz, E. N., The Physical Basis of Climate and Climate Modeling, Vol. 16 of GARP publication series, chap. Climatic Predictability, World Meteorological Organization, 1975, pp. 132–136.

[2]   Pielke Sr, R. A., “your query,” September 2010, electronic mail to the author.

[3]   Rial, J. A., Pielke Sr, R. A., Beniston, M., Claussen, M., Canadell, J., Cox, P., Held, H., Noblet-Ducoudr, N. D., Prinn, R., Reynolds, J. F., and Salas, J. D., “Nonlinearities, Feedbacks And Critical Thresholds Within The EarthS Climate System,” Climatic Change, Vol. 65, No. 1-2, 2004, pp. 11–38.

[4]   Lorenz, E. N., “Climatic Determinism,” Meteorological Monographs, Vol. 8, No. 30, 1968.

[5]   Collins, M. and Allen, M. R., “Assessing The Relative Roles Of Initial And Boundary Conditions In Interannual To Decadal Climate Predictability,” Journal ofClimate, Vol. 15, No. 21, 2002, pp. 3104–3109.

[6]   Lee, T. C., Zwiers, F. W., Zhang, X., and Tsao, M., “Evidence of Decadal Climate Prediction Skill Resulting from Changes in Anthropogenic Forcing,” Journal of Climate, Vol. 19, 2006.

[7]   Tucker, W., The Lorenz Attractor Exists, Ph.D. thesis, Uppsala University, 1998.

[8]   Kehlet, B. and Logg, A., “Long-Time Computability of the Lorenz System,”

[9]   McWilliams, J. C., “Irreducible Imprecision In Atmospheric And Oceanic Simulations,” Vol. 104 of National Academy of Sciences, National Academy of Sciences, pp. 8709 – 8713.

[10]   Leung, L.-Y. and North, G. R., “Information Theory and Climate Prediction,”Journal of Climate, Vol. 3, 1990, pp. 5–14.

[11]   DelSole, T. and Tippett, M. K., “Predictability: Recent insights from information theory,” Reviews of Geophysics, Vol. 45, 2007.

[12]   Eckmann, J.-P., Kamphorst, S. O., and Ruelle, D., “Recurrence Plots of Dynamical Systems,” EPL (Europhysics Letters), Vol. 4, No. 9, 1987, pp. 973.

[13]   Marwan, N., “A historical review of recurrence plots,” The European PhysicalJournal - Special Topics, Vol. 164, 2008, pp. 3–12, 10.1140/epjst/e2008-00829-1.


  1. Dr Pielke has a post up about recent news coverage on predictability. He was also kind enough to link this post on one of his previous entries.

  2. Very informed and excellent post !
    I just found it following a kind mention by Dan Hughes and will perhaps comment more in detail later .
    It resonates with a post I made here a year ago or so , pointing out that the ergodicity was the single most important issue in climate matters .
    It is largely the same thing as what Lorenz said in the quotes above .

    You wrote
    "Lorenz would seem to agree, “most climatic elements, and certainly climatic means, are not predictable in the first sense at infinite range, since a non-periodic series cannot be made periodic through averaging [1].” We’re not going to just take his word on it. We’ll see if we can demonstrate this with our toy model.

    So just for fun a much faster demonstration in 3 lines :)

    A solution is chaotic if the trajectories diverge exponentially.
    That is : f(X0+dX0,t) - f(X0,t) = g(X0).exp(L.t) where :
    X0 are the initial conditions , f is the chaotic solution , g is some function of IC only and L>0 (Lyapounov coefficient) .
    Let's define the time average TA(X0,t) = 1/T Integral t to T + t (f(X0,x)dx).

    T is some arbitrary averaging period
    TA(X0+dX0,t) - TA(X0,t) = 1/T Integral t to T + t [f(X0+dX0,x) - f(X0,x)]dx = 1/T Integral t to T + t [g(X0).exp(L.x)]dx =

    The trajectories of the average diverge exponentially for any arbitrary averaging period T .
    Therefore if a solution is chaotic , all its time averages are chaotic too .
    QED .

  3. Thanks again for your thoughtful comments Tom; I really appreciate your contributions.

  4. The comment I wanted to add was that this result can't be easily transported to the climate .
    Indeed climate is spatio-temporal chaos and the result obtained here is only valid for chaotic solutions in temporal chaos .

    This difficulty is due to the fact that you correctly identified - temporal chaos deals with functions while climate science deals with functionals .
    I know of only one person who could deal with functionals correctly and even did it so correctly that he got a Nobel for that - Feynman :)

    Once you said that you said almost everything .
    A differential of a functional is not uniquely defined so you loose all the impressive and necessary weaponry of differential calculus once you start having only functionals .

    As a side note I go ROFL everytime when I see expressions like dTg/dF (which is supposed to define the climate sensitivity) and where Tg is an average global temperature and F some forcing .
    Tg being a functional that makes dTg be .... what ?
    A nonsense .
    But nonsense won't of course stop climate "scientists" to come to far reaching conclusions .

    So it is a kind of damned if you do , damned if you don't .
    If you don't use functionals , you face a full blown spatio temporal chaos with its untractable infinite dimensionality (of the phase space) .
    If you use functionals , a miracle seems to happen because you obtain things that superficially look like functions of time only .
    But it is only an illusion and you are strictly forbidden to plug that thing in the good old temporal chaos theory .

    So what is left ?
    I have already written about it here .
    If a full spatio-temporal chaos theory is not feasible in the next 100 years or so , then what is left is this : .
    This is one of the ways to discretize space and thus transform the world in a FINITE spatial network of coupled oscillators chaotic or not .

    And of course hope that this discretization will produce dynamics that are approximations of the real system for a sufficiently long time .
    As you are apparently expert in numerical treatements , you will intuitively see the difficulty at once - imagine that you want to study a convergence of an algorithm depending on the grid size but you don't know the function that has to be represented by the algorithm .

    Actually it is even worse :)
    On every node of the grid you have an unknown and different function of time and you are supposed to combine them in some way to obtain a known result and this independently of the grid size .
    And the amusing part is that if you divide the grid step by 2 , not only you need X times more functions but it can't be excluded that the functions that worked with the previous size must be changed too .

  5. I think this is very important work, essentially showing how robust the formulations are to uncertainties. It seems like the complement to this is the introduction of order out of noise and disturbances, which is the basic idea behind stochastic resonance. If you could unify these two ideas, it would help clear things up in my mind.

    So in a particular situation, is the oscillation observed because it is always there and just immune to disturbances, or does it emerge from the noise, resonantly amplified as a kind of principle component from the nonlinear positive feedback of the system? Or could these two ideas be the same principle?

  6. WHT, you must be spying on me; I've been noodling around a stochastic resonance post, but haven't really got a good "hook" for it yet. I don't think I'm understanding your question. Forcings that should be too small to drive the oscillations are able to because the stochastic noise pushes the system "over the top" (and back again). I think I'll get to a forced Lorenz system (just periodic forcing), before I get to stochastic resonance, but all these ideas are related.

  7. Thanks, That answers my question. So it is really dependent on the size of the forcing; if the forcing is not big enough, a stochastic resonance mechanism can kick in.

  8. Is that a moment closure technique that Tom/Tomas is using to show that the average diverges?

    I was looking at this paper a few weeks ago in the context of Monte Carlo sims, which struck a chord:

  9. Joshua :

    I think I'll get to a forced Lorenz system (just periodic forcing), before I get to stochastic resonance, but all these ideas are related.

    Well thought .
    But before going there , read that :

    Just to save your time :)

  10. Good paper Tom; thanks. I was going to approach it from a slightly different angle, but I'm going to incorporate a bit of discussion of that one now too.

  11. Joshua

    My skills and experience with numerical methods are far below yours and Dan Hugh's .
    I am a physicist who does 95% of his physics with a pencil and a sheet of paper so when I look at your scripts , my eyes glaze over .
    Sure I learned some time ago things about Runge Kuttas and such but have never really do it in practice .
    That's why your very target as in the link you gave above stays largely mysterious to me .
    My certainly imperfect interpretation is that you want to know things about the trajectory divergence/convergence and then I cannot decide if the questions you ask are simple or on the contrary so complex that I did not understand them .

    Focusing a moment on the "simple" interpretation .
    A characteristics of a chaotic solution is , as I already wrote above , that :
    f(X0+dX0,t) - f(X0,t) = g(X0).exp(L.t)
    From there follows trivially that for times less than 1/L the trajectories don't diverge too wildly and regardless of the "time step" which is necessarily much smaller than 1/L , when the time step decreases you will get something that looks like "convergence" .
    One of course doesn't know exactly g(X0) but if it is small , it will contribute to keeping the trajectories near to each other for times below 1/L .

    On the other hand for times greater than 1/L (note that this is just a convention , an order of magnitude) , the term exp(L.t) will overwhelm everything and the increase of the length of convergence by taking smaller and smaller dt will become marginal .
    In summary depending on the exact form of g(X0) and the value of L , there will be a time T beyond which no decrease of dt will improve the convergence significantly .

    Ultimately the smallest possible dt is given by the construction properties of the computer .
    The corollary of the above is that once one uses the smalles dt allowed by the hardware , everything computed beyond T is just an artefact .

    For fun it is possible to mathematically prove that ANY numerical solution of a chaotic system is NECESSARILY an artefact LATEST beyond some finite T .
    This doesn't however mean that it becomes an artefact exactly at T . It may become an artefact even before T (for grossly large dt) or a long time after T . You simply can't know .

    A numerical analysis of convergence will give hints about this T and eventually about the g(X0) .
    Normally what you should see if you plot differences between trajectories with different time steps what is equivalent to take the differences between different initial conditions , e.g f(X0+dX0,t) - f(X0,t)(this should be proven if one wants to be rigorous), is a horizontal 0 line untill approximately T and then a more or less sudden explosion .

    Last but not least in order to be complete .
    Of course it would be nice if things were as simple as that but obviously the divergence can't go forever because the exponetial goes to infinity .
    And we know that chaotic systems having an attractor (like Lorenz one) must stay in a finite volume of the phase space because the attractor is always bounded .
    From there follows that when f(X0+dX0,t) - f(X0,t) begins to reach the size of the attractor , the trajectories don't diverge exponentially anymore . They are then so different and in so diferent places of the attractor that they have nothing in common anymore .
    That's why the Lyapounov coefficient can only be used and makes sense for some finite time interval [0,TL] and looses more and more its significance when one approaches TL .

  12. Of course there is a nonsense in the above that I could not edit.
    The time after which a numerical solution becomes an artefact can obviously NOT be greater than T because T is an upper bound .
    It can only be smaller and what one cannot know is by how much smaller it is because that depends on X0 (initial condition) .

    T can be easily computed .
    If one supposes that the solution is normalised so that it is a number between 0 and 1 , then if A is the accuracy of the hardware for [0,1] , and dt the step chosen , the maximum of steps you can compute is N = 1/(A.dt) so T = 1/A . Beyond this T surely , and possibly earlier , your solution becomes an artefact .

  13. Argh I am muddled today .
    The end should read N = 1/A so as T =N.dt , T=dt/A .
    Correct this time . And better for the dimensional consistency .

  14. I just meant that I'm actually going to add a periodic forcing term rather than "force" the system by having a time-dependent parameter value; so it's a different system, but that's already coded up because that's what I needed for verifying the numerical implementation with MMS.

  15. Tom said: Focusing a moment on the "simple" interpretation .
    A characteristics of a chaotic solution is , as I already wrote above , that :
    f(X0+dX0,t) - f(X0,t) = g(X0).exp(L.t)
    From there follows trivially that for times less than 1/L the trajectories don't diverge too wildly and regardless of the "time step" which is necessarily much smaller than 1/L , when the time step decreases you will get something that looks like "convergence" .

    I think you'll appreciate the introductory error analysis in the paper that goes with the site from reference [8] linked above.

    Tom also said: Normally what you should see if you plot differences between trajectories with different time steps what is equivalent to take the differences between different initial conditions , e.g f(X0+dX0,t) - f(X0,t)(this should be proven if one wants to be rigorous), is a horizontal 0 line untill approximately T and then a more or less sudden explosion .
    Yes, that's exactly what I found; see the plots in Figure 2 of this post, differences between trajectories in an initial condition ensemble.

  16. Yes interesting paper with many details .
    They find basically that :
    Increasing the
    number of significant digits beyond the 420 digits used in this work increases the
    interval on which the Lorenz solution is computable.

    what is an elaborated way to say what I wrote above in a fast 0 order approximation :

    There is a fine point of critics that I would make and that is that T is an upper bound and not a constant of the system .
    Depending on X0 , the numerical solution may become wrong (e.g an artefact) BEFORE T for a GIVEN hardware precision .
    The real T cannot be independent of X0 .
    When I played with Lorenz solvers at low precision and with the same dt , I noticed that the time at which the trajectories intersect (e.g T when the numerical solution becomes an artefact) depends on X0 .

  17. These neat little examples on parameter continuation were linked from the Maxima list in a completely unrelated context, but goes to the IVP / BVP distinction. My question to folks at Dr Curry's site about why GCM's aren't using all the standard BVP and convergence acceleration techniques (dropping the time-derivative term) if they are really solving a BVP has been met with limp-wristed hand-waving or silence.

    I guess they didn't know that Steven Schneider (certainly not on anyone's list of 'skeptics') thought climate was probably an IVP problem too:
    "A common view of the climate system and ecosystem structure and function is that of path independence (no memory of previous conditions). However, the multiple stable equilibria for both THC and for atmosphere-biosphere interactions in West Africa suggest a more complex reality. In such systems, the equilibrium state reached is dependent on the initial conditions of the system."
    Abrupt Non-Linear Climate Change, Irreversibility and Surprise

  18. Interesting sounding paper, from the Abstract: Nonlinear systems driven by noise and periodic forces with more than one frequency exhibit the phenomenon of Ghost Stochastic Resonance (GSR) found in a wide and disparate variety of fields ranging from biology to geophysics. The common novel feature is the emergence of a "ghost" frequency in the system's output which it is absent in the input. As reviewed here, the uncovering of this phenomenon helped to understand a range of problems, from the perception of pitch in complex sounds or visual stimuli, to the explanation of climate cycles. Recent theoretical efforts show that a simple mechanism with two ingredients are at work in all these observations. The first one is the linear interference between the periodic inputs and the second a nonlinear detection of the largest constructive interferences, involving a noisy threshold. These notes are dedicated to review the main aspects of this phenomenon, as well as its different manifestations described on a bewildering variety of systems ranging from neurons, semiconductor lasers, electronic circuits to models of glacial climate cycles.
    The Ghost of Stochastic Resonance: an Introductory Review