Wednesday, November 25, 2009

Converging and Diverging Views

I was brushing up on my maximum entropy and probability theory the other day and came across a great passage in Jaynes' book about convergence and divergence of views. He applies basic Bayesian probability theory to the concept of changing public opinion in the face of new data, especially the effect prior states of knowledge (prior probabilities) can have on the dynamics. The initial portion of section 5.3 is reproduced below.

5.3 Converging and diverging views (pp. 126 – 129)

Suppose that two people, Mr A and Mr B, have differing views (due to their differing prior information) about some issue, say the truth or falsity of some controversial proposition $S$. Now we give them both a number of new pieces of information or 'data', $D_1, D_2, \dots, D_n$, some favorable to $S$, some unfavorable. As $n$ increases, the totality of their information comes to be more nearly the same, therefore we might expect that their opinions about $S$ will converge toward a common agreement. Indeed, some authors consider this so obvious that they see no need to demonstrate it explicitly, while Howson and Urbach (1989, p. 290) claim to have demonstrated it.

Nevertheless, let us see for ourselves whether probability theory can reproduce such phenomena. Denote the prior information by $I_A$, $I_B$, respectively, and let Mr A be initially a believer, Mr B be a doubter:

$$P(S|I_A) \simeq 1, \qquad P(S|I_B) \simeq 0 \tag{5.16}$$

After receiving data $D$, their posterior probabilities are changed to

$$P(S|DI_A) = P(S|I_A)\,\frac{P(D|SI_A)}{P(D|I_A)}, \qquad P(S|DI_B) = P(S|I_B)\,\frac{P(D|SI_B)}{P(D|I_B)} \tag{5.17}$$
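To see (5.16) and (5.17) at work, here is a minimal Python sketch. The priors and likelihoods are hypothetical, and it assumes both men assess the data identically, with $P(D|I)$ obtained from the sum rule over $S$ and not-$S$. The function name `posterior` is illustrative only.

```python
def posterior(prior_s, p_d_given_s, p_d_given_not_s):
    """Eq. (5.17): P(S|DI) = P(S|I) * P(D|SI) / P(D|I).

    P(D|I) is expanded by the sum rule:
    P(D|I) = P(D|SI) P(S|I) + P(D|not-S I) P(not-S|I).
    """
    p_d = p_d_given_s * prior_s + p_d_given_not_s * (1.0 - prior_s)
    return prior_s * p_d_given_s / p_d


# Hypothetical priors in the spirit of (5.16): A a believer, B a doubter.
prior_a, prior_b = 0.99, 0.01
# Hypothetical data D favoring S (twice as likely if S is true),
# assumed to be assessed identically by both men.
p_d_given_s, p_d_given_not_s = 0.8, 0.4

post_a = posterior(prior_a, p_d_given_s, p_d_given_not_s)
post_b = posterior(prior_b, p_d_given_s, p_d_given_not_s)

# The update factor P(D|SI)/P(D|I) equals posterior / prior.
for name, prior, post in [("Mr A", prior_a, post_a), ("Mr B", prior_b, post_b)]:
    print(f"{name}: {prior:.3f} -> {post:.3f} (factor {post / prior:.3f})")
```

With these assumed numbers Mr A's factor $P(D|SI_A)/P(D|I_A)$ is about 1.005, so his opinion barely moves, while Mr B's factor is about 1.98: his probability for $S$ roughly doubles, though it stays small. This is the pattern described by (5.18) and (5.19).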

If $D$ supports $S$, then since Mr A already considers $S$ almost certainly true, we have $P(D|SI_A) \simeq P(D|I_A)$, and so

$$P(S|DI_A) \simeq P(S|I_A) \tag{5.18}$$

Data $D$ have no appreciable effect on Mr A’s opinion. But now one would think that if Mr B reasons soundly, he must recognize that $P(D|SI_B) > P(D|I_B)$, and thus

$$P(S|DI_B) > P(S|I_B) \tag{5.19}$$

Mr B’s opinion should be changed in the direction of Mr A’s. Likewise, if $D$ had tended to refute $S$, one would expect that Mr B’s opinions are little changed by it, whereas Mr A’s will move in the direction of Mr B’s. From this we might conjecture that, whatever the new information $D$, it should tend to bring different people into closer agreement with each other, in the sense that

$$|P(S|DI_A) - P(S|DI_B)| < |P(S|I_A) - P(S|I_B)| \tag{5.20}$$

Although this can be verified in special cases, it is not true in general.
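A quick numerical check shows how (5.20) can fail. The Python sketch below uses hypothetical probabilities and an illustrative helper `gaps` to compute the gap in opinion before and after $D$ in two cases. In the first, both men use the same likelihoods. In the second, their different prior information leads them to assign different values to $P(D|SI)$, which (5.17) permits because every term is conditioned on $I_A$ or $I_B$.

```python
def posterior(prior_s, p_d_given_s, p_d_given_not_s):
    """Bayes' theorem (5.17), with P(D|I) expanded over S and not-S."""
    p_d = p_d_given_s * prior_s + p_d_given_not_s * (1.0 - prior_s)
    return prior_s * p_d_given_s / p_d


def gaps(prior_a, prior_b, lik_a, lik_b):
    """Return (|P(S|I_A) - P(S|I_B)|, |P(S|DI_A) - P(S|DI_B)|).

    lik_a and lik_b are (P(D|S I), P(D|not-S I)) pairs for Mr A and Mr B.
    """
    before = abs(prior_a - prior_b)
    after = abs(posterior(prior_a, *lik_a) - posterior(prior_b, *lik_b))
    return before, after


# Case 1 (hypothetical): shared likelihoods; D tells against S.
shared = gaps(0.9, 0.6, (0.05, 0.5), (0.05, 0.5))
# Case 2 (hypothetical): Mr A reads D as favoring S, while Mr B, e.g. because
# he distrusts its source, thinks D is more likely if S is false.
differing = gaps(0.6, 0.4, (0.9, 0.3), (0.3, 0.9))

for name, (before, after) in [("shared", shared), ("differing", differing)]:
    print(f"{name} likelihoods: gap {before:.3f} -> {after:.3f}")
```

In the first case both men move toward rejecting $S$, yet the gap still widens from 0.300 to about 0.343. In the second they move in opposite directions and the gap grows from 0.200 to about 0.636, a small-scale picture of polarization.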

Is there some other measure of ‘closeness of agreement’ such as $\log[P(S|DI_A)/P(S|DI_B)]$, for which this converging of opinions can be proved as a general theorem? Not even this is possible; the failure of probability theory to give this expected result tells us that convergence of views is not a general phenomenon. For robots and humans who reason according to the consistency desiderata of Chapter 1, something more subtle and sophisticated is at work.
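The log-ratio measure can be probed the same way. In this Python sketch (hypothetical numbers, likelihoods shared by both people, helper names illustrative), the same two people are given either data favoring $S$ or data against it, and the measure is compared with its value before any data.

```python
import math


def posterior(prior_s, p_d_given_s, p_d_given_not_s):
    """Bayes' theorem (5.17), with P(D|I) expanded over S and not-S."""
    p_d = p_d_given_s * prior_s + p_d_given_not_s * (1.0 - prior_s)
    return prior_s * p_d_given_s / p_d


def log_ratio(p_a, p_b):
    """The candidate measure log[P(S|.I_A) / P(S|.I_B)]; 0 means agreement."""
    return math.log(p_a / p_b)


prior_a, prior_b = 0.9, 0.6
# Hypothetical (P(D|S I), P(D|not-S I)) pairs, shared by both people.
supporting = (0.5, 0.05)
refuting = (0.05, 0.5)

before = log_ratio(prior_a, prior_b)
after_supporting = log_ratio(posterior(prior_a, *supporting),
                             posterior(prior_b, *supporting))
after_refuting = log_ratio(posterior(prior_a, *refuting),
                           posterior(prior_b, *refuting))

print(f"before any data:       {before:.3f}")
print(f"after supporting data: {after_supporting:.3f}")
print(f"after refuting data:   {after_refuting:.3f}")
```

Starting from about 0.405, the measure shrinks to about 0.053 after the supporting data but grows to about 1.290 after the refuting data, so with these assumed numbers it can move either way.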

Indeed, in practice we find that this convergence of opinions usually happens for small children; for adults it happens sometimes but not always. For example, new experimental evidence does cause scientists to come into closer agreement with each other about the explanation of a phenomenon.

Then it might be thought (and for some it is an article of faith in democracy) that open discussion of public issues would tend to bring about a general consensus on them. On the contrary, we observe repeatedly that when some controversial issue has been discussed vigorously for a few years, society becomes polarized into opposite extreme camps; it is almost impossible to find anyone who retains a moderate view. The Dreyfus affair in France, which tore the nation apart for 20 years, is one of the most thoroughly documented examples of this (Bredin, 1986). Today, such issues as nuclear power, abortion, criminal justice, etc., are following the same course. New information given simultaneously to different people may cause a convergence of views; but it may equally well cause a divergence.

This divergence phenomenon is observed also in relatively well-controlled psychological experiments. Some have concluded that people reason in a basically irrational way; prejudices seem to be strengthened by new information which ought to have the opposite effect. Kahneman and Tversky (1972) draw the opposite conclusion from such psychological tests, and consider them an argument against Bayesian methods.

But now in view of the above ESP example, we wonder whether probability theory might also account for this divergence and indicate that people may be, after all, thinking in a reasonably rational, Bayesian way (i.e. in a way consistent with their prior information and prior beliefs). The key to the ESP example is that our new information was not

$S \equiv$ “fully adequate precautions against error or deception were taken, and Mrs Stewart did in fact deliver that phenomenal performance.”

It was that some ESP researcher has claimed that $S$ is true. But if our prior probability for $S$ is lower than our prior probability that we are being deceived, hearing this claim has the opposite effect on our state of belief from what the claimant intended.

The same is true in science and politics; the new information a scientist gets is not that an experiment did in fact yield this result, with adequate protection against error. It is that some colleague has claimed that it did. The information we get from TV evening news is not that a certain event actually happened in a certain way; it is that some news reporter claimed that it did.

Scientists can reach agreement quickly because we trust our experimental colleagues to have high standards of intellectual honesty and sharp perception to detect possible sources of error. And this belief is justified because, after all, hundreds of new experiments are reported every month, but only about once in a decade is an experiment reported that turns out later to have been wrong. So our prior probability for deception is very low; like trusting children, we believe what experimentalists tell us.

In politics, we have a very different situation. Not only do we doubt a politician’s promises, few people believe that news reporters deal truthfully and objectively with economic, social, or political topics. We are convinced that virtually all news reporting is selective and distorted, designed not to report the facts, but to indoctrinate us in the reporter’s socio-political views. And this belief is justified abundantly by the internal evidence in the reporter’s own product – every choice of words and inflection of voice shifting the bias invariably in the same direction.

Not only in political speeches and news reporting, but wherever we seek for information on political matters, we run up against this same obstacle; we cannot trust anyone to tell us the truth, because we perceive that everyone who wants to talk about it is motivated either by self-interest or by ideology. In political matters, whatever the source of information, our prior probability for deception is always very high. However, it is not obvious whether this alone can prevent us from coming to agreement.
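The role of the prior probability for deception can be made concrete with a deliberately simple model, which is an assumption of this sketch and not Jaynes' own: a source is honest with probability $h$, in which case it claims $S$ only if $S$ is true; otherwise it is deceptive and claims $S$ regardless. Then $P(\text{claim}|S) = 1$ and $P(\text{claim}|\text{not-}S) = 1 - h$. The function name and all numbers below are hypothetical.

```python
def belief_after_claim(prior_s, p_honest):
    """P(S | a source claims S) under an assumed two-type source model.

    An honest source claims S only when S is true; a deceptive source claims S
    whether or not it is true. Honesty is assumed independent of S.
    """
    p_claim_given_s = 1.0                   # either type would claim S
    p_claim_given_not_s = 1.0 - p_honest    # only a deceptive source would
    joint_s = p_claim_given_s * prior_s
    return joint_s / (joint_s + p_claim_given_not_s * (1.0 - prior_s))


prior = 0.5
science = belief_after_claim(prior, p_honest=0.99)  # low prior for deception
politics = belief_after_claim(prior, p_honest=0.1)  # high prior for deception
print(f"trusted source:    {prior:.2f} -> {science:.3f}")
print(f"distrusted source: {prior:.2f} -> {politics:.3f}")
```

From an even prior, the trusted source moves belief to about 0.990 and the distrusted one only to about 0.526. In this model a claim never lowers belief in $S$, because the likelihood ratio $1/(1-h)$ is at least 1; distrust slows agreement rather than reversing it, which fits the caution that it is not obvious whether a high prior for deception alone can prevent agreement.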

Jaynes, E.T., Probability Theory: The Logic of Science (Vol 1), Cambridge University Press, 2003.

12 comments:

  1. Jaynes seems almost prescient:
    To make matters worse, some scientists, and still more people among environmental and other organizations, made statements not supported by what was reliably known. An example was implicit or explicit claims that hurricanes were increasing as a result of human interference with the climate. There was no way for the general public to know whether scientists actually made such claims, still less whether the claims were made honestly or disingenuously.
    -- A Historian Looks 'Back' at the Climate Fight

  2. Fascinating stuff. Especially if, in your starting point ("Suppose that two people, Mr A and Mr B have differing views (due to their differing prior information) about some issue"), the real baseline is "(due to their predisposition to believe it)", i.e. it's not necessarily a logical choice from the off.

    I recently encountered this phenomenon in practice on a jury.

  3. This paper, Risk-based decision analysis in support of precautionary policies, talks a bit about decision analysis when there are multiple stakeholders who bring differing values and prior information to the table:
    This is not to say that different parties must agree on the appropriate course of action, however. Full, rational consideration may show that a decision that appears to have the appropriate amount of 'prudent precaution' for its proponents appears reckless and non-precautionary to opponents with different beliefs and value systems.

    Their discussion of the precautionary principle is interesting. My take is that it is basically a way to avoid rational decision making (because that process is 'strained' when catastrophic outcomes are quite uncertain); it then supports irrationally avoiding actions whose avoidance would not be warranted by an honest consideration of our state of knowledge (subjective degree of belief).

    Precautionary Principle: The triumph of fear over reason.

  4. Also from that Risk-based decision analysis paper:
    Focusing solely on a single feared outcome, believing only that it would be terrible or that there is some non-zero chance that it will occur, is not a sufficient basis for taking precautionary action.

  5. On (Bayesian) rationality for individuals,
    We give a weak system of consistency axioms for "rational" behavior. The axioms do not even assume the existence of an ordering for axioms. The conclusions are still that utility functions exist, both unconditionally and conditionally given the state of nature; the unconditional utility is a weighted linear combination of the conditional utilities; and the separation of the weights from the conditional scales is not necessary and even the possibility is questioned.
    A weak system of axioms for “rational” behavior and the nonseparability of utility from prior

    And a bit more on the extension to groups,
    An outstanding challenge for "Bayesian" decision theory is to extend its norms of rationality from individuals to groups. Specifically, can the beliefs and values of several Bayesian decision makers be amalgamated into a single Bayesian profile that respects their common preferences over options? If rational parties to a negotiation can agree on collective actions merely by considering mutual gains, is it not possible to find a consensus Bayes model for their choices? In other words, can their shared strict preferences over acts be reproduced with a Bayesian rationale (maximizing expected utility) from beliefs (probabilities) and desires (utilities) that signify a rational compromise between their rival positions?
    Shared preferences of two Bayesian decision makers

  6. Isn't it still irrational (i.e., ignoring the result of a Bayesian calculation) to assign a higher probability to "the claim is false and my opponents claim it to be true despite this because that furthers their agenda" than to "the claim is true and my opponents claim it to be true either because it is true or because they would have done so in any case"? The fact that an opponent whom I believe to be deceitful states a factual claim supporting his position should never lower my probability for that claim, since the probability that he will state the claim is still slightly higher if it is true than if it is false. For example, if it is false, my opponent has an incentive simply not to mention it rather than perpetuate it, because being exposed as a fraud can damage one's cause.

  7. Ah, just reread my comment. I meant the probabilities of "given the claim is true, my opponent would have claimed it to be true " vs. "given the claim is false, my opponent would still have claimed it to be true". Of these two, the first one should have higher probability, which should shift my belief the appropriate (possibly infinitesimal) amount towards agreement.

  8. Seems like a Markov chain with differing starting points.

    Replies
    1. Thanks for commenting. I agree: where you start matters in a couple of ways. The priors for the claim itself matter, and those are likely to be correlated with priors that affect how you evaluate incoming evidence. The evidence is hardly ever just "the evidence". It's usually another claim, and the who/what/where/how/why of the claim matter.

    2. Thanks for your comments on this post. Definitely provoked some more thought on my part.

      The analogy with converging Markov chains, and their sensitivity to initial conditions, is a pretty good one. It's not perfect, though. The reason two Markov chains started from different initializations give different answers is that you haven't run them long enough, so the difference is down to error.

      Jaynes' point is subtly different: disagreement can legitimately exist without any error by either party.

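The Markov-chain analogy discussed in the replies above can be sketched in a few lines of Python. Two copies of the same hypothetical two-state chain, started from opposite certainties, are stepped forward together, and the total-variation distance between their distributions is recorded. The transition matrix and helper names are assumptions for illustration; for this matrix the distance shrinks by a factor of 0.6 per step and both copies approach the stationary distribution (0.75, 0.25).

```python
def step(dist, transition):
    """One step of a finite Markov chain: new[j] = sum_i dist[i] * transition[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def total_variation(p, q):
    """Total-variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical two-state chain; every row sums to 1.
transition = [[0.9, 0.1],
              [0.3, 0.7]]
chain_a = [1.0, 0.0]   # starts certain of state 0
chain_b = [0.0, 1.0]   # starts certain of state 1

distances = []
for _ in range(30):
    distances.append(total_variation(chain_a, chain_b))
    chain_a = step(chain_a, transition)
    chain_b = step(chain_b, transition)

print(f"initial distance: {distances[0]:.3f}, after 29 steps: {distances[-1]:.2e}")
print(f"chain A: {chain_a[0]:.6f}, chain B: {chain_b[0]:.6f}")
```

Here the initial disagreement is purely transient and disappears given enough steps. Jaynes' point is different: two reasoners with different prior information can legitimately disagree without either of them making an error.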