From: Simon Clippingdale
To: All Msg #217, Apr-13-93 12:45PM
Subject: Bayesian Statistics, theism and atheism
Organization: Department of Computer Science, Warwick University, England
From: simon@dcs.warwick.ac.uk (Simon Clippingdale)
Message-ID: <1993Apr13.204545.8031@dcs.warwick.ac.uk>
Newsgroups: alt.atheism
This is a cut-down version (believe it or not) of part of a file I'm in the
process of writing.
Since I don't want this distributed before it's been through the process of
causing outrage on the net and being suitably modified (this isn't even an
alpha version), here comes a copyright notice.
**** Contents copyright 1993 Simon Clippingdale, so there. ****
Apologies for the length and the occasional reference to stuff unrelated
to the current thread, but this came out of an email discussion and I
haven't had time to make it totally standalone or edit all the flab.
This part examines the question "Can a lack of evidence for something
be considered as evidence against it?" using Bayesian statistics.
The general framework involves the updating of notional running estimates
of probability for each of a number of hypotheses H[i], as new observations
x[n] arrive at times n.
The hypotheses are assumed to be Ha (atheism, correct if no gods exist) and
Ht (theism, correct if one or more gods exist). Since the hypotheses form
a partition (gods either exist or they don't), P(theism) + P(atheism) = 1.
The conditional probability P(X | Y) is read "probability of X given Y" and
is equal to
\[ P(X \mid Y) = \frac{P(XY)}{P(Y)} = \frac{\text{Prob. of X and Y}}{\text{Prob. of Y}} \]
Enough preliminaries; here goes. Cut to the file.
************************* begin included material *****************************
The question is what happens when `running' estimates of probability are
updated upon the arrival of a new observation. If the hypothesis in question
is H, prior observations are denoted by x[1],...,x[n-1] and the new observation
by x[n], then the relevant statement of Bayes' Rule in this case expresses the
updated estimate in terms of the previous estimate as:
\[ P(H \mid x[1] \ldots x[n]) = P(H \mid x[1] \ldots x[n-1]) \, \frac{P(x[n] \mid H, x[1] \ldots x[n-1])}{P(x[n] \mid x[1] \ldots x[n-1])} \]
or
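\[ \left(\begin{array}{c}\text{Prob. of H}\\ \text{given prior}\\ \text{and new obs.}\end{array}\right) = \left(\begin{array}{c}\text{Prob. of H}\\ \text{given prior}\\ \text{obs.}\end{array}\right) \times \frac{\text{Prob. of new obs. given H and prior obs.}}{\text{Prob. of new obs. given prior obs.}} \]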
The fraction on the right multiplies the old estimate to give the new one.
The denominator of that fraction is independent of H, so we need worry only
about the numerator (nevertheless, I'll leave the denominator in for clarity).
Assuming statistical independence of the observations x[k], 1 <= k <= n,
[i.e. P(x[k]x[m]) = P(x[k])P(x[m]), hence knowing past observations does
not help in predicting future observations], both the denominator and
numerator may be simplified:
\[ P(x[n] \mid H, x[1] \ldots x[n-1]) = P(x[n] \mid H), \quad \text{and} \]
\[ P(x[n] \mid x[1] \ldots x[n-1]) = P(x[n]). \]
So the updating term which multiplies the estimate at `time' n-1 to give the
estimate at `time' n simplifies to
\[ \frac{P(x[n] \mid H)}{P(x[n])}. \]
We need to look at P(x[n] | H) for various observations x[n] and hypotheses H.
Recall that the denominator P(x[n]) is independent of H.
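To make the updating step concrete, here is a minimal sketch in Python for a
discrete set of hypotheses. The function name bayes_update and the likelihood
values are invented for the illustration, and the sketch assumes that P(x[n])
is obtained by summing P(x[n] | H) P(H) over the hypotheses, using the current
running estimates.

```python
def bayes_update(estimates, likelihoods):
    """Multiply each running estimate by P(x[n] | H) / P(x[n]).

    estimates   -- dict mapping hypothesis name to its current estimate P(H)
    likelihoods -- dict mapping hypothesis name to P(x[n] | H)
    """
    # Assumption of this sketch: P(x[n]) is the sum over hypotheses of
    # P(x[n] | H) * P(H); it is the same number for every hypothesis.
    evidence = sum(estimates[h] * likelihoods[h] for h in estimates)
    return {h: estimates[h] * likelihoods[h] / evidence for h in estimates}


# Hypothetical numbers, chosen only for illustration.
estimates = {"Ha": 0.5, "Ht": 0.5}
likelihoods = {"Ha": 0.10, "Ht": 0.08}
estimates = bayes_update(estimates, likelihoods)
print(estimates)  # the estimate for Ha rises, the one for Ht falls
```

Because the denominator is shared, only the relative sizes of P(x[n] | H)
decide which estimate goes up and which goes down.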
Now we are back to the business of what I've called `event spaces', which
are discrete or continuous spaces of all possible observations x[.], upon
which the various hypotheses each define some conditional probability
density function (pdf) f(x | H). I'll only deal with the general case of
continuous x; the discrete case simply involves Dirac delta functions at
the permissible observation values for each hypothesis.
The important assumption is that there are *some* observations which are
compatible with the theist hypothesis and not with the atheist hypothesis,
and thus would falsify atheism; these are what I called `appearances of god/s',
but this need not be taken too literally. Any observation which requires for
its explanation that one or more gods exist will count. All other observations
are assumed to be compatible with both hypotheses. This leaves theism as
unfalsifiable, and atheism as falsifiable in a single observation only by
such `appearances of god/s'.
What follows is a schematic representation of the conditional pdf's
corresponding to the theist hypothesis Ht and to the atheist hypothesis Ha.
The exact shape isn't important, and neither is the extent (I'm actually
going to represent f(x | Ht) as nonzero only on a finite interval, even though
unfalsifiability implies that it is nowhere zero, just because it's easier
to draw. Extending its range to infinity doesn't affect the result).
Here goes (Ia, It are defined below):
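[Schematic: f(x | Ha) is drawn as a flat line of height 1/Ia over the
interval Ia. f(x | Ht) is drawn as a lower flat line of height 1/It over the
wider interval It; the part of It lying outside Ia, marked `appearances of
god/s', is hatched and labelled "area A". The horizontal axis is x.]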
Also define two intervals on the event space: Ia, the space of all x
compatible with atheism, Ia = {x : f(x | Ha) > 0}, and similarly It, the
space of all x compatible with theism, It = {x : f(x | Ht) > 0}.
Note that f(x | Ha) is larger on Ia than is f(x | Ht); this is because of
the normalisation condition
\[ \int_{-\infty}^{\infty} f(x) \, dx = 1 \]
for any pdf f(x), conditionals included. For densities other than uniform,
f(x | Ha) may dip below f(x | Ht) but the area under it on Ia is still
larger, and that is the important point.
The implication is that the theist hypothesis Ht `wastes' some proportion
(the area A in the schematic) of its available probability f(x | Ht) on
appearances of god/s, and this is finally its undoing in the absence of
such appearances, no matter how small the area A is, provided that it is
nonzero. (If A = 0, then theism says that everything will always appear
exactly as if no gods existed, and is indistinguishable from atheism other
than as a thought experiment.)
And so back to the updating multiplier
\[ \frac{P(x[n] \mid H)}{P(x[n])}. \]
For continuous x, the probability of any exact value is zero, so numerator
and denominator both vanish; we have to consider a small interval around the
observed value x0 of x[n], and let this interval tend to zero:
\[ P(x0 \le x[n] < x0 + dx \mid H) = f(x0 \mid H) \, dx \]
\[ P(x0 \le x[n] < x0 + dx) = f(x0) \, dx \]
which gives the multiplier as
\[ \frac{P(x[n] \mid H)}{P(x[n])} = \frac{f(x[n] \mid H)}{f(x[n])} \]
and in the case illustrated in the schematic above, we have
\[ \text{multiplier for Ha} = \frac{f(x[n] \mid \mathrm{Ha})}{f(x[n])} \]
and
\[ \text{multiplier for Ht} = \frac{f(x[n] \mid \mathrm{Ht})}{f(x[n])} = \frac{f(x[n] \mid \mathrm{Ha})}{f(x[n])} \, (1 - A) \]
for an observation x[n] on the interval Ia.
Thus for an observation on Ia, compatible with both theism and atheism, the
multiplier for Ht is smaller than that for Ha by a factor of 1 - A, where A
is the area [or probability, integral of f(x | Ht) dx] which Ht `wastes' on
making possible the appearance of god/s.
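The factor of 1 - A can be checked numerically for the uniform case drawn in
the schematic. This is only a sketch: the function name and the interval
lengths are invented, and it assumes both densities are uniform with Ia
contained in It, so that A is the fraction of It lying outside Ia.

```python
def uniform_densities(len_ia, len_it):
    """Return (f(x | Ha), f(x | Ht), A) for a point x on Ia.

    Assumes uniform densities of height 1/Ia and 1/It, with the interval
    Ia (length len_ia) contained in the interval It (length len_it).
    """
    if not 0 < len_ia <= len_it:
        raise ValueError("need 0 < len_ia <= len_it")
    f_ha = 1.0 / len_ia                   # height of f(x | Ha) on Ia
    f_ht = 1.0 / len_it                   # height of f(x | Ht) on It
    area_a = (len_it - len_ia) / len_it   # probability Ht puts outside Ia
    return f_ha, f_ht, area_a


# Hypothetical interval lengths, chosen only for illustration.
f_ha, f_ht, area_a = uniform_densities(len_ia=9.0, len_it=10.0)
print(f_ht / f_ha, 1.0 - area_a)  # the two numbers agree up to rounding
```

With these lengths Ht spends one tenth of its probability outside Ia, so on Ia
its density is nine tenths of that of Ha.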
After a large number N of observations all of which fall on the interval Ia,
the estimate of conditional probability for the theistic hypothesis will be
down on that of the atheistic hypothesis by a factor of (1 - A)^N.
As N becomes arbitrarily large, with all observations on Ia, and no observed
`appearances of god/s' in [It & (!Ia)], the running estimates asymptotically
approach zero in the case of the theist hypothesis Ht and unity in the case
of the atheist hypothesis Ha.
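To get a feel for how quickly the factor (1 - A)^N shrinks, the following
sketch computes it, together with the smallest N that brings it below a chosen
threshold. The function names, the value of A and the threshold are all
invented for the illustration.

```python
import math


def relative_factor(area_a, n_obs):
    """Factor (1 - A)^N by which Ht falls behind Ha after N observations on Ia."""
    return (1.0 - area_a) ** n_obs


def observations_needed(area_a, threshold):
    """Smallest N with (1 - A)^N <= threshold, for 0 < A < 1, 0 < threshold < 1."""
    if not (0.0 < area_a < 1.0 and 0.0 < threshold < 1.0):
        raise ValueError("need 0 < A < 1 and 0 < threshold < 1")
    # Solve (1 - A)^N = threshold for N, then round up.
    n = math.ceil(math.log(threshold) / math.log1p(-area_a))
    # Correct for floating-point rounding near the boundary.
    while relative_factor(area_a, n) > threshold:
        n += 1
    while n > 0 and relative_factor(area_a, n - 1) <= threshold:
        n -= 1
    return n


# Hypothetical values: A = 0.01, and a target factor of one in a thousand.
n = observations_needed(0.01, 0.001)
print(n, relative_factor(0.01, n))
```

Even a small A is enough: the required N grows only like log(threshold) / log(1 - A).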
And there you have it.
Summary: if theism states that god/s *may* `appear' or, more generally,
give rise to observations incompatible with atheism, then observations
which are compatible with both theism and atheism must tend statistically
to support atheism. This means that a lack of evidence which specifically
supports theism *is* evidence for atheism, because every observation
compatible with both theism and atheism causes running estimates of the
probability [of correctness] of atheism to increase and those of theism
to decrease.
( Aside: as to the initial estimates of P(Ha) and P(Ht), before any
( observations are in, we are compelled by the so-called Principle of
( Insufficient Reason to set both equal to 0.5. This is because to do
( otherwise implies that we have some information on the strength of
( which we can discriminate between the hypotheses. In the absence of
( any such information, an arbitrary relabelling of the hypotheses
( cannot lead to a change in the probabilities assigned, and thus they
( are bound to be equal.
(
( In the case shown in the schematic, the updating multipliers are
( k for Ha and k(1 - A) for Ht, where k is a normalising constant,
( the value of which changes with observation number N:
(
(   \[ k = \frac{1 + (1 - A)^N}{1 + (1 - A)^{N+1}} \]
(
( giving the multiplier for Ha as
(
(   \[ \frac{1 + (1 - A)^N}{1 + (1 - A)^{N+1}} \]
(
( and the multiplier for Ht as
(
(   \[ (1 - A) \, \frac{1 + (1 - A)^N}{1 + (1 - A)^{N+1}} . \]
(
(
( The first few values of the running estimates in this case are:
(
(   obs.#   for Ha:               for Ht:
(
(   0       1 / 2                 1 / 2
(   1       1 / (2-A)             (1-A) / (2-A)
(   2       1 / (2-2A+A^2)        (1-2A+A^2) / (2-2A+A^2)
(
(   ...     ...                   ...
(
(   N       1 / [1+(1-A)^N]       (1-A)^N / [1+(1-A)^N]
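The table in the aside can be reproduced by applying the multipliers k and
k(1 - A) step by step. The sketch below does this in exact arithmetic and
prints each row next to the closed form 1 / [1+(1-A)^N]. The function name and
the choice A = 1/10 are invented for the illustration; the starting estimates
of 1/2 are those of the aside.

```python
from fractions import Fraction


def running_estimates(area_a, n_obs):
    """Return rows (N, estimate for Ha, estimate for Ht) for N = 0..n_obs.

    Every observation is assumed to fall on Ia, and both estimates
    start at 1/2 as in the aside.
    """
    p_ha = p_ht = Fraction(1, 2)
    rows = [(0, p_ha, p_ht)]
    for n in range(n_obs):
        # Normalising constant k for the step from observation n to n + 1.
        k = (1 + (1 - area_a) ** n) / (1 + (1 - area_a) ** (n + 1))
        p_ha, p_ht = k * p_ha, k * (1 - area_a) * p_ht
        rows.append((n + 1, p_ha, p_ht))
    return rows


a = Fraction(1, 10)  # hypothetical value of A
for n, p_ha, p_ht in running_estimates(a, 3):
    closed_form = 1 / (1 + (1 - a) ** n)
    print(n, p_ha, p_ht, p_ha == closed_form)
```

Using Fraction rather than floats keeps the comparison with the closed form
exact, so any mismatch would show up as False in the last column.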
************************ end included material ********************************
Cheers
Simon

Simon Clippingdale simon@dcs.warwick.ac.uk
Department of Computer Science Tel (+44) 203 523296
University of Warwick FAX (+44) 203 525714
Coventry CV4 7AL, U.K.
EMail Fredric L. Rice / The Skeptic Tank
