## Monday, July 20, 2015

### Astro-Stats & Python : Bayes' Theorem (for a Coin)

Bayes' Theorem allows me to find the probability of an event occurring based on a known condition related to the event.  For this statistical method, I must assume some prior expectation of the parameter(s) just like I have in some of my previous exercises.  However, unlike Levenberg-Marquardt and MCMC-Bootstrapping statistics that I have discussed and applied previously, I now do not need to hypothesize a model.  Bayes' Theorem will evaluate the probability of the parameter as I account for each datum in my data set.  This will be much more exciting and comprehensive with an example.

Shall we begin?

As always... follow along by copying my code from here and paste it to your own Python script. Or download my script from the same link by using the "wget" command in the Terminal.
>>>
>>>execfile('bayes_theorem_blog.py')
>>>

For my example, I used a coin with two sides: Heads and Tails.  As I flipped the coin 20 times, my data turned out to be the following:

This resulted in 13 Heads and 7 Tails.

I want to know the probability of getting Heads at the flip of my coin.  In other words, in order to find a satisfactory fit for the data I must find this best fitting parameter--in this case being the true probability of getting Heads.  Although most people would assume that such a probability is 0.50.  I am going to assume that some crook from Vegas gave me a phony coin that was manufactured in such a way that the coin will only be flipped on its Heads side 25% of the time.  Therefore, my first prior expectation for this parameter is 0.25.

For my example, I use a normal distribution as the PDF for my data.  I used this same distribution in some of my previous exercises (like this one); however, this time I will not include irrelevant terms such as the offset and the amplitude.  My distribution is described as the proportion

$G(x_i) \propto e^{ -\frac{1}{2} \big( \frac{C-x_i}{W} \big)^2 }$

where W is the full width at half-maximum of the distribution's curve, $$x_i$$ is one number within a sample of values near or equal to the prior parameter, and C is the centroid representing the parameter value where the peak of the PDF is located.  I use a sample of 100 numbers rather than just the prior expectation because it allows me to see the shape of the probability distribution clearly.
For this example, I choose W to be 0.05 merely for aesthetic plotting purposes.

At each coin flip (or data point or iteration or trial), I must evaluate G.  Because my initial prior probability is 0.25, the peak of my PDF starts at the x-axis value of 0.25.  Therefore, my initial prior  = C = 0.25.  Consequently, when $$x_i$$ eventually equates to 0.25 you will see the probability equal 1 in my PDF plot due to the G function.

After using the data to evaluate G, I must calculate the Likelihood in order to find the probability associated with my initial prior.  The Likelihood is found with the proportionality

$\mathcal{L}_{n}^{k}(C) \propto C^k(1-C)^{n-k}$

where n is the current iteration, C is the prior and k is the number of Heads that appeared from the beginning to the nth iteration.

Once I've found the Likelihood, I obtain a new prior (C).  My calculation of this probability is

$C_{posterior} \propto C_{prior}\mathcal{L}$

where $$\mathcal{L}$$ is the Likelihood.

As I use these 3 equations for each of the 20 iterations, you will see the centroid (C) of my Gaussian PDF start at 0.25 and shift towards a value that is more likely to be true as I conduct my calculations from the first datum (which was a Heads) to the last datum.

 Figure 1: Plot showing the PDF at every 5 iterations.  The peak of the blue line is aligned with 0.25.  At the last iteration, the centroid of the purple line indicates that the last evaluation of the parameter was 0.35.  Keep in mind that I chose X-axis values solely from 0 to 1 because I wanted a sample of numbers near my initial prior and because my parameter happens to be a probability.  Probabilities, of course, can only range from 0 to 1.
If I were to gather more data (i.e. flip more coins), you would see the PDF shift further towards 0.50.  In reality, I know that the probability of flipping an ideal coin and getting Heads is 0.50.  This is why the results in my plot are reasonable.

According to my data alone, the empirically found probability of getting Heads is 13/20 = 0.65.  If my initial prior was... let's say 0.40... then you would have seen the PDF shift past 0.5 and get closer to 0.65.