Monday, July 13, 2015

Astro-Stats & Python : Bootstrapping, Monte Carlo and a Histogram

As always, feel free to follow along if you have Python installed on your computer.  You can copy my code from here and paste it into your own script, or you can use the "wget" command in the Terminal to download that same web page.

For this exercise, the objective is the same as in the other exercises I have shared: determine the best-fitting parameters no matter how many parameters are given.  This time, I have completely changed how I deduce the optimal combination of parameter values, i.e. the one that yields the minimal \(\chi^2\) and therefore the highest Likelihood.  Last time, I used nested FOR Loops to compute the Likelihood for every pair of parameter values that I proposed as guesses (and the guesses for each parameter were near that parameter's correct value).  Now, I use no Loops and no initial guesses in my calculations to determine the optimal parameter values.  This was all thanks to the matrix multiplication mathematics that I used.

For now, I'll refrain from explaining the tedious matrix calculations that went into this work.  But I will mention that computer programmers tend to avoid FOR Loops and find alternative ways for their code to get the job done.  If you do not know what a FOR Loop is, do not fear: its definition is not significant to the concept of the Astro-Statistical exercise that I am discussing in this post.  Just know that FOR Loops in the Python coding language typically make the computer take longer to complete a task than using matrices or arrays to perform the same job, although FOR Loops tend to be a more intuitive approach for writing complex code (a quick illustration of the speed difference follows).
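
Here is a minimal sketch of that speed difference (my own illustration, not part of the exercise's code), comparing a FOR Loop against the equivalent NumPy array operation:

import numpy as np
import time

arr = np.random.randn(1000000)

# FOR Loop version: accumulate the sum of squares one element at a time
start = time.time()
total = 0.0
for value in arr:
    total += value**2
loop_time = time.time() - start

# Array version: the same sum of squares as a single dot product
start = time.time()
total_vec = np.dot(arr, arr)
vec_time = time.time() - start

print("FOR Loop: %.3f s    dot product: %.5f s" % (loop_time, vec_time))

The dot product typically finishes hundreds of times faster, because the looping happens inside compiled code rather than in Python itself.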

Well... let's get to the fun, shall we?

>>> execfile('bootstrap_blog.py')
>>> histog(A_trials_mat, figs, binning=50, plot=True)
>>> # then, if you would like ... REPEAT both commands and you'll get different data, results, and plots


As with the other scripts I have shared, my code receives parameters, builds the corresponding polynomial equation, then adds random noise to that equation in order to simulate realistic data as measured in nature.  The random number/noise generator in my code follows a normal/Gaussian distribution.  This is the primary reason why I use \(\chi^2\) minimization in the first place.  If you glance back at my Exercise in Likelihood and \(\chi^2\), continued... post, take note that the probability function explained there is for finding probabilities within information (in this case, the noise) that follows a normal distribution.  That probability function is how I derived \(\chi^2\) and the Likelihood.
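
To make that concrete, here is a minimal sketch of the simulation step, assuming a quadratic model \(y = a_0 + a_1 x + a_2 x^2\) to match the three parameters discussed below; the x-range, number of points, and noise level sigma are my own stand-ins, not the values from bootstrap_blog.py:

import numpy as np

# True parameter values (a0, a1, a2) -- the same ones quoted later in the post
a_true = np.array([1.0, 0.0, 2.0])

x = np.linspace(-1.0, 1.0, 100)                  # measurement points (assumed)
y_model = a_true[0] + a_true[1]*x + a_true[2]*x**2

sigma = 0.5                                      # assumed noise level
y_obs = y_model + np.random.normal(0.0, sigma, size=x.size)  # Gaussian noise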

Thanks to matrix multiplication, I found the combination of parameter values that yields the least \(\chi^2\).  I perform this action 1,000 times; however, for each of the 1,000 trials I add different random noise to my initial simulated data and then find another combination of parameter values.  For this example, 3 parameters were used: \(a_0\), \(a_1\) and \(a_2\), whose true values were 1, 0 and 2, respectively.  By the end, I have 1,000 combinations of values for the parameters.
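
For the curious, this is roughly what a loop-free fit can look like in NumPy (my own reconstruction, not the contents of bootstrap_blog.py; the name A_trials_mat echoes the session above, everything else is assumed).  For a polynomial model, the \(\chi^2\)-minimizing parameters follow from linear least squares on the design matrix whose columns are 1, x, and x^2, and all 1,000 noisy trials can be solved in one call, continuing the sketch above:

import numpy as np

n_trials = 1000

# Design matrix: one column per parameter (1, x, x^2)
X = np.column_stack([np.ones_like(x), x, x**2])

# One noisy realization of the simulated data per trial, stacked as columns
noise = np.random.normal(0.0, sigma, size=(x.size, n_trials))
Y = y_model[:, None] + noise

# Solve all 1,000 least-squares problems at once -- no FOR Loop.
# Each column of A_trials_mat is the (a0, a1, a2) combination that
# minimizes chi^2 for the corresponding trial.
A_trials_mat, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

a0_samples, a1_samples, a2_samples = A_trials_mat   # 1,000 values each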

The idea of using random numbers to generate noise (in each trial) exhibits the Monte Carlo aspect of this exercise, while the histogram illustrates how the Bootstrap concept is applied.
Figure 1: Histogram illustrating the values of \(a_0\) that were deduced and the number of trials in which each value reappeared.  The x-axis spans the sample of 1,000 deduced \(a_0\) values.

According to Figure 1, the value that re-emerges most often (as yielding the least \(\chi^2\)) is a number near 1.20.  Using a histogram obscures the precise numbers that were evaluated, yet it is a very useful graphic for showing the results because a plethora of distinct \(a_0\) values appeared throughout the 1,000 trials; the histogram gives me an estimate of which ranges of values were repeated.  For example, Figure 1 says that values of \(a_0\) between 1.10 and 1.20 were found in ~65 trials.  This idea that a number/outcome is allowed to reappear throughout a succession of trials alludes to Bootstrapping statistics: an element within a sample of elements may be selected more than once.  In other words, I am employing the concept of random sampling with replacement from Bootstrapping statistical methods (see the sketch below).
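
The "with replacement" step itself is one line in NumPy, and a histogram like Figure 1 takes only a couple more.  As before, this is a sketch under my assumptions rather than the original script:

import numpy as np
import matplotlib.pyplot as plt

# Bootstrap resample: same size as the original sample, drawn with
# replacement, so any one value may be selected more than once
resample = np.random.choice(a0_samples, size=a0_samples.size, replace=True)

# A histogram like Figure 1: how often each range of a0 values recurred
plt.hist(a0_samples, bins=50)
plt.xlabel("a0 deduced per trial")
plt.ylabel("number of trials")
plt.show()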

From there, I followed the same methodology I described in my Grad level Intro to Astro-Stats for Undergraduates, cont. post: the y-axis values are summed, so as to obtain the integral of the probability distribution function that is implicitly demonstrated by the histogram's data.
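
I won't repeat that post here, but the gist, as I understand it, can be sketched as follows: normalize the summed histogram counts into a cumulative distribution and read off a central 68% interval, which is one way to produce error bars like those in Figures 2, 4, and 6 (the 16th/84th-percentile choice is my assumption):

import numpy as np

counts, edges = np.histogram(a0_samples, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])

# Summing the y-axis values and normalizing approximates the integral of
# the underlying probability distribution function, i.e. its CDF
cdf = np.cumsum(counts) / float(counts.sum())

# Central 68% interval: the 16th and 84th percentiles of the distribution
best  = centers[np.searchsorted(cdf, 0.50)]
lower = centers[np.searchsorted(cdf, 0.16)]
upper = centers[np.searchsorted(cdf, 0.84)]
print("a0 = %.3f  (+%.3f / -%.3f)" % (best, upper - best, best - lower))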

I have yet to show the rest of my results for all 3 parameters.  And I understand that it must have been just eating you up inside to be kept in such suspense while reading the previous two paragraphs.  So, for the satisfaction of your curiosity, I will show the rest of my results.  However, remember that the results change every time the code is run, so it is highly unlikely that you would end up with exactly the same data, plots, and results.

Figure 2: The optimal value of \(a_0\) was determined to be 1.11 with relatively large uncertainties.  Because the true value is 1.0, it would have been nice to have uncertainties near 0.15.  Despite that, I'm content with this.

Figure 3: Histogram for resultant \(a_1\) values evaluated via Monte Carlo and Bootstrapping Statistics.

Figure 4: The optimal value of \(a_1\) was determined to be -0.069.  Because the true value is 0.0, I'm excited to see the right red error bar encompass 0.0 (on the x-axis).

Figure 5: Histogram for resultant \(a_2\) values evaluated via Monte Carlo and Bootstrapping Statistics.

Figure 6: Okay!  This is just downright beautiful!  The optimal value of \(a_2\) was determined to be 1.985.  Because the true value is 2.0, I am pretty darn happy to see those super-small uncertainties!
