Saturday, July 18, 2015

Part II : Astro-Stats & Python : Lev-Marq to Markov Chain Monte Carlo and Bootstrapping

Now, that I have reviewed the concepts in the previous posts, I'll get straight to the fun stuff.

The objective of this exercise is essentially the same as my previous posts: determine the best fitting parameters and their confidence intervals as I fit to realistic data. 

The methodology behind this exercise is an accumulation of two of my previous exercises:

As always, I invite the reader to follow along on your own computer if you have Python installed.  But things have gotten a little more complex than my last examples.  You'll need to download TWO scripts now in order to witness the full power of my code.
You can copy my scripts from here (LM) and here (MCMC) then paste them into your own two scripts.  Or, you can use the "wget" command, in the Terminal, on these links to immediately download my files.  Make sure that both files are in the same folder!!!!!!

>>>execfile('mcmc_walk.py')
>>>walk_mcmc(x_vals = np.arange(20),   data_sigmas=2,  pars=np.array([10, 500, 2000],dtype=float),   pars_g = np.array([90, 200, 2], dtype = float),   par_sigmas=np.array([.00009, 0.00009, 0.0009], dtype=float),   trials = 2000,   sim_data_gauss = False,   plot=True)

>>># feel free to run this "walk_mcmc" command as many times as you want. Also, feel free to change things if you are familiar with Python.  In order to see ALL of the plots, there is a condition that must be met that I have not mentioned yet.  So, don't worry if you run the "walk_mcmc" function a couple times and have yet to see ALL of the plots.

In this example:
  • x-axis values : numbers 0-19
  • "data_sigmas" uncertainties of data points, i.e. measurements: 2 (for each data point)
  • true parameters : the numbers 10, 500 and 2000
  • initial guess at parameters : the numbers 90, 200 and 2
  • number of trials for MCMC : 2,000
  • "sim_data_gauss" simulated data set used : polynomial + noise  (Refer to 1st bullet point in Astro-Stats & Python : Recap for methodology of simulated data)
  • "par_sigmas" uncertainties of the 3 changing parameters in MCMC : 0.0009 . This is extremely small for reasons soon to be explained.

 Let's first take a look at how the data looks, which follows the polynomial \(y = a_0 + a_1 x + a_2 x^2\).

Figure 1: Plot of data (dots) and the data without noise (line). Y-axis is Measurement and X-axis is simply the x-values.  Because the uncertainty of each "measurement" is 2, the dots actually do deviate from the line.  That is not very discernible in this plot because I did not zoom in.
When you run the code, the LM statistical method in one of my functions will attempt to find the true parameters.  
Figure 2: My LM fitting function did a great job this time in find the best fitting parameters!
 The "acceptance rates" are the ratios of the number times a parameter was changed compared to the number of times the MCMC attempted to change the parameter.  From the "Random Walk" perspective, this is the ratio of the number of times that the person took a step to the number of times the person thought about taking a step.  In this case, there are 3 random walkers because of the 3 parameters. 
The criterion for seeing ALL of the plots is that every parameter's acceptance rate must be above 40%.  This is because a good metric for determining if the MCMC did well in ascertaining the confidence interval is if the "walker" accepted 40% of the steps that he/she considered.  There can be a variety of reasons why the MCMC may not do well.  The primary reason for doing badly is the user's choice of "par_sigmas."  If the acceptance rate is < 40% for a parameter, then the user should decrease the sigma for that parameter.  If the rate is ~100%, the sigma for that parameter should increase.  I am currently not sure what should be one's specific threshold for determining what qualifies as being near 100%.

Figure 3: Fitting of LM found parameters to original data.  The fit looks great.  Because we actually know the true values of the parameters, it was already apparent that I would have a good fit by the information in Figure 2.


I think that was enough to absorb for one blog post.  I'll continue next time with the rest of the results of this specific execution of my code.

to be continued...

No comments:

Post a Comment