Saturday, June 27, 2015

Exercise in Likelihood and \(\chi^2\)

At the Banneker Institute, the other interns and I are being taught Astro-Statistics.  Statistics is an essential part of astrophysics research.  During my first research project at the Center for Astrophysics, I was amazed at how much statistics was necessary to understand the telescope's instrumentation and the confidence and precision of a model fit to collected data.  I also felt somewhat bad about how little statistics I knew at the time.  However, my advisors were able to explain the equations to me and teach me what I needed to know to get the job done by the end of the summer.
I say all of that just to say that I am very happy to be learning statistics this summer.  More importantly, I am learning how to apply it specifically to astrophysics research.  Here is what I have done and learned:

First of all, if you want to follow along you can look at my code on GitHub here.
(I didn't do well at commenting things; it's on my to-do list.  Also keep in mind that this code is not polished at all.)

Objective.  

The idea was to find a model that fits our data points well.  The data points contained the desired signal plus random noise.  So we had to use the likelihood, and thus the \( \chi^2 \) function, to find out what kind of line fit our data best, which in turn suggests how the data would look if there were no random noise.
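For reference, the \( \chi^2 \) statistic compares the data to the model in units of the measurement uncertainty,
\[ \chi^2 = \sum_i \frac{(data_i - model_i)^2}{\sigma_i^2}, \]
and for Gaussian noise, maximizing the likelihood is equivalent to minimizing \( \chi^2 \).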

Phase I.

 (No FOR loops allowed)

 The first challenge was to create a Python function that could deduce the highest degree of the polynomial (linear, quadratic, cubic, quartic, etc.) needed to calculate the "y" values when given only the coefficients and the "x" values.  For example, if I give my Python function, called "poly", the input a_values = [2, 5], then the function is supposed to know that the equation will be of the form \( a_0 + a_1\times x_{values} \),
and thus the equation should be \( 2 + 5\times x_{values} \).  Therefore, if the "x" values were x_values = [1, 2, 3], then poly should give me the "y" values 7, 12 and 17.

However, if the user inputs a_values = [ 2, 5, 1, 150], "poly" should know to use the formula
 \[ 2 + 5\times x_{values} + 1\times x_{values}^2 + 150 \times x_{values}^3 \]
to get the "y" values.
Furthermore, once "poly" finds the "y" values, it is supposed to add noise to the data as a way of simulating real data that has been measured with uncertainties (sigma).  This is why the "poly" function in my Python script essentially has two equations, e.g.
\[ data = 2 + 5\times x_{values} + 1\times x_{values}^2 + 150 \times x_{values}^3  + \sigma\times random_{noise}\]
\[ model = 2 + 5\times x_{values} + 1\times x_{values}^2 + 150 \times x_{values}^3 \]
Because no FOR loops were allowed, matrices were used to complete this task.  (Technically, in Python they are numpy arrays that act like matrices, but no matrix multiplication techniques were employed.)
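To make the array idea concrete, here is a minimal sketch of how such a function could look using numpy broadcasting instead of FOR loops.  The name "poly_sketch" and its exact signature are mine for illustration, not the version in my repository.

import numpy as np

def poly_sketch(x_values, a_values, sigma=0.5):
    # Evaluate a_0 + a_1*x + a_2*x**2 + ... without FOR loops,
    # then add Gaussian noise scaled by sigma to simulate measurements.
    x = np.asarray(x_values, dtype=float)
    a = np.asarray(a_values, dtype=float)
    powers = np.arange(a.size)                # 0, 1, ..., highest degree
    terms = x[:, None] ** powers[None, :]     # column k holds x**k
    model = (terms * a).sum(axis=1)           # weight each column by a_k and sum
    data = model + sigma * np.random.randn(x.size)
    return data, model

# Sanity check against the example above: with a_values = [2, 5] and
# x_values = [1, 2, 3], the noiseless model comes out as [7., 12., 17.].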

Feel free to run my code as follows:
>>> import numpy as np
>>> execfile('stat_ex')
>>> x_vals = np.arange(0., 10., 0.5)   # any array of x values works here
>>> poly1st(x_vals, a_values=np.array([2, 5, 1, 150]), plot=True, sigma=0.5)

You can change the "x_vals", "a_values", and "sigma" parameters however you want.

Because the noise is generated by random numbers (drawn from a Gaussian distribution), the plot you see will change every time you run the function "poly1st."  The line represents how the data (points) would look without the noise.
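For reference, that Gaussian noise term can be drawn in a single vectorized call (this is the standard numpy approach; my script may do it slightly differently):

>>> noise = sigma * np.random.randn(x_vals.size)   # a fresh draw every time you call it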

to be continued...
