Data Fitting with Least Squares minimization & Error Estimation

1. Introduction

A very important tool of research in the physical sciences is least-squares fitting of data, in order to estimate physical parameters of a model.

For example, in the current assignment you are asked to estimate parameters (lifetime) associated with radioactive isotopes.

There are a huge variety of applications of parameter fitting, but the general sequence of steps is the same:

1. Measure and record your data and estimates of the standard errors on each measurement. The measurements are associated with a physical system or process for which you have an analytic model (e.g., an equation to predict its behavior). This could be, for example:
2. Determine if you have enough data to constrain your set of parameters in your model. If you have N parameters, you need at least N+1 statistically independent measurements (data points) of the physical system to constrain your parameters adequately to fit them.

3. Apply a variational fitting technique which changes the parameters while determining some measure of the goodness of the model (when evaluated with these parameters values) compared to the data. Find the best set of parameters that describe your data via the analytic function (which represents your theory of the process).

4. Determine the standard  errors on your estimation of the parameters, and see if the data seems to fit the model, within the errors.

2. Unstable particle decay (review)

The spontaneous decay of unstable particles is governed by the Weak Interaction or Weak force. This force can even lead to decay (think of the force as "prying" a composite particle apart) of single subatomic particles in addition to radioactive isotopes as we have studied earlier. We will look at the decay of several particles that are subject to these instabilities: the muon (or mu-lepton) and the pion (or pi-meson) .

The muon is a heavier partner of the electron. It is very commonly produced in cosmic ray interactions, and is the main reason that a Geiger counter will "tick" at random even when there is no other radiation present. Muons rain down on us from above at an intensity of about 1 per square centimeter per minute. The muon at rest has an average lifetime of 2.2 microseconds and a mass of 105 times the electron mass, but when it is produced, it usually has very relativistic energies, several tens of billions of electron-volts [eV] on average. This gives it a much longer lifetime in flight than it has at rest, because of the time dilation due to special relativistic effects.

Muon decay produces an electron and muon antineutrino (in the case of the negative muon or mu-) or a positron and muon neutrino (for the positively charged mu+). One can measure the lifetime of these particles by counting the number of electrons or positrons emitted as a function of time after a cosmic ray muon has entered a cosmic-ray detector and stopped. It is then mometarily "at rest" in the detector. Here is a plot of one such measurement of this type of data (from an experiment at Westmont college ):


Figure 1:  Plot of the number of muon decays versus time for a typical muon lifetime counting experiment.

3. Fitting the data using  Chi-squared minimization

The cornerstone of almost all fitting is the Chi-squared method, which is based on the statistics of the Chi-squared function as defined:

where the Ni( ti ) are the individual measurements (for example, the number of counts in one of the time bins), and the
predicted value of the model is fi ( ti ; aMwhere the  aM  are the M parameters which are set to some reasonable trial value. The standard error of each measurement is the sigma_i in the denominator. There are a total of Nd measurements.

This function is an intuitively reasonable measure of how well the data fit a model: you just sum up the squares of the differences from the model's prediction to the actual data measured, divided by the variance of the data as determined by the standard measurement errors. If the measurements are all within 1 standard deviation of the model prediction, then Chi-squared takes a value roughly equal to the number of measurements. In general, if Chi-squared/Nd is of order 1.0, then the fit is reasonably good. Coversely,  if Chi-squared/Nd >> 1.0, then the fit is a poor one.

Figure 2: A schematic example of how Chi-squared gives a metric for the goodness of fit.

Figure 2 shows how this works in a simple example. Here the circles with error bars indicate hypothetical measurements, of which there are 8 total. Equation (1) above says that, to calculate Chi-squared, we should sum up the squares of the differences of the measured data and the model function (sometimes called the theory) divided by the standard error of the data. The division by the standard error can be thought of as a conversion of units: we are measuring the distance of the data from the model prediction in units of the standard error. The sum of the squares of these distances gives us the value for the Chi-squared function for the given model and data.

In Fig. 2, the red model, while it fits several of the data points quite well, fails to fit some of the data by a large margin, more than 6 times the standard error in some cases. This makes it very improbable that this model accurately describes the data, since it is very improbable that our system could have fluctuated statistically in such a way that we were more than 6 sigma away from the true value. On the other hand, the blue model, while not hitting any of the data points dead-on, does fit the overall data much better, as given by the fact that its Chi-squared value is much lower. This indicates to us that the two parameters of the blue model (slope and y-intercept for a linear model) are a much better estimate of the true underlying parameters of the physical system than for the red model.

The statistical properties of the  Chi-squared distribution are well-known, and the probability of the model's correctness can be extracted once this function is calculated. If the model has M free parameters, they can be varied over their allowed ranges until the most probable set of their values (given by the lowest Chi-squared value) is found.

One can treat the M free parameters as coordinates in an M-dimensional space. The value of Chi-squared at each point in this coordinate space then becomes a measure of the correctness of that set of parameter values to the measured data. By finding the place in the M-space where Chi-squared is lowest, we have found the place where the parameters and model most closely match the measured data. This will ideally occur at a global minimum (eg., the deepest valley) in this M-dimensional space. Of course there may be local minima that we might think are the best fits, and so we have to test these for the goodness of the fit before deciding if they are acceptable.

For example, using the line-models in Fig. 2 above, we have two parameters that we can vary, the slope and y-intercept of the line, so M=2 in this case, and we can simply step through a large number of both of these parameters to trace out the beahavior of Chi-squared as a function of these two dimensions.

There are many methods for finding the minimum of these M-parameter spaces. One of the more powerful is called Minuit.
A "brute force" approach is to systematically vary our position in the M-space, and to then calculate the value of Chi-squared at each location that we visit. We will look for the lowest value, and also use some physical intuition to ensure that we did not just find some "local" minimum, rather than a global one.

3.1 Minimizing Chi-squared with a grid search

As a graphical example of such a search in 2-dimensional space, have 2 parameters, is seen below. In general, if you fit M parameters, you will have an M-dimensional grid space, with Chi-squared determined at each point. (You couldn't make a plot of it anymore, rather you would have to do 2-dimensional slices through the M-parameter space).


Notice that the minimum in Chi-squared is about the right value for the fit to be good at the minimum. Notice also how the values of Chi-squared get very large (many thousands ) away from the minimum.

4. Determining the Goodness of fit

The goodness of fit is determined by estimating the probability that the value of your Chisquared minimum would occur if the experiment could be repeated a very large number of times with exactly the same setup--that is, with only the sampling statistics varying on each trial. If your value of Chi-squared falls within the 68.3% (1 sigma) percentile of all the trials, then it is a good fit. In practice we can't repeat the experiment, so we need some way to estimate the value of Chi-squared that corresponds to a given percentile level (this percentile is also called the "confidence level").  

To determine the confidence level of a given value of Chi-squared, we first need to estimate a quantity called the number of degrees of freedom, or ND . The first guess at this is that ND = number of data values = Nd. In practice, the fact that we are constrained to fit the two parameters reduces the degrees of freedom, so

ND = (number of data values) - (number of parameters to fit) = Nd - Np

Once we have ND, we can now construct another variable y which approximately transforms the Chi-squared distribution into a distribution with zero mean and unit standard deviation (just like a Gaussian distribution):

This transformation is fairly accurate for ND > 6, so it is reasonably good for our data in this example. What this means now is that, if you have a given Chi-squared value, after you calculate the tranformation, the resulting values will follow Gaussian (also known as normal) statistics, so any value of |y| < 1.0  is a reasonable fit, because it means that the value lies within one standard deviation of the mean. Values larger than this have a probability that follows the Gaussian probability, that is, a 3 sigma value 
(y = 3) would have only a 0.6% probability of being the correct fit.

5. Determining the standard errors on your parameters

Assuming that the shape of the Chi-squared "bowl" that you observe around your minimum Chi-squared is approximately paraboloidal in cross section close to the minimum. This is exactly true if all of your parameters are independent and if your measurement errors have a normal gaussian distribution.

In this case we can think of Chi-squared as a sum of ND =  Nd - Np  independent gaussian distributions (the Np  parameter fits constrain the distribution and reduce the amount that Chi-squared can vary). If the model we are fitting is on average always within 1 sigma of the curve, the Chi-squared value is going to equal (1* N).

This is not an exact derivation, but it is a heuristic motivation as to why we use the (Chisquared+1) contour to find the standard error in the parameter, and also why the 2 and 3-sigma contours correspond to (Chisquared+4) and (Chisquared+9)--this is due to the parabolic shape around the minimum.