For example, in the current assignment you are asked to estimate parameters (lifetime) associated with radioactive isotopes.

Parameter fitting has a huge variety of applications, but the general sequence of steps is the same:

1. Measure and record your data and estimates of the standard errors on each measurement. The measurements are associated with a physical system or process for which you have an analytic model (e.g., an equation to predict its behavior). This could be, for example:

- prices in the stock market (used to test your stock market price predictor model, which will eventually make you rich)

- a time series of the amplitudes of the wind-driven or earthquake-driven motion of a bridge or other large structure for which you have a mechanical model (so you can fit a resonant frequency and degree of damping)

- the time series of positions of a satellite compared to its predicted orbit, so you could better determine its orbital parameters

2. Determine whether you have enough data to constrain the set of parameters in your model. If the model has N parameters, you need at least N+1 statistically independent measurements (data points) of the physical system to constrain the parameters adequately.

3. Apply a variational fitting technique that changes the parameters while computing some measure of the goodness of the model (when evaluated with these parameter values) compared to the data. Find the best set of parameters that describe your data via the analytic function (which represents your theory of the process).

4. Determine the standard errors on your estimates of the parameters, and check whether the data fit the model within those errors.
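As a concrete sketch of steps 1-4, the following uses SciPy's `curve_fit` on made-up straight-line data (the measurements, errors, and model are invented purely for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 1: hypothetical measurements of a straight-line process,
# with an estimated standard error for each point.
def model(x, slope, intercept):
    return slope * x + intercept

x = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8])
sigma = np.full_like(y, 0.3)

# Step 3: variational fit (least squares, i.e. Chi-squared minimization).
popt, pcov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)

# Step 4: standard errors on the parameters, from the covariance matrix.
perr = np.sqrt(np.diag(pcov))

# Goodness of fit: Chi-squared compared to the number of data points.
chi2 = np.sum(((y - model(x, *popt)) / sigma) ** 2)
```

Here step 2 is satisfied because 8 independent points comfortably constrain the 2 parameters of the line.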

The muon is a heavier partner of the electron. It is very commonly produced in cosmic ray interactions, and is the main reason that a Geiger counter will "tick" at random even when there is no other radiation present. Muons rain down on us from above at an intensity of about 1 per square centimeter per minute. The muon at rest has an average lifetime of 2.2 microseconds and a mass about 207 times the electron mass (roughly 105.7 MeV/c^2), but when it is produced, it usually has highly relativistic energies of several billion electron-volts [eV] or more. This gives it a much longer lifetime in flight than it has at rest, because of time dilation due to special relativity.

Muon decay produces an electron, an electron antineutrino, and a muon neutrino (in the case of the negative muon, or mu-), or a positron, an electron neutrino, and a muon antineutrino (for the positively charged mu+). One can measure the lifetime of these particles by counting the number of electrons or positrons emitted as a function of time after a cosmic ray muon has entered a cosmic-ray detector and stopped. It is then momentarily "at rest" in the detector. Here is a plot of one such measurement of this type of data (from an experiment at Westmont College):

Figure 1: Plot of the number of muon decays versus time for a typical muon lifetime counting experiment.
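A measurement of this kind can be sketched in code. Below, synthetic decay counts (an invented stand-in for real detector data, not the Westmont measurement) are generated from an exponential with the 2.2-microsecond lifetime and then fit back with SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Synthetic stand-in for the decay histogram: counts per time bin
# drawn from N(t) = N0 * exp(-t / tau), with tau = 2.2 microseconds.
tau_true, n0_true = 2.2, 1000.0
t = np.linspace(0.25, 10.0, 40)              # bin centers, microseconds
counts = rng.poisson(n0_true * np.exp(-t / tau_true)).astype(float)
sigma = np.sqrt(np.maximum(counts, 1.0))     # Poisson error on each bin

def decay(t, n0, tau):
    return n0 * np.exp(-t / tau)

# Fit the lifetime back out of the synthetic data.
popt, pcov = curve_fit(decay, t, counts, p0=(800.0, 1.0),
                       sigma=sigma, absolute_sigma=True)
tau_fit, tau_err = popt[1], np.sqrt(pcov[1, 1])
```

With enough counts, the fitted lifetime lands close to the 2.2-microsecond input value, which is exactly the logic of the real experiment.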

The standard measure of the goodness of fit is the Chi-squared function:

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - f(x_i))^2}{\sigma_i^2}    (1)

where y_i is the i-th of N measurements, \sigma_i is its standard error, and the predicted value of the model is f(x_i).

This function is an intuitively reasonable measure of how well the data fit a model: you just sum up the squares of the differences between the model's predictions and the actual measured data, divided by the variance of the data as determined by the standard measurement errors. If the measurements are all within about 1 standard deviation of the model prediction, then Chi-squared takes a value roughly equal to the number of measurements N. In general, if Chi-squared/N is of order 1, the model is a reasonable description of the data; values much larger than 1 indicate a poor fit (or underestimated errors), and values much smaller than 1 suggest the errors were overestimated.
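Equation (1) can be written directly in a few lines. In this sketch the toy data are chosen so that every point sits exactly one standard error from the prediction, which makes Chi-squared equal to the number of points:

```python
import numpy as np

def chi_squared(y, sigma, prediction):
    """Equation (1): sum of squared, error-normalized residuals."""
    return np.sum(((y - prediction) / sigma) ** 2)

# Toy data: each point is exactly 1 standard error (0.2) above the
# model prediction, so every term in the sum contributes exactly 1.
y = np.array([1.2, 2.2, 3.2, 4.2, 5.2])
sigma = np.full(5, 0.2)
prediction = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

chi2 = chi_squared(y, sigma, prediction)   # 5 terms of 1 each, so chi2 = 5
```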

Figure 2: A schematic example of how Chi-squared gives a metric for the goodness of fit.

Figure 2 shows how this works in a simple example. Here the circles with error bars indicate hypothetical measurements, of which there are 8 total. Equation (1) above says that, to calculate Chi-squared, we should sum up the squares of the differences of the measured data and the model function (sometimes called the theory) divided by the standard error of the data. The division by the standard error can be thought of as a conversion of units: we are measuring the distance of the data from the model prediction in units of the standard error. The sum of the squares of these distances gives us the value for the Chi-squared function for the given model and data.

In Fig. 2, the red model, while it fits several of the data points quite well, misses some of the data by a large margin, more than 6 times the standard error in some cases. This makes it very improbable that this model accurately describes the data, since it is very improbable that our system could have fluctuated statistically in such a way that a measurement lands more than 6 sigma away from the true value. On the other hand, the blue model, while not hitting any of the data points dead-on, fits the overall data much better, as indicated by its much lower Chi-squared value. This tells us that the two parameters of the blue model (slope and y-intercept for a linear model) are a much better estimate of the true underlying parameters of the physical system than those of the red model.
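A numerical analogue of the Fig. 2 comparison (with invented data and two invented candidate lines, not the actual values in the figure) shows how sharply Chi-squared separates a bad model from a good one:

```python
import numpy as np

# Eight hypothetical measurements with equal standard errors.
x = np.arange(8.0)
y = np.array([0.9, 2.1, 2.8, 4.2, 4.9, 6.1, 7.0, 8.1])
sigma = np.full(8, 0.2)

def chi2(slope, intercept):
    prediction = slope * x + intercept
    return np.sum(((y - prediction) / sigma) ** 2)

chi2_red = chi2(0.5, 2.0)    # fits a few points, misses others by many sigma
chi2_blue = chi2(1.0, 1.0)   # close to every point without hitting any exactly
```

The "blue" line gives a Chi-squared of order the number of points, while the "red" line's value is enormous because a few multi-sigma misses are squared before being summed.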

The statistical properties of the Chi-squared distribution are well-known, so once this function is calculated we can extract the probability of obtaining a Chi-squared value this large or larger, assuming the model is correct. If the model has M free parameters, they can be varied over their allowed ranges until the most probable set of their values (given by the lowest Chi-squared value) is found.

One can treat the M free parameters as coordinates in an M-dimensional space. The value of Chi-squared at each point in this coordinate space then becomes a measure of how well that set of parameter values describes the measured data. By finding the place in the M-space where Chi-squared is lowest, we have found the place where the parameters and model most closely match the measured data. This will ideally occur at a global minimum (e.g., the deepest valley) in this M-dimensional space. Of course there may be local minima that we might mistake for the best fit, so we have to test these for goodness of fit before deciding whether they are acceptable.
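A general-purpose minimizer can perform this search automatically. The sketch below (with invented data) treats (slope, intercept) as coordinates in the 2-dimensional parameter space and lets SciPy's Nelder-Mead minimizer walk downhill in Chi-squared:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical measurements of a roughly linear process.
x = np.arange(8.0)
y = np.array([0.9, 2.1, 2.8, 4.2, 4.9, 6.1, 7.0, 8.1])
sigma = np.full(8, 0.2)

def chi2(params):
    slope, intercept = params
    return np.sum(((y - (slope * x + intercept)) / sigma) ** 2)

# M = 2: the minimizer explores the (slope, intercept) plane for us.
result = minimize(chi2, x0=[0.0, 0.0], method="Nelder-Mead")
best_slope, best_intercept = result.x
```

Starting the search from several different initial points is a simple guard against settling into a local rather than global minimum.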

For example, using the line-models in Fig. 2 above, we have two parameters that we can vary, the slope and y-intercept of the line, so M=2 in this case, and we can simply step through a large number of values of both parameters to trace out the behavior of Chi-squared as a function of these two dimensions.

There are many methods for finding the minimum of these M-parameter spaces. One of the more powerful is called Minuit.

A "brute force" approach is to systematically vary our position in the M-space, and to then calculate the value of Chi-squared at each location that we visit. We will look for the lowest value, and also use some physical intuition to ensure that we did not just find some "local" minimum, rather than a global one.
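A minimal brute-force scan for the two-parameter line model might look like the following (the data values and grid ranges are invented for illustration):

```python
import numpy as np

# Hypothetical measurements, as in the line-fit examples above.
x = np.arange(8.0)
y = np.array([0.9, 2.1, 2.8, 4.2, 4.9, 6.1, 7.0, 8.1])
sigma = np.full(8, 0.2)

# Systematically visit a grid of positions in the 2-parameter space.
slopes = np.linspace(0.0, 2.0, 201)
intercepts = np.linspace(0.0, 2.0, 201)

grid = np.empty((slopes.size, intercepts.size))
for i, m in enumerate(slopes):
    for j, b in enumerate(intercepts):
        grid[i, j] = np.sum(((y - (m * x + b)) / sigma) ** 2)

# The lowest value on the grid locates the best-fit parameters.
i_min, j_min = np.unravel_index(np.argmin(grid), grid.shape)
best_slope, best_intercept = slopes[i_min], intercepts[j_min]
chi2_min = grid[i_min, j_min]
```

Plotting `grid` as a 2-D map is also a quick visual check that the minimum found is global over the scanned region rather than a local dip.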

Notice that at the minimum, Chi-squared takes about the right value for a good fit. Notice also how the values of Chi-squared grow very large (many thousands) away from the minimum.

To determine the confidence level of a given value of Chi-squared, we first need to estimate a quantity called the number of degrees of freedom, or NDF. If we have N independent measurements and M free parameters, then

NDF = N - M    (2)

In this case we can think of Chi-squared as a sum of NDF statistically independent terms, each of which contributes roughly 1 if the model describes the data well.
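Given the number of degrees of freedom, the confidence level for an observed Chi-squared follows from the Chi-squared distribution's survival function; the numbers below are purely illustrative:

```python
from scipy.stats import chi2 as chi2_dist

# Illustrative values: 40 data points fit with a 2-parameter model.
n_points, n_params = 40, 2
ndf = n_points - n_params          # NDF = N - M = 38

# Probability of a Chi-squared at least this large if the model is correct.
chi2_observed = 41.3
p_value = chi2_dist.sf(chi2_observed, ndf)
```

A p-value that is not tiny (here Chi-squared/NDF is close to 1) means the data are consistent with the model; a p-value near zero signals a bad fit or underestimated errors.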

This is not an exact derivation, but it is a heuristic motivation as to why we use the (Chi-squared_min + 1) contour to find the standard error in a parameter, and also why the 2- and 3-sigma contours correspond to (Chi-squared_min + 4) and (Chi-squared_min + 9): this follows from the parabolic shape of Chi-squared around the minimum.
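The (Chi-squared_min + 1) rule can be demonstrated in one dimension: scan a single parameter (here the lifetime tau of an idealized, noiseless exponential, with all numbers invented for illustration) and read the 1-sigma interval off the points where Chi-squared rises by 1 above its minimum:

```python
import numpy as np

# Idealized noiseless decay counts with tau = 2.2 microseconds.
t = np.linspace(0.25, 10.0, 40)
tau_true = 2.2
y = 1000.0 * np.exp(-t / tau_true)
sigma = np.sqrt(y)                    # Poisson-style errors

# Scan tau and record Chi-squared at each value.
taus = np.linspace(2.0, 2.4, 4001)
chi2 = np.array([np.sum(((y - 1000.0 * np.exp(-t / tau)) / sigma) ** 2)
                 for tau in taus])

chi2_min = chi2.min()
tau_best = taus[np.argmin(chi2)]

# 1-sigma interval: the region where Chi-squared <= Chi-squared_min + 1.
inside = taus[chi2 <= chi2_min + 1.0]
tau_err = 0.5 * (inside.max() - inside.min())
```

Repeating with `chi2_min + 4.0` or `chi2_min + 9.0` gives the 2- and 3-sigma intervals, which for this parabolic minimum come out close to twice and three times `tau_err`.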