CHEM 240: Introduction to Bioanalytical Chemistry

J. D. Cronk

Lecture 4. Statistical treatment of experimental data

Wednesday 25 January 2006

Statistical methods of data analysis. The Student's t statistical tool. The Q test for bad data. The least squares method for finding the best linear fit to a data set. Constructing a calibration curve.

Reading: Harris, Ch.4, p.74-85. Problems: Ch.4 - 5, 7, 9, 10, 14.

 

Lecture 4 Summary

Here we look at some statistical concepts and tools that relate to experimental measurements. For the most part, in this discussion we assume that there is no systematic error, and that we are dealing solely with the inevitable random error that accompanies any experimental measurement.

The Gaussian distribution

  • The results of many experimental measurements can be approximated by a Gaussian distribution
  • A Gaussian distribution is characterized by a mean μ and a standard deviation σ (variance = σ²)
  • Correspondence between a histogram and a Gaussian distribution

The Gaussian distribution is a mathematical function that describes a smooth, symmetric, bell-shaped curve whose peak corresponds to the mean μ. The parameter σ is related to the width of the bell-shaped curve: the larger σ, the broader the distribution.
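For reference, the curve described above can be written in its standard normalized form (the usual textbook convention, chosen so that the total area under the curve equals 1):

```latex
y = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right]
```

The peak at x = μ has height 1/(σ√(2π)), so increasing σ both broadens and lowers the curve. About 68% of the area lies within μ ± σ, and about 95% within μ ± 2σ.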

When we perform experiments, we often find that repeated measurements of the same quantity produce a range of values. Plotting our results as a histogram (called a bar graph in Harris; see Fig. 4-2 on p.72), we may note that it roughly resembles a Gaussian distribution. This correspondence generally becomes closer as the number of measurements increases and as the width of the histogram bars decreases relative to the full range of the data (i.e., as the number of bars increases). In the limit where the number of measurements approaches infinity and the width of the bars becomes infinitesimally small, the correspondence between the histogram and the Gaussian distribution becomes exact.

The actual finite set of data points has a mean x̄ ("x-bar") and a standard deviation s that we can compute. We distinguish these from the Gaussian parameters μ and σ by calling the latter the "true mean" and the "true standard deviation".
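As a minimal sketch, x̄ and s can be computed from a finite data set as shown below. This assumes the usual sample standard deviation with n - 1 in the denominator; the function names and the data values are hypothetical, chosen only for illustration.

```python
import math


def sample_mean(data):
    """Return the sample mean (x-bar) of a list of measurements."""
    return sum(data) / len(data)


def sample_std_dev(data):
    """Return the sample standard deviation s, using n - 1 degrees of freedom."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two measurements")
    xbar = sample_mean(data)
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))


# Hypothetical replicate measurements (illustrative values only)
data = [10.1, 10.3, 9.9, 10.2, 10.0]
print(f"x-bar = {sample_mean(data):.3f}, s = {sample_std_dev(data):.3f}")
# prints: x-bar = 10.100, s = 0.158
```

Dividing by n - 1 rather than n compensates for the fact that x̄ is itself estimated from the same data.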

Note that there is a relation between precision, or reproducibility of a measurement, and the standard deviation of a set of repeated measurements. A large value of s reflects a measurement that is not very precise.

Student's t

  • Determining a confidence interval
  • Comparison of means

The Student's t is a statistical tool that we will use to calculate a confidence interval for a set of measurements, and to judge whether two different methods for measuring the same quantity both converge on the same "true" value, or if they are systematically different.
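The two uses of t listed above can be sketched in Python as follows. This is a minimal sketch assuming the standard textbook formulas: the confidence interval μ = x̄ ± ts/√n, and, for two methods whose standard deviations are similar, t_calc = |x̄1 - x̄2| / s_pooled × √(n1·n2/(n1 + n2)) with n1 + n2 - 2 degrees of freedom. The t values are the commonly tabulated two-tailed 95% values for 1 to 10 degrees of freedom; the helper names and sample data are hypothetical.

```python
import math

# Two-tailed Student's t at 95% confidence, keyed by degrees of freedom.
# Standard tabulated values; larger data sets need a fuller table.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_and_s(data):
    """Return (x-bar, s), with s computed using n - 1 degrees of freedom."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return xbar, s


def confidence_interval_95(data):
    """Return (low, high) for mu = x-bar +/- t*s/sqrt(n) at 95% confidence."""
    n = len(data)
    xbar, s = mean_and_s(data)
    half_width = T_95[n - 1] * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def means_differ_95(data1, data2):
    """Compare two means using a pooled standard deviation (assumes similar s).

    Returns (t_calc, t_table, differ); differ is True when t_calc > t_table,
    i.e. the difference is significant at 95% confidence.
    """
    n1, n2 = len(data1), len(data2)
    x1, s1 = mean_and_s(data1)
    x2, s2 = mean_and_s(data2)
    s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))
    t_calc = abs(x1 - x2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    t_table = T_95[n1 + n2 - 2]
    return t_calc, t_table, t_calc > t_table


# Hypothetical results from two methods measuring the same quantity
method_a = [10.1, 10.3, 9.9, 10.2, 10.0]
method_b = [10.4, 10.6, 10.5, 10.3, 10.7]

low, high = confidence_interval_95(method_a)
print(f"95% CI for method A: {low:.3f} to {high:.3f}")
# prints: 95% CI for method A: 9.904 to 10.296
t_calc, t_table, differ = means_differ_95(method_a, method_b)
print(f"t_calc = {t_calc:.2f}, t_table = {t_table}, significantly different: {differ}")
# prints: t_calc = 4.00, t_table = 2.306, significantly different: True
```

The lookup table raises a KeyError beyond 10 degrees of freedom; for larger data sets, extend it from a printed t table. If the two methods have clearly different standard deviations, the pooled formula no longer applies and a different form of the test is needed.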

Q test for bad data

We will occasionally encounter cases in which one or more measurements in a data set just don't seem to fit: they are "outliers". Their values are so different from the rest of the measurements that we suspect something must have "gone wrong" and that these data points are therefore "bad". To avoid making purely subjective decisions about whether to include such data in the analysis and conclusions (which could easily become a slippery slope to bad science), we use the Q test.

To apply it, we calculate the range of the data (the difference between the highest and lowest values) and the gap (the difference between the suspected outlier and the next most extreme value, i.e., its nearest neighbor), and then take the ratio gap / range, which we call Qcalc. We then look up the critical value Qtable for the total number of measurements in the data set, at a chosen confidence level, and ask a simple question: is Qcalc > Qtable? If so, the data point is considered objectively "bad" and is discarded. If not, the data point should be retained.
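The Q test procedure described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the Q_90 dictionary holds commonly tabulated critical values at 90% confidence for 3 to 10 measurements (check them against the table in your textbook), and the function name q_test and the example data are hypothetical.

```python
# Critical Q values at 90% confidence, keyed by number of measurements n.
# Commonly tabulated values; verify against your textbook's table.
Q_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(data, q_table=Q_90):
    """Apply the Q test to the more isolated of the two extreme values.

    Returns (suspect, q_calc, q_crit, discard), where discard is True
    when q_calc > q_crit.
    """
    values = sorted(data)
    n = len(values)
    data_range = values[-1] - values[0]  # highest minus lowest value
    if data_range == 0:
        raise ValueError("all values are identical; nothing to test")
    low_gap = values[1] - values[0]      # lowest value to its nearest neighbor
    high_gap = values[-1] - values[-2]   # highest value to its nearest neighbor
    # The suspected outlier is the extreme value farther from its neighbor.
    if high_gap >= low_gap:
        suspect, gap = values[-1], high_gap
    else:
        suspect, gap = values[0], low_gap
    q_calc = gap / data_range
    q_crit = q_table[n]
    return suspect, q_calc, q_crit, q_calc > q_crit


# Hypothetical data set with one high value under suspicion
suspect, q_calc, q_crit, discard = q_test([12.53, 12.56, 12.47, 12.67, 12.48])
print(f"suspect = {suspect}, Q_calc = {q_calc:.2f}, Q_table = {q_crit}, discard: {discard}")
# prints: suspect = 12.67, Q_calc = 0.55, Q_table = 0.64, discard: False
```

Here Qcalc = 0.11 / 0.20 = 0.55 is below Qtable = 0.64, so 12.67 is retained even though it looks high. The table lookup raises a KeyError for fewer than 3 or more than 10 measurements.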

E-mail: cronk@gonzaga.edu