Module 2:
Inferential Parametric Statistics
and Sampling Distributions
The student will:
1. understand characteristics of the normal distribution
2. understand probability using the normal distribution
3. understand theories that support the researcher being able to make inferences about population parameters from statistics obtained from a single sample (central limit theory and sampling distribution theory)
4. understand how the standard error of the sampling distribution is used in making inferences about population parameters from sample statistics
5. understand how probability is used with the sampling distribution
in making inferences about population parameters from sample statistics
Required: Review Basic Statistics Booklet from DPLS 720; SPSS text Chapters
5, 6, and 10.
Recommended (optional): Hinkle et al. text: Chapters 4 and 7
Probability and Normal Distributions
Making Inferences About Population Parameters From Sample Statistics
The Standard Error: An Estimate of the Sampling Distribution
Estimating the Population Mean from the Sample Mean
An Example of How Population Parameter Mean
can be Estimated from a Sample Mean
Probability and Normal Distributions
The normal distribution is a mathematical idealization of a particular type of distribution and plays a critical role in parametric inferential statistics. Properties of a normal distribution include:
1. unimodal
2. symmetrical
3. the mean, median, and mode are equal
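This course uses a family of related theoretical distributions (the z, t, and F distributions). These distributions may differ in their means, variability, and shape. The z and t distributions share the properties of the normal distribution listed above; the F distribution is built from normally distributed scores but is positively skewed rather than symmetrical.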
The z distribution is called the Standard Normal Distribution. This distribution is always the same. It has a mean equal to zero and a standard deviation equal to one (1). See Picture 1.
Picture 1
The Standard Normal Distribution (as well as other normal distributions we will be using in the course) is used to measure the distance of scores from the central point on a sample distribution. Say a sample was drawn from the population and measured on personal spiritual development (using a valid and reliable instrument), and the mean of this measure on this sample was 25.89 with a standard deviation equal to 5.32. Scores for individuals within this sample could be converted to standard z scores to determine how far from the mean each score fell. If an individual had a spirituality raw score equal to 20, then this score can be converted to a z score by applying the following formula:
$$z = \frac{X - \bar{X}}{S} = \frac{20 - 25.89}{5.32} = -1.11$$
On the Standard Normal Distribution, this individual's score fell 1.11 standard deviations below the central point of the distribution (which for the sample is 25.89).
Note that z scores are deviation scores expressed in standard deviation (sigma) units, so a z score equal to -1.11 indicates a score that is 1.11 sigma below the mean.
Also recall that the researcher can estimate the proportion of individuals in the sample who fell above this point and the proportion who fell below it. There is a table that can be used for this analysis. For this particular case, the proportion of individuals who fell between this z point (z = -1.11) and the mean is .366; the proportion who fell below this z point is .133; and the proportion who fell above this point is .866. These proportions can also be read as probabilities, so that there is about a 37% chance or probability that a score in the sample would fall between this point and 25.89 (the mean of the sample); there is a 13% chance that a score would fall below this individual's score of 20; and there is an 87% chance that a score would fall above this score of 20.
In summary, the z score that is calculated takes into account the sample mean score as well as the standard deviation for the sample set of scores. This z value is read as a deviation from the center point of this sample's distribution of scores, and there is a probability that can be assigned to this point on the Standard Normal Distribution. The center, or mean, of the Standard Normal Distribution is zero (0) because the function of the distribution is to inform the researcher how a particular score deviates from the center of the sample set of data. If a calculated z value falls exactly on the center point of the distribution (i.e., the individual's raw score equaled the mean score for the sample), then the deviation from this central point would be zero (0). The higher the z value calculated (either positive or negative), the further away this score is from the center point of the distribution of scores included in the data set.
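The conversion and table lookup described in this section can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: it assumes Python 3.8 or later (for `statistics.NormalDist`), and the function name `z_score` is made up for the example. The z score is rounded to two places because that is how a printed z table is indexed.

```python
from statistics import NormalDist


def z_score(raw_score, sample_mean, sample_sd):
    """Return how many standard deviations raw_score lies from the mean."""
    return (raw_score - sample_mean) / sample_sd


# Values from the spirituality example above.
z = round(z_score(20, 25.89, 5.32), 2)  # rounded as one would for a z table

z_dist = NormalDist(mu=0, sigma=1)  # the Standard Normal Distribution
below = z_dist.cdf(z)               # proportion of scores below z
above = 1 - below                   # proportion of scores above z
between = abs(0.5 - below)          # proportion between z and the mean

print(f"z = {z:.2f}")
print(f"below = {below:.4f}, above = {above:.4f}, between = {between:.4f}")
```

The three proportions agree closely with the table values quoted above (.133, .866, and .366); small differences in the last digit come from rounding.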
Making Inferences About Population Parameters From Sample Statistics
Similar to how a researcher can estimate the proportion of scores that fall above and below a particular score in a single data set (or estimate the probability of obtaining a given score) using the Standard Normal Distribution, a researcher can estimate the probability of obtaining a similar mean score from another set of data that came from another sample in the same population. Or, the researcher can estimate characteristics of the population mean (mu) from a data set that came from a single sample from the population.
Let's say that we draw a random sample (n=85) from the population (N=350) and measure this sample's ability to solve problems. We ask each individual in the sample to complete an instrument designed to measure problem solving. This instrument has reported validity coefficients ranging from r = .72 to r = .87 and a reported reliability equal to r = .86. The sample produces a mean problem solving score equal to 84.58 with a variance equal to 144 and a standard deviation equal to 12.
Given this sample mean of 84.58, can we claim that the population mean on problem solving is exactly equal to 84.58?
The answer to this question is NO. The mean of the population is not likely to be exactly 84.58. However, if the sample is randomly drawn from the population so that it is representative of the population, and the scores obtained from this sample are normally distributed, then the sample mean score of 84.58 can provide an excellent estimate of the population mean.
In using inferential statistics, the researcher will only be able to make estimates of the population mean based on the sample mean obtained because errors in the sampling distribution of scores on problem solving will, most likely, make this sample mean score somewhat different from the population mean score, or somewhat different from a mean score obtained for another sample drawn from the same population.
How the mean of 84.58 is likely to be different from (or deviate from) the population mean or the mean of another sample has to be taken into consideration when using inferential statistics. Consequently, prior to estimating the population mean from our single sample mean, we first have to estimate how our sample mean score is likely to be different from the population mean or other mean scores obtained from other samples in the same population. We have to know something about how the mean scores of several samples might be distributed.
One way we could find this out would be to actually draw several samples (say 50 samples), at random, from the population and measure each sample on problem solving skills. If we did this, we could calculate some descriptive statistics that would tell us about this distribution of mean scores obtained on these 50 samples. We could calculate a mean of all the mean scores (called the mean of means). We also could calculate a variance and corresponding standard deviation for these 50 mean scores (this is called the variance or standard deviation of the mean scores). The mean of means and the standard deviation of mean scores are denoted as follows:
The sampling distribution for these 50 samples may look like this.
Picture 2
With this sampling distribution we would then know how these 50 sample mean scores varied from each other, or how they varied from the mean of means. We would have calculated a standard deviation of the mean scores, which would tell us how, on average, these mean scores varied from each other.
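The thought experiment of drawing 50 samples can be imitated with a short simulation. The sketch below is illustrative only: the population of 350 scores is made up (randomly generated around a mean of 85 with a standard deviation of 12), and the seed is fixed simply so that the run is repeatable. Only the sizes (N = 350, n = 85, 50 samples) come from the example in the text.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population of N = 350 problem solving scores (made-up values).
population = [random.gauss(85, 12) for _ in range(350)]

# Draw 50 random samples of n = 85 and record the mean of each sample.
sample_means = [
    statistics.mean(random.sample(population, 85)) for _ in range(50)
]

population_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)   # center of the 50 mean scores
sd_of_means = statistics.stdev(sample_means)    # spread of the 50 mean scores

print(f"population mean:   {population_mean:.2f}")
print(f"mean of means:     {mean_of_means:.2f}")
print(f"SD of mean scores: {sd_of_means:.2f}")
print(f"SD of raw scores:  {statistics.stdev(population):.2f}")
```

In a run like this, the mean of means lands very close to the population mean, and the standard deviation of the mean scores is far smaller than the standard deviation of the raw scores.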
The Central Limit Theorem tells us that if each sample is randomly selected and the size of each sample (n) is greater than 30, then the distribution of the sample mean scores will be approximately normal, and each sample will tend to yield a mean score that is similar to the mean score of the population.
The Sampling Distribution Theory tells us that the sampling distribution of a statistic (such as the mean) is the distribution of values we would expect to obtain for that statistic if we drew an infinite number of samples from the population in question and calculated the statistic (mean) on each sample.
As you will see below, the Sampling Distribution Theory allows researchers
to draw one sample, instead of 50 samples, and make an estimate from this
single sample as to what the sampling distribution of mean scores might
be across multiple samples. This estimate is called the Standard Error
(SE) of estimating mu from a single sample mean score.
The Standard Error: An Estimate of the Sampling Distribution
We ask the question: How might one sample mean differ from the mean of another sample? In order to estimate the population mean (mu), from a single mean, we need to consider the error we might make in this estimation. This error is how our single sample mean is likely to be different from other mean scores obtained from other samples drawn from the same population, or what the standard deviation for several sample mean scores would be.
Because we are not able to collect data on several samples for a given study and actually calculate the distribution of several sample mean scores, we have to make an estimate of what this distribution might be. The formula we would use for this estimation of the sampling distribution is:
$$SE = \frac{S}{\sqrt{n}}$$
Where S is the standard deviation of the sample and n is the number of individuals included in the sample. Recall that S, the standard deviation of a sample, is itself a measure of error in that it tells the researcher how much variation exists among the raw scores included in a data set. The standard deviation for a single sample is the average deviation of individual raw scores from the mean score of the sample:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
The SE (the estimate of the sampling distribution), in essence, is the standard deviation that would be expected among the mean scores of many samples of size n drawn from the same population.
Note that the larger the standard deviation for a single sample, the more spread out the scores are and the more error there is considered to be in the data set. Therefore, if the researcher has a large standard deviation, the SE will be large as well. The denominator is the square root of the sample size, where the sample size (n) is the number of individual raw scores in the data set. Using the sample size in the denominator bases the SE on the sampling distribution that would be expected for samples of this same size. The larger the sample size (n), the smaller the SE will be. This makes sense because the mean obtained for a large sample, if it has been randomly drawn, is likely to be more similar to mu than the mean of a very small sample.
Obviously, the SE formula will yield a smaller error term than the error term (or standard deviation) of a single sample given that the S of the sample is included in the numerator of the SE formula and this value is then divided by the square root of n. Actually, it makes sense that the SE of all possible mean scores obtained for several samples drawn from the population would be quite small because, based on the Central Limit Theorem, if there is a normal distribution of scores for a sample, the mean of that sample will equal (or nearly equal) mu of the population. Therefore, several samples randomly drawn from the population that have normal distributions of scores, will have very similar mean scores. There would not be a large deviation among several sample mean scores.
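As a quick check on the points made in this section, the SE can be computed for the problem solving sample (S = 12, n = 85) and then recomputed for other sample sizes. This is a minimal sketch; the function name `standard_error` and the alternative sample sizes of 25 and 340 are chosen only for illustration.

```python
import math


def standard_error(sample_sd, n):
    """Estimate the standard deviation of sample means: S / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Problem solving example from the text: S = 12, n = 85.
se = standard_error(12, 85)
print(f"SE for n = 85: {se:.2f}")

# Same standard deviation, different (hypothetical) sample sizes.
for n in (25, 85, 340):
    print(f"n = {n:3d}  SE = {standard_error(12, n):.2f}")
```

The SE for the example works out to about 1.30, well below the sample standard deviation of 12, and it shrinks as n grows.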
Estimating the Population Mu from the Sample Mean
To estimate mu from a sample mean, the following assumptions must be met:
1. The distribution of the single sample is normal.
2. The variable being observed is measured on either an interval or ratio level of measurement. (However, as was discussed in the Basic Statistics Booklet, Likert scaled items, although considered to produce ordinal data, are often treated as though they are interval data and are used to make estimates of population parameters.)
3. The sampling process is replicable, such that if one selected another sample from the same population, one could apply the same estimation procedure to estimate mu that was applied to the first sample mean. This provides consistency in making estimations of mu across several possible sample mean scores.
The estimation of mu from a sample mean is done using confidence intervals -- or by establishing a range of possible values within which we have a certain level of confidence that mu is likely to fall, given our sample data.
This confidence interval is associated with two probabilities:
1. A probability of being incorrect -- with a given level of error or chance that mu does not fall within the confidence interval. This error or chance of being wrong is expressed as a percent or probability value, and the researcher establishes what level of error he or she is willing to accept in being wrong about making estimations about the population parameter mu from the sample mean. When we get to hypothesis testing, this error rate is denoted as alpha.
Usually the researcher does not exceed an alpha level of .10 (a 10% chance of being wrong in making an inference about mu from the sample mean). In fact, for most research the researcher is not willing to exceed a 5% chance of being wrong in making inferences about mu from the sample mean.
2. A probability of being correct -- with a given level of confidence that the population mean (mu) falls within the range of the confidence interval. This level of confidence, when we get to hypothesis testing, is denoted as 1-alpha.
As you may have guessed, the probabilities of being correct and incorrect are interdependent. If the probability of being incorrect (alpha) is .10 or 10%, then the probability of being correct (the confidence or 1-alpha) that mu falls within a specified range of values is .90 or 90%.
For the purpose of estimating population parameters (i.e., mu and sigma) from sample statistics (i.e., the sample mean and standard deviation), you will be using normal distributions. The example below is based on the Standard Normal Distribution (the z-distribution). In subsequent modules, you will be using the t-distribution and the F-distribution.
As was discussed earlier in this module, the z-distribution is normal with mu=0 and sigma=1. Raw scores as well as sample mean scores can be converted to a z score. Based on this z score, the researcher can identify a specific location on the z-distribution and then calculate a probability of obtaining a score higher or lower than this z score. The researcher can also calculate ranges within which a score might fall. This is what we will be doing in setting up confidence intervals.
For example: Given z values of +1 and -1, the proportion of scores that would fall between these two positions would be .68 or 68% (refer to the z table in the Basic Statistics Booklet or in the Hinkle book). This could also be interpreted as a probability equal to .68; that is, there is a 68% chance that mu would fall between these two points. In this example, there would be a 68% level of confidence and a 32% level of error. Note that a 32% level of error is very large for research studies. See Picture 3.
Picture 3
Notice that in Picture 3, half of the alpha level appears in each end of the distribution. This places 16% of the error in the left tail of the distribution and 16% in the right tail of the distribution.
Given z values of +2 and -2, the proportion of scores that would fall between these two positions would be .954 or about 95%. This could also be interpreted as a probability equal to about .95; that is, there is about a 95% chance that mu would fall between these two points. Similarly, there would be about a 5% chance that mu would fall outside these two points. In this example, there would be about a 95% level of confidence (1-alpha) and about a 5% level of error (alpha). See Picture 4.
Picture 4
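The link between z positions, confidence (1-alpha), and error (alpha) can also be checked numerically instead of with a printed z table. The following sketch assumes Python 3.8 or later; the helper names `coverage` and `critical_z` are invented for this illustration.

```python
from statistics import NormalDist

z_dist = NormalDist()  # Standard Normal Distribution: mu = 0, sigma = 1


def coverage(z):
    """Proportion of the z distribution that lies between -z and +z."""
    return z_dist.cdf(z) - z_dist.cdf(-z)


def critical_z(alpha):
    """Positive z value that leaves alpha / 2 in each tail."""
    return z_dist.inv_cdf(1 - alpha / 2)


for z in (1, 2):
    confidence = coverage(z)
    print(f"z = +/-{z}: confidence = {confidence:.3f}, "
          f"error = {1 - confidence:.3f}")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z = +/-{critical_z(alpha):.2f}")
```

Going from z to a proportion reproduces the 68% and roughly 95% figures above; going the other way, from a chosen alpha to z, gives the exact z positions a researcher would use for a stated error rate.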

An Example of How Population Parameter Mu can be Estimated from a Sample Mean.
Returning to the topic of spiritual development, suppose a researcher is interested in knowing the spiritual development of Chief Executive Officers of corporations in the Northwest. A sample of 200 CEOs is randomly selected from a list of 420. Each CEO is mailed the questionnaire and 120 choose to complete and return it (a spectacular 60% response rate!). The researcher calculates a mean score on the spiritual development data that is equal to 23.62 (out of a possible total score of 50, with the higher the score, the greater the spiritual development). The standard deviation equaled 6.49.
To set up a confidence interval to estimate mu from the sample mean, the SE of the sample mean score (or the standard error of estimating mu from the sample mean) must be taken into consideration. And the z positions of the Standard Normal Distribution must be transformed into values that have meaning to this study (i.e., the z scores must be transformed into spiritual development scores).
The formula for setting up a confidence interval is:
$$\bar{X} - |z|(SE) \le \mu \le \bar{X} + |z|(SE)$$
Where the absolute value of z is used (this is because the formula already takes into account the negative and positive values for z). Notice that the lower and upper limits of the interval include exactly the same symbols except for the negative and positive signs. The SE for the formula is calculated by:
$$SE = \frac{S}{\sqrt{n}} = \frac{6.49}{\sqrt{120}} = .59$$
To set up a confidence interval that would provide a 68% level of confidence (and a 32% level of error), the z values would be equal to +1 and -1 (refer to z table). The confidence interval, which would transform the z positions on the Standard Normal Distribution into spiritual development scores, would be:
$$23.62 \pm 1\left(\frac{6.49}{\sqrt{120}}\right) = 23.62 \pm .59$$
which gives an interval from 23.03 to 24.21.
Picture 5
In this example the researcher would have a 68% level of confidence that the mean for the population fell between 23.03 and 24.21. However, he also would have a 32% chance that the mean for the population did not fall between these two levels.
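The 68% interval for the CEO example can be reproduced with a small helper. This is a sketch under the assumptions already stated in the text (the z distribution is used and the SE is S divided by the square root of n); the function name `confidence_interval` is hypothetical.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, z):
    """Return (lower, upper) limits: mean -/+ |z| * SE, with SE = S / sqrt(n)."""
    se = sample_sd / math.sqrt(n)
    margin = abs(z) * se
    return sample_mean - margin, sample_mean + margin


# CEO spiritual development example: mean = 23.62, S = 6.49, n = 120, z = 1.
low, high = confidence_interval(23.62, 6.49, 120, z=1)
print(f"68% confidence interval: {low:.2f} to {high:.2f}")
```

Because the helper takes the absolute value of z, passing -1 or +1 gives the same interval, which mirrors how the formula already accounts for both signs.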
The researcher may want to establish a confidence level that does not contain such a high rate of error. Using z positions on the Standard Normal Distribution that are equal to -2 and +2 would decrease this error rate. However, this widens the range of values within which the researcher is projecting that mu falls:
$$23.62 \pm 2\left(\frac{6.49}{\sqrt{120}}\right) = 23.62 \pm 1.18$$
which gives an interval from 22.44 to 24.80.

Picture 6
This provides the researcher a higher level of confidence about mu based on the sample mean score.
Typically, the researcher specifies a desired error level for setting up confidence intervals and for testing hypotheses. This error level (alpha) is the maximum error rate the researcher is willing to tolerate in making inferences about population parameters, such as mu, from sample statistics, such as the mean. If the researcher chose not to exceed an error rate of 5%, then the confidence interval for the data set in this example would be close to what we observed when z = +2 and -2. However, this needs to be more exact. Looking at the z distribution table, a more exact z value for an error rate equal to 5% would be +1.96 and -1.96 (rather than +2.00 and -2.00). The confidence interval now becomes:
$$23.62 \pm 1.96\left(\frac{6.49}{\sqrt{120}}\right) = 23.62 \pm 1.16$$
which gives an interval from 22.46 to 24.78.
Note that another way you might see the formula for a confidence interval presented in textbooks is:
$$CI_{95} = \bar{X} \pm 1.96(SE)$$
This symbolizes a 95% level of confidence.
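In practice the researcher starts from the error rate (alpha) rather than from a z value. The sketch below, which is an illustration and not the course's prescribed procedure, looks up the z position for a chosen alpha and then builds the interval for the CEO data; `interval_for_alpha` is a made-up name, and Python 3.8 or later is assumed.

```python
import math
from statistics import NormalDist


def interval_for_alpha(sample_mean, sample_sd, n, alpha):
    """Confidence interval for mu at a confidence level of 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # leaves alpha / 2 in each tail
    se = sample_sd / math.sqrt(n)            # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se


# CEO example: mean = 23.62, S = 6.49, n = 120.
for alpha in (0.10, 0.05, 0.01):
    low, high = interval_for_alpha(23.62, 6.49, 120, alpha)
    print(f"alpha = {alpha:.2f} (CI {100 * (1 - alpha):.0f}): "
          f"{low:.2f} to {high:.2f}")
```

Lowering alpha raises the level of confidence but widens the interval, which is the same trade-off seen above when moving from z = 1 to z = 2.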
1. To use parametric inferential statistics, the sample data must:
a. Be very large (60% of the population).
b. Have a very small standard deviation.
c. Be normally distributed.
d. All of the above.
2. A researcher randomly selects 45 doctoral students and asks them to complete a questionnaire regarding their attitudes towards statistics. The sample data collected on this variable indicates that the distribution of scores is normal. This is not surprising given the:
a. Sampling Distribution Theory.
b. Standard Error of Measure.
c. Standard Deviation.
d. Central Limit Theorem.
3. Which of the following are characteristics of the normal distribution?
a. The mean and median are similar in value, but not the mode.
b. There are about the same number of scores that fall in each end of the
distribution.
c. The standard deviation is similar in value to the mean.
d. All of the above are true.
4. The Standard Error indicates:
a. How the single sample mean is likely to vary from other samples that might
be drawn from the population
b. How the single sample mean is likely to vary from the population mean.
c. The standard deviation of several sample mean scores.
d. All of the above.
e. None of the above.
5. Estimating mu from a single sample mean is done by:
a. Estimating how the single sample mean is likely to be different from other
mean scores that would come from other samples in the same population.
b. Estimating an exact value for mu and then assigning a probability value to it.
c. Using probability statistics to determine how close the sample mean is likely
to be to the same mean score in the population.
d. Estimating the chance of being correct in using inferential statistics.
6. The Standard Error is most directly impacted by the:
a. Sample size.
b. Population size.
c. Population mean.
d. Sample mean score.
7. A CI 90 indicates a confidence interval that has a:
a. 90% chance that mu falls within the interval.
b. 10% chance that mu falls within the interval.
c. 90% chance that another sample mean score would fall outside the interval.
d. 10% chance that the sample mean is in error.
8. The chance that the researcher might be wrong in making decisions about mu from the sample mean is based on the:
a. Confidence interval.
b. Alpha level.
c. Z score.
d. All of the above.
9. Alpha and the corresponding 1-alpha are established by:
a. Statistics textbooks.
b. The size of the population.
c. The size of the sample.
d. The researcher's discretion.
10. Which of the following statistics is likely to be the smallest in value?
a. The variance of the population.
b. The variance of the sample.
c. The standard error.
d. The mean score of a sample.
11. For a confidence interval, if z = ±1.00, then this can be interpreted as having a:
a. 68% chance of mu not falling in the interval.
b. 16% chance of mu not falling in the interval.
c. 90% chance of mu not falling in the interval.
d. 32% chance of mu not falling in the interval.
12. If 1-alpha is 99%, then the chance of being correct is ___ and the chance of being incorrect is ___:
a. 99%; 1%
b. 1%; 99%
c. Both a and b are correct
d. Neither a nor b is correct