The student will:
1. understand basic principles of quantitative research design
2. understand the roles that statistics play in planning and carrying out quantitative research
3. understand basic descriptive statistics and how they are used in quantitative research
4. differentiate between statistics and parameters and their use in quantitative research
Reading Assignment for Module 1
Required: Review Basic Statistics Booklet from DPLS 720; SPSS text Chapters 1-4
Recommended (optional): Hinkle et al. text: Chapters 1-3
Notes for Module 1
Topics Included in Notes:
Quantitative Research Design: An Overview
Critical Issues Concerning Quantitative Research Design
Four Types of Quantitative Research Questions
Variables Inherent in Research Questions
The Role of Statistics in Quantitative Research
The Level of Measurement of Variables
Descriptive Statistics: Measures of Central Tendency
Descriptive Statistics: Measures of Variability
Quantitative Research Design: An Overview
There are two general types of quantitative research design: survey and experimental.
As the name implies, survey research involves the researcher surveying individuals in a population, or a sample drawn from a population, regarding their opinions, behaviors, and/or demographic characteristics. The survey usually is large scale, meaning that a large number of individuals are asked to provide information that will be helpful to the researcher in meeting his or her research goals. Usually in survey research the researcher observes several variables and looks at how they relate to each other. For example, the researcher may observe (or measure) variables such as 1) attitudes towards gun legislation, 2) education level, 3) age, 4) place of residence, and 5) gender. Given data collected on these variables, the researcher may then do a series of statistical analyses to determine how these variables are associated with each other. A variety of statistical methods might be employed to accomplish this, such as Pearson correlation, the t-test for independent samples, or Analysis of Variance. All of these statistical tools will be presented in subsequent EDLD 722 course modules.
For experimental research, the researcher also is interested in looking at the association of one variable to another; however, there typically are fewer variables the researcher observes and there is one variable that takes center stage -- the treatment or experimental variable. The researcher is interested in finding out if this treatment is able to make an impact on an outcome variable.
For example, the treatment variable may be a new method of treating mental patients and the outcome variable may be the patients' mental health status. If there is a noted change in patients' mental health status following administration of this new treatment method, the researcher hopes to be able to draw conclusions that it was the new treatment (and not another intervening variable) that caused this change to occur.
To carry out this research, the researcher would need to design (or select) and implement the treatment and then collect data on the outcome variable(s). Data on the outcome variable(s) typically are collected before the treatment is implemented and again after the treatment has been completed. This design is depicted in Picture 1:
Picture 1
Experimental research requires careful design to allow the researcher to make sound generalizations about the effect of the treatment. The study needs to be carefully controlled. In social science research, nearly any experimental research design has weaknesses, and it is important that the researcher understands what these weaknesses are and makes every attempt to choose the best experimental design possible. For example, a stronger design for experimental research than what was presented in Picture 1 would include a control group to which the researcher can compare the treatment group's outcome scores:
Picture 2
This design is stronger because it allows the researcher to see if changes in the outcome variable may have occurred by normal maturation. If the control group changes on the outcome variable in a similar way as does the treatment group, then the treatment is not likely to be what caused the treatment group's scores to change. Statistical tools commonly used to test for changes in the outcome scores or to test if there are differences in outcome scores between the experimental and control groups include: the t-test for dependent samples, the t-test for independent samples, or Analysis of Variance. More on these tools in subsequent modules.
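The logic of comparing a treatment group with a control group can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores below are invented, and the names `mean_change` and `net_change` are hypothetical helpers, not part of any course material. The sketch compares average pre-to-post change in each group; it does not test whether the difference is statistically significant, which is the job of the t-tests and Analysis of Variance covered in later modules.

```python
from statistics import mean

# Hypothetical outcome scores (higher = better mental health status).
# Each position in a pre list and its post list is the same participant.
treatment_pre = [40, 42, 38, 45, 41]
treatment_post = [50, 53, 47, 55, 52]
control_pre = [41, 39, 43, 44, 40]
control_post = [43, 40, 45, 45, 42]


def mean_change(pre, post):
    """Average post-minus-pre change across participants."""
    return mean(after - before for before, after in zip(pre, post))


treatment_change = mean_change(treatment_pre, treatment_post)
control_change = mean_change(control_pre, control_post)

# Change in the treatment group that is NOT matched by the control group.
# If this is near zero, maturation alone could explain the change.
net_change = treatment_change - control_change

print(f"treatment change: {treatment_change:.1f}")
print(f"control change:   {control_change:.1f}")
print(f"net change:       {net_change:.1f}")
```

If the control group had improved about as much as the treatment group, `net_change` would be close to zero, which is the situation described above in which the treatment is unlikely to be the cause of the change.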
Critical Issues Concerning Quantitative Research Design
Six critical issues about quantitative research are discussed below. The researcher must carefully consider each of these issues when designing and carrying out the study.
Critical Issue # 1: How the Variable(s) for the Study will be Measured. The researcher needs to consider how the instrument will be scored and the level of measure this scoring process will yield. It is wise for the researcher to try to develop or select an instrument and score it in such a way that it will yield at least an interval level of measure. In addition to the level of measure, the researcher must have some knowledge about the quality of the instrument. Is it valid (does it measure what it is intended to measure)? Can it produce reliable scores (scores that are consistent over repeated administration of the instrument)? If the instrument is neither valid nor reliable as a measure of a variable, then the data collected via this measure are likely to be riddled with errors. These errors create problems in running various statistical tools and certainly cause problems for the researcher in drawing any valid conclusions about data findings (it is the old garbage-in, garbage-out principle).
Critical Issue # 2: How the Sample will be Drawn from the Population. A primary function of statistical data analysis is to give the researcher an estimate of what the population is like, given data collected on a sample that is drawn from the population. Consequently, how the sample is selected is important. Making inferences from a sample to the population can most accurately be done when the sample is randomly selected and when the sample is of sufficient size.
Theoretically, a random selection of research participants from the population provides a sample that reflects the population on any variable. For example, if the population consists of 45% males and the population has an average opinion score on gun control equal to 28.9, then, likewise, the sample would consist of approximately 45% males and would have an average opinion score on gun control that is similar to that of the population.
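A small simulation can make this idea concrete. The sketch below is an assumption-laden illustration, not a prescribed procedure: the population of 10,000 people, the sample size of 500, and the seed value are all invented for the example. It builds a population that is 45% male, draws a simple random sample, and compares the two percentages.

```python
import random

random.seed(722)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 people: 45% male, 55% female.
population = ["male"] * 4500 + ["female"] * 5500

# Simple random sample of 500 people, drawn without replacement.
sample = random.sample(population, k=500)

population_pct_male = 100 * population.count("male") / len(population)
sample_pct_male = 100 * sample.count("male") / len(sample)

print(f"population: {population_pct_male:.1f}% male")
print(f"sample:     {sample_pct_male:.1f}% male")
```

The sample percentage will usually land close to 45% but will rarely match it exactly; that gap is the sampling error that inferential statistics are designed to account for.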
For survey research, typically the sample is selected in a random fashion. For experimental research, although the ideal is random selection of participants and random assignment of them into groups (i.e., treatment and control groups), this is not likely to happen, and what is called a "convenience" sample is used. A convenience sample is a sample that is available to the researcher and one that agrees to participate in the study. This reality makes the design quasi-experimental research.
The size of the sample also is of great importance for the researcher to be able to make inferences from the sample back to the population. In general, the larger the sample, the smaller the inferencing errors. Survey research requires a rather large number of participants in order to analyze the many variables included in the study in a meaningful way. For some statistical analysis techniques that involve many variables at once, the researcher should have at a minimum five times more participants than variables.
For experimental research, the number of research participants tends to be smaller than for survey research. For one thing, the number of variables tends to be much smaller than for survey research. In addition, the researcher has to implement a treatment and needs to control this implementation to the extent possible, and this can best be done with, say, only 50 participants rather than 500. Furthermore, in running various statistical tests, the researcher needs to be concerned about the "power" of the test, and if the sample size is too large, the researcher may have too much "power" for the test (more on this in a later module).
Critical Issue # 3: How the Treatment will be Administered. If the researcher chooses to use an experimental design for his or her study, then the treatment is the primary focus of the study: Can the treatment produce outcomes that are different from what is expected if the treatment did not occur? The treatment needs to be clearly defined as to its goals, objectives, and procedures. It is the responsibility of the researcher to describe the treatment and its implementation so that it is clear as to what exactly the results are referring to, and so that the treatment can be replicated by other researchers in the future.
Critical Issue # 4: How Data will be Collected. When collecting the data for the study (the data are measures of the variables of interest to the researcher), the researcher must take care that participants are free to respond in a true and sincere manner. If the variable to be measured is in the form of a knowledge test and the researcher stands in the front of the participants and gives them clues about the correct responses, then these data will be seriously flawed. Or, if the researcher mails a questionnaire to a sample that can read only at the 6th grade level but the questionnaire includes 12th grade level language, then the data collected are not likely to be reliable because there would be confusion as to the meaning of questions included on the instrument.
Critical Issue # 5: How Many Individuals Volunteer to Participate. This is an important issue especially for survey research. If 1000 questionnaires are mailed out and only 200 are returned, there is a very serious problem as only 20% of those asked to participate in the study choose to do so. The big question is: Do those who choose to respond have opinions that are different from those who choose not to respond? Babbie suggests that for survey research, there needs to be about a 65% response rate in order for the researcher to be able to generalize from the sample back to the population about variables observed. In terms of statistical data analysis, the smaller the response rate, the smaller the number of participants, and the larger the errors in making inferences from the sample to the population. For experimental research, if there is a low response rate compared to the number of participants in the study, then there are problems with "mortality" of subjects and this seriously jeopardizes the researcher's ability to draw valid conclusions about the importance of the treatment administered.
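The response-rate arithmetic in the mailed-questionnaire example can be written out as a short check. This is a minimal sketch: the function name `response_rate` and the constant `SUGGESTED_MINIMUM` are hypothetical, and the 65% figure is simply the guideline attributed to Babbie in the text above.

```python
def response_rate(returned, mailed):
    """Percentage of mailed questionnaires that were returned."""
    if mailed <= 0:
        raise ValueError("mailed must be a positive number")
    return 100 * returned / mailed


# Guideline quoted in the text (about 65%); treat it as a rule of thumb.
SUGGESTED_MINIMUM = 65

rate = response_rate(returned=200, mailed=1000)
meets_guideline = rate >= SUGGESTED_MINIMUM

print(f"response rate: {rate:.0f}%")
print(f"meets the suggested minimum: {meets_guideline}")
```

Under the same guideline, a mailing of 1000 questionnaires would need about 650 returns.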
Critical Issue # 6: How Data will be Analyzed. How the researcher chooses to analyze the data is critical to the validity of the study. If the researcher chooses a statistical tool or method that is inappropriate for the kind of research question being asked, or for the nature of the data that have been collected, then the data analysis could produce results that are misleading.
Four Types of Quantitative Research Questions
Recall from the EDLD720 course (Principles of Research) that there are four types of questions that can be answered by quantitative research design: descriptive, group difference, correlation, and prediction. Each type of research question requires a different type of data analysis technique. In fact, the nature or type of the research question will serve as a guide for selecting the appropriate statistical tool to answer the question. Consequently, it is important to know what kind of question is being asked.
As was mentioned before, in survey research, typically, there are several variables the researcher is interested in investigating, and, typically, there are several research questions to be answered by the study. These questions can focus on describing variables, group differences, correlation between variables, and/or prediction models. It is not uncommon for survey research to answer more than one type of research question.
For experimental research, the number of variables being observed by the researcher tends to be less than for survey research. The type of question most commonly used for experimental research is group difference. However, often there are descriptive statistics reported prior to answering the group difference question. It also is possible for the researcher to determine group differences by testing whether group membership can predict the outcome variable.
Picture 3 contains examples of the different research questions that could be asked for survey research and experimental studies.
Picture 3
Variables Inherent in Research Questions
Within each research question there is at least one variable named. Obviously, it is important that the researcher knows which variable(s) are to be observed or measured for the study, and each variable to be observed needs to be clearly defined. For example, if the researcher is planning to measure spiritual development, then she needs to define spiritual development and make sure that the instrument selected or developed measures spiritual development as she has defined it. This assures validity of the instrument from the perspective of variable definition.
For research questions that ask about group differences, relationships, or prediction, there is more than one variable inherent in the research question. For example, the group difference question for survey research presented in Picture 3 asks if there is a significant difference in opinions about hazardous waste between those who are Republicans and those who are Democrats. In every group difference type of question there is a dependent variable and an independent variable. The independent variable is always the grouping variable, or the variable that places participants into groups. In this case it is political party. There are two groups (or two levels) of this variable: Republicans and Democrats. If there were only one group or level, for example if only Republicans were surveyed for the study, then political party would NOT be a variable as there would be no variability -- all would be Republicans, and there would be no scores available regarding opinions about hazardous waste for Democrats. The dependent variable is the variable likely to vary or be different because of the grouping variable (and thus it is dependent on the grouping variable). In this example, the dependent variable is opinions about hazardous waste.
Picture 4
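The roles of the two variables in a group difference question can be sketched in code. The data below are invented for illustration, and the variable names are hypothetical. The independent (grouping) variable, political party, sorts participants into groups; the dependent variable, the opinion score, is then summarized within each group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records: (political party, opinion score on hazardous waste).
responses = [
    ("Republican", 22), ("Democrat", 31), ("Republican", 25),
    ("Democrat", 28), ("Republican", 19), ("Democrat", 34),
]

# The independent variable (party) places each dependent-variable score in a group.
scores_by_party = defaultdict(list)
for party, opinion in responses:
    scores_by_party[party].append(opinion)

# Mean of the dependent variable within each level of the independent variable.
group_means = {party: mean(scores) for party, scores in scores_by_party.items()}

for party, group_mean in group_means.items():
    print(f"{party}: mean opinion score = {group_mean}")
```

Whether the gap between the two group means is statistically significant is a separate question, answered with tools such as the t-test for independent samples in later modules. Note that if every record had the same party, `scores_by_party` would hold a single group and there would be nothing to compare.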
For the relationship type of research question, it oftentimes is not possible or necessary to distinguish between a dependent and an independent variable. To identify one variable as dependent and the other as independent, there needs to be a logical reason for doing so, and the independent variable must occur in time before the dependent variable.
For the prediction type of research question, there needs to be a dependent variable identified in order to organize the variables for data analysis. The variable to be predicted is the dependent variable and all other variables to be included in the analysis are independent or predictor variables. This makes sense because scores for the variable being predicted (e.g., opinions about hazardous waste) in the model will depend on scores obtained on those (independent) variables doing the predicting.
Picture 6
The Role of Statistics in Quantitative Research
Statistics play a number of major, interrelated roles in quantitative research. Statistics are not to be considered only at the time of data analysis. Rather, the principles of statistics need to be understood and taken into consideration during the planning phase of the research as well. For example, these principles can guide the researcher in obtaining a level of measure that is desired, using an instrument that is valid and reliable, selecting a sample that is representative of the population and that is of sufficient size, and obtaining answers to research questions or testing hypotheses that can be generalized to the population from the sample.
There are two types of statistics used in social science research. One type of statistic is descriptive. Descriptive statistics refer to methods used to organize, summarize, and tabulate data such that they describe simply and clearly a data set. Descriptive statistics provide a picture of what happened in the study or of what exists in a data set. Without descriptive statistics, data would be overwhelming and unable to be interpreted.
Descriptive statistics provide a basis for a second type of statistic: inferential statistics. Inferential statistics refers to methods used to draw inferences about a population based on descriptive data available on a sample drawn from the population.
Picture 7
Data summaries pertaining to a sample set of data are referred to as sample statistics. Sample statistics are denoted with Latin characters, such as those used for the mean and standard deviation of a sample set of data.
Inferential statistics use these sample statistics to make inferences from the sample set of data to the population. The population characteristics being estimated are called population parameters and are denoted with Greek characters.
As was mentioned earlier, the role of statistics in quantitative research is to use sample statistics to make inferences about certain population parameters. The sample statistics (Latin characters) used to infer population parameters (Greek characters) are listed in Picture 8.
Picture 8
Inferential statistics can be further classified into those that pertain to parametrics and those that do not. Parametric inferential statistics such as those listed above require information about the distribution of the sample set of scores -- such as what the mean of the distribution is and the size and shape of the distribution. There are certain inferential statistics that require information about distributions and certain assumptions need to be met when using these statistics. Parametric inferential statistics covered in subsequent modules for EDLD 722 include: z-test, t-tests, analysis of variance, correlation, and regression.
Nonparametric inferential statistics do not require information about the distribution of scores. These statistics are used when assumptions needed for parametric inferential statistics are not met or when the data are at an ordinal or nominal level of measurement. Nonparametric inferential statistics covered in Module 7 are based, for the most part, on the chi-square distribution.
The Level of Measurement of Variables
Most certainly, the researcher needs to know the level at which variables have been measured for the study. If the researcher does not pay attention to this, then he or she may select inappropriate statistical methods to answer research questions, resulting in unclear and inaccurate results. A summary of the four levels or scales of measurement is presented in Picture 9.
NOTE: For parametric inferential statistics, the level of measurement needed is at least the interval level. It makes no difference if the data are interval or ratio, as long as it is at least interval. Measurement of attitudes on a Likert scale (for example where participants rank their responses) is considered by some statisticians as being ordinal data; however, this type of data is often treated as interval data and parametric statistics are used to analyze them. The researcher needs to be certain that the assumptions of a given statistical method are met, and they may not be met with such data.
Also NOTE that in some instances the researcher can change the level of measurement of variables. If the four levels are seen as hierarchical, with the nominal level of measure at the bottom and ratio at the top, then it can be said that the researcher can change a level of measure going from a higher level to a lower level (such as changing ratio data into interval, ordinal, or nominal data). However, the reverse is not true. The researcher cannot change data measured at the ordinal level of measurement into interval or ratio data. The researcher needs to keep this in mind when planning the study.
Picture 10
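The one-way nature of changing a level of measurement can be illustrated with a short sketch. The ages, the category labels, and the cutoffs of 30 and 60 are all assumptions made up for this example. Ratio-level ages are collapsed into ordered categories (ordinal level); once only the category is stored, the original ages cannot be reconstructed.

```python
# Hypothetical ages in years (ratio level: equal intervals and a true zero).
ages = [19, 24, 37, 45, 52, 68]


def age_group(age):
    """Collapse a ratio-level age into an ordered (ordinal) category."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "older"


# Higher level -> lower level: always possible.
ordinal_ages = [age_group(age) for age in ages]
print(ordinal_ages)

# Lower level -> higher level: not possible. The label "middle" could have
# come from any age between 30 and 59, so the exact value is lost.
```

This is one reason to collect data at the highest level of measurement that is practical: the researcher can always collapse the data later, but can never recover detail that was not recorded.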
Descriptive Statistics: Measures of Central Tendency
Measures of central tendency indicate a central or approximate middle position of a data set or the distribution of a data set. The mode measure of central tendency does not take into consideration the distribution of the scores. It merely identifies what value of a variable occurred most frequently and is used most often to describe nominal or categorical data. For example, the mode can be used to indicate whether there were more males or females among the research participants, or what color was the most favored color among participants. The mode is not a commonly used measure of central tendency in quantitative research because it does not take into consideration all data points in the distribution.
The median measure of central tendency is helpful in describing ordinal data where the data represent ranked scores. For example, if research participants were asked to rank order a list of ten items in terms of importance then the researcher could indicate the median ranking obtained. This is done by listing the ranked scores from lowest to highest (or vice versa) and then identifying the middle or central number in the list.
Picture 11
The mean measure of central tendency is used most often in quantitative research as this measure takes into account every score on a variable included in a data set, from highest to lowest. The mean is calculated by summing all of the scores and dividing by the number of scores (n):
$$\bar{X} = \frac{\sum X}{n}$$
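The three measures of central tendency described in this section can be computed with Python's standard `statistics` module. The scores below are invented for illustration; this is a minimal sketch rather than a recommended analysis workflow.

```python
from statistics import mean, median, mode

# Hypothetical scores for seven participants.
scores = [3, 5, 5, 6, 8, 9, 13]

most_frequent = mode(scores)    # value that occurs most often
middle_score = median(scores)   # middle value once the scores are sorted
average_score = mean(scores)    # sum of all scores divided by n

print(f"mode:   {most_frequent}")
print(f"median: {middle_score}")
print(f"mean:   {average_score}")
```

Notice that the mean (7) is pulled above the median (6) by the single high score of 13, because the mean uses every score in the data set while the median depends only on the middle position.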
Descriptive Statistics: Measures of Variability
How individuals' raw scores fall around the mean is important information for parametric inferential statistics. Consequently, information about the distribution of scores is needed. Information about the distribution of scores around this center point can be obtained by calculating the variability of the scores.
One measure of variability is the range of scores. As with the mode measure of central tendency, the range does not provide very specific information. It merely identifies the highest and lowest values in a data set.
The measures of variability used in parametric inferential statistics are the standard deviation and the variance. Recall that these two measures are, in essence, the same, as the standard deviation is the square root of the variance. Also recall that to calculate the standard deviation, the variance must be calculated first and the standard deviation derived from the variance. The formula for the variance is as follows:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
The numerator of the formula for calculating the variance of a data set involves summing the squared differences between individual raw scores and the mean score. This is often referred to as "Summing the Squared Differences." We will return to this term in a later module.
The denominator of the formula for calculating the variance of a data set is the number of scores included in the data set (n) minus one; the Sum of the Squared Differences is divided by this value. (NOTE: one is subtracted from n to correct the bias that may occur, especially if data sets are small.) In statistics, whenever you divide a value by the number of scores in a data set (n), the quotient is an average score. With this in mind, you can say that the variance of a data set is, approximately, the average squared difference between individual raw scores and the mean of the data set. The standard deviation, then, is roughly the average difference between individual raw scores and the mean score of a data set.
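The numerator and denominator described above can be followed step by step in a short sketch. The five scores are invented for illustration. The hand calculation is then compared against `statistics.variance` and `statistics.stdev` from the Python standard library, both of which also divide by n minus one.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical raw scores.
scores = [2, 4, 4, 6, 9]
n = len(scores)
mean_score = sum(scores) / n

# Numerator: sum of the squared differences between each score and the mean.
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

# Denominator: n minus one.
sample_variance = sum_of_squares / (n - 1)

# The standard deviation is the square root of the variance.
sample_sd = sqrt(sample_variance)

print(f"mean:               {mean_score}")
print(f"sum of squares:     {sum_of_squares}")
print(f"variance:           {sample_variance}")
print(f"standard deviation: {sample_sd:.3f}")

# Cross-check against the standard library (same n - 1 denominator).
print(f"library variance:   {variance(scores)}")
print(f"library stdev:      {stdev(scores):.3f}")
```

Working through the steps by hand first, as here, makes it easier to see why the variance is expressed in squared units and why the square root is taken to return to the original units of the scores.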
The variability of the scores indicates how far, on average, scores deviated from the mean score. This information is considered to be an "error" message that is taken into account when the researcher attempts to make inferences from the sample data back to the population.
This is an error message because it speaks to the consistency, or lack of consistency, among scores in a data set. If the variance or standard deviation is quite large in value, then that means the scores in the data set were quite varied and deviated a great deal from the mean or central point of the distribution. On the other hand, if the variance or standard deviation is small, then that means the scores in the data set were not far from the mean in value -- the scores were more consistent with the mean score.
Picture 13
If the data set is small in number, it is possible that one or two extreme scores, scores that are "way out there" compared to all the other scores in a data set, can skew the distribution of scores. This is something the researcher needs to watch out for. The distribution of scores around the mean or central point of the distribution needs to be rather symmetrical in shape. If it is not, then the outliers, or scores that are way out there, need to be evaluated as to whether they were accurately recorded. If they were, then the researcher may have to transform all scores in the data set (such as by a square-root or log10 conversion) to try to shrink the skewed end of the distribution. How to test for normality of distributions is discussed in a later module.
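The effect of a log10 conversion on a skewed data set can be sketched as follows. Everything here is an assumption for illustration: the scores are invented, and the skew indicator used, 3 × (mean − median) / standard deviation (often called Pearson's second skewness coefficient), is just one rough descriptive index chosen for this example. It is not the normality test discussed in the later module.

```python
from math import log10
from statistics import mean, median, stdev

# Hypothetical positively skewed scores: most are small, a few are very large.
scores = [1, 2, 2, 3, 4, 5, 8, 15, 40, 100]


def median_skew(values):
    """Rough skew index: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# log10 is only defined for positive scores; every score here is > 0.
log_scores = [log10(x) for x in scores]

raw_skew = median_skew(scores)
log_skew = median_skew(log_scores)

print(f"skew index before transformation: {raw_skew:.2f}")
print(f"skew index after log10:           {log_skew:.2f}")
```

An index closer to zero suggests a more symmetrical distribution. The transformation pulls in the long upper tail, so the index after the log10 conversion is smaller than before, although in this small example it does not reach zero.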
Self-Evaluation
1. A researcher hypothesizes that adults in the United States with different eye colors have different rates of lung cancer. Eye color is an example of what level of measurement?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
2. Twenty-five prospective Olympic swimmers participate in a swim competition. The swimmers are rated 1 (excellent), 2 (very good), 3 (good), or 4 (fair). This is an example of what level of measurement?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
3. A researcher is interested in knowing if there is a relationship between ethnicity, income, major area of study, and previous work experience among recent graduates of public universities. She randomly selects 750 Caucasians -- 428 males and 322 females -- among recent graduates of public universities in the Northwest. She sends out a survey to measure the variables. What problem is there with her research design?
a. There are more males than females.
b. Ethnicity is not a variable.
c. The Northwest is too limited.
d. Age should be a variable.
4. Inferential statistics can be used to:
a. Measure the population parameters.
b. Determine percentages.
c. Estimate characteristics of the population.
d. Determine errors in measurement.
5. Population parameters pertain to:
a. Estimates of frequencies and percents of responses that exist in a population.
b. Estimates of distribution of response scores in a population.
c. Characteristics of the sample distribution.
d. The circumference of the sampling frame.
6. A researcher designs a study to determine the impact a program on conflict resolution has on junior high students in a large inner-city school that has experienced a high rate of racial tension over the last three years. All students are asked to complete an instrument that measures perspectives on how to resolve conflicts six months prior to implementation of the program and again six months after the program is completed. In addition to this observation, the researcher plans to observe students who are involved in racial incidents in small group settings to see how they resolve conflict. These observations also will be made six months prior to and six months after the program is implemented. What kind of research is this researcher conducting?
a. Survey research
b. Experimental research
c. Quasi-experimental
7. For the design described in question 6, what are the variables to be observed and which are dependent and independent variables?
a. The independent variable is the program on conflict resolution. The dependent variables include perceptions on conflict resolution and ability to resolve conflicts.
b. The independent variables are students' perceptions about the treatment. The dependent variable is the treatment itself.
c. There are neither dependent nor independent variables in this study. The variables are: the program, students' perceptions, and students' behaviors.
8. For the design described in question 6, what kind of question would be the researcher's primary or most important question?
a. Group difference question (experimental and control group).
b. Group difference question (pre and post measures).
c. Relationship question (students' perceptions, students' behaviors, and the program).
d. Descriptive question (students' perceptions and students' behaviors).
9. For the design described in question 6, to whom can the researcher generalize findings?
a. Junior high school students in that building.
b. Junior high school students in similar settings.
c. Junior high school students in the local area.
d. Both a and b above
e. All of the above
10. Which measure of central tendency is most often used in parametric inferential statistics?
a. Mode
b. Median
c. Mean
11. The standard deviation indicates:
a. The number of raw scores that were close to the central point of a distribution.
b. The skewness of the distribution.
c. How scores, on average, deviated from the central point of a distribution.
d. The square root of the range.
12. The standard deviation is:
a. An error message.
b. A descriptive measure.
c. Affected by sample size.
d. All of the above
e. Only b and c above.
13. Sample statistics refer to:
a. The sample only.
b. The sample and the population.
c. The population only.
14. Greek characters are used to represent:
a. Population parameters.
b. Sample characteristics.
c. Both sample and population characteristics.
15. Latin characters are used to represent:
a. Population parameters.
b. Sample characteristics.
c. Both sample and population characteristics.
16. Reliability of the instrument refers to:
a. Whether the instrument is measuring what it is supposed to measure.
b. The consistency with which the instrument measures a variable.
c. How persuasive the sample can be.
d. The consistency of the sample selection.
17. Validity of the instrument refers to:
a. Whether the instrument is measuring what it is supposed to measure.
b. The consistency with which the instrument measures a variable.
c. How persuasive the sample can be.
d. The consistency of sample selection.
18. Statistical analysis of data should be taken into consideration:
a. After data have been collected and reviewed for quality.
b. At the time the research questions are to be answered.
c. As the study is being designed.
d. All of the above.
19. A researcher selects an instrument that was developed for another study population to use for his/her study. The instrument measured "hope," and included 25 items and had been used only for this previous study. This previous study involved young adults who had completed a drug-alcohol treatment program. The instrument had been administered in small group settings. For the current study, the researcher intends to mail the instrument to a random selection of 200 workers in a large corporation that is undergoing restructuring. The workers will not be required to put their name on the instrument and their responses are to be mailed back to the researcher. A possible problem that may exist for this current study is:
a. The instrument may not be valid or reliable for a different population.
b. The sample size for the corporation is probably too small.
c. There is likely to be sampling bias.
d. How data are to be collected is questionable.