STATISTICAL SUMMARIES

Verbal summaries not only describe qualities of phenomena but they sometimes also refer to quantities, though often in a rather imprecise manner--more, less, many, few, larger, smaller, and the like. But such inexact signifiers of quantity are not nearly as helpful as specific numerical amounts. Thus, in studies that concern numbers of phenomena, numerical summaries in the form of statistics are much preferred. The following discussion depicts the use of statistics from two perspectives: (a) how to select the statistic most appropriate for answering a particular research question and (b) how to get help with calculating and interpreting the chosen statistic.

Matching Statistics to Research Questions

When selecting a type of statistic that will be most suitable for answering the question at hand, it is useful to recognize that statistics in research reports typically perform one or both of two functions--the descriptive and the inferential. The descriptive involves summarizing information in an easily comprehended, quantitative form. The inferential involves providing an estimate of how likely a sample of people or events accurately represents a broader population of people or events. The following discussion first identifies a variety of descriptive statistics, then turns to matters of inference. The aim of the discussion is limited to suggesting which sorts of statistics are most suitable for answering which kinds of research questions. The aim does not include explaining the mathematical foundations underlying statistical procedures or demonstrating methods of computation. Nor does the aim include specifying (a) the assumptions on which each procedure is based or (b) the detailed advantages and disadvantages of the various procedures. Such mathematical foundations, computational techniques, underlying assumptions, and precise advantages/disadvantages of different forms of statistical analysis will be found in the kinds of books listed at the end of this section.

Descriptive Statistics

As mentioned above, description involves summarizing information in an easily comprehended, quantitative form. The kinds of descriptive statistics included in the following discussion are percentages, percentiles, measures of central tendency, measures of variability, and correlation techniques. The presentation of each type identifies the sort of research questions the statistic is designed to answer.

Throughout this presentation, the word distribution refers to a collection of quantities (groups of people or objects, costs of goods, amounts of income, and more) that are arranged in a sequence from the highest to the lowest or from the most to the least. For example, employees' efficiency ratings would be listed from the employee judged the most efficient to the employee judged the least efficient. Students' grade point averages would be arrayed from the highest to the lowest.

Percentages

The research question: What proportion of a variable (such as candidates for office, nations' population growth rates, a student's test results, a company's budget, or the like) displays a particular characteristic?

 In the election for state attorney general, 43% of the women voted for Johnson, 40% for Trang, and 17% for Coronado. Among the men, 31% voted for Johnson, 27% for Trang, and 42% for Coronado.
 The annual urban population growth rate in Argentina is 1.65%, in Afghanistan 4.84%, and in Tanzania 9.59%.
 Natalie's test scores were: 86% in language usage, 68% in mathematics, 72% in science, and 93% in social studies.
 The company's budget allocated 67% of the funds for personnel salaries, 13% for equipment and supplies, 8% for travel, 7% for administrative expenses, and 5% for miscellaneous costs.

A valued feature of percentages is their ability to translate disparate measures into a common coin that permits easy comparisons among the measures. Another advantage of percentages is that they are a familiar part of the general public's everyday living, so research results expressed in percentages can be readily understood by a very broad audience.
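As a minimal sketch of how raw amounts become percentages, the snippet below converts a set of budget lines into shares of the total. The dollar amounts and the function name to_percentages are hypothetical, chosen only so that the shares match the budget example above.

```python
def to_percentages(amounts):
    """Convert a mapping of raw amounts into percentages of their total."""
    total = sum(amounts.values())
    return {name: 100 * value / total for name, value in amounts.items()}

# Hypothetical dollar amounts (not from the text), picked to reproduce
# the 67/13/8/7/5 split in the company-budget example.
budget = {
    "personnel salaries": 670_000,
    "equipment and supplies": 130_000,
    "travel": 80_000,
    "administrative expenses": 70_000,
    "miscellaneous": 50_000,
}

shares = to_percentages(budget)
for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
```

Because every share is expressed on the same 100-unit scale, budgets of very different sizes can be compared line by line.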

 

Percentiles

 

The research questions: What proportion of a variable falls below a designated point on a 100-unit scale? Within a collection of items (people, institutions, objects, events, or the like) where does one item rank in comparison with the others?

The meanings of percentage and of percentile are closely linked but not identical, which is a distinction sometimes missed by people not well acquainted with the two terms. Whereas a percentage tells the proportion (on a 100-unit scale) of a variable that displays a given characteristic, a percentile is the point on the scale below which a given percentage of people, objects, or events are located. Thus, a percentile tells where a particular person, object, or event stands in relation to the total number of other persons, objects, or events in terms of some specified feature.

Let's say that Natalie was in a class of 33 students. Of the 50 items on the language-usage test, Natalie answered 43 (86%) correctly. However, 16 of Natalie's classmates did better than she by answering 44 or more items correctly. This meant that in the class of 33 students, 16 of them (48.5%) scored higher than Natalie and 16 (48.5%) scored lower. Hence, Natalie was in the middle, at the 50th percentile--the point below which nearly half of the students' scores fell. In like manner, we could determine Natalie's percentiles in the other three subject areas by computing where she ranked in relation to her classmates on the math, science, and social-studies tests. Furthermore, by learning the annual urban-population growth rates of an additional 17 countries, we could determine the percentile ranks of Argentina, Afghanistan, and Tanzania in comparison to the others. We could then conclude that:

 Natalie's percentile ranks in the four tested subjects were: 50th percentile in language usage, 15th percentile in math, 67th percentile in science, and 88th percentile in social studies.
 In urban-population growth rate for 20 developing nations, Argentina is at the 10th percentile, Afghanistan at the 65th percentile, and Tanzania at the 95th percentile.

Percentiles thus provide a convenient way to show one unit's (person's, nation's, college's, company's, or such) position on some measure in relation to the other units in the group under consideration. Like percentages, percentiles allow a researcher to compare disparate rankings in terms of a single, readily comprehended scale.
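Natalie's standing in her class can be reproduced with a short calculation. The sketch below assumes one common convention for percentile rank, in which a score is credited with everyone below it plus half of those tied with it; other conventions exist and give slightly different values. The classmates' individual scores are invented, and only the counts (16 below, 16 above) come from the example.

```python
def percentile_rank(score, scores):
    """Percentile rank using the midpoint convention (an assumption):
    percent of scores below, plus half the percent tied with the score."""
    below = sum(1 for s in scores if s < score)
    tied = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * tied) / len(scores)

# Hypothetical class of 33: 16 scores under 43, Natalie's 43, 16 scores of 44+.
lower_scores = [30 + i % 13 for i in range(16)]   # all between 30 and 42
upper_scores = [44 + i % 7 for i in range(16)]    # all between 44 and 50
class_scores = lower_scores + [43] + upper_scores

natalie_rank = percentile_rank(43, class_scores)
print(f"Natalie's percentile rank: {natalie_rank:.0f}")
```

Under this convention a student with equal numbers of classmates above and below lands exactly at the 50th percentile.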

Measures of central tendency

The research question: What single number can show the level reached by a group in terms of some measure? In other words, how did one group fare, in general, compared to another group? The measure in question can be any one of many kinds--annual family income, amount of time watching television, level of formal education, numbers of days employees were on sick leave, incidence of teenagers using illicit drugs, incidence of manic-depressive psychosis, scores on an intelligence test, and much more.

The three most commonly used central-tendency statistics are the arithmetic mean, the median, and the mode. Each is designed to answer a particular central-tendency question.

Mean. The research question: What was the average among the measures of some characteristic?

 Ratings on leadership among tall men were 17.3 points higher on average than ratings of short men.
 The average cost of a personal computer this year declined by $217 from last year's average cost.
 The average school class size in Monarch City is 25.7 students, in Desert Wells 30.8, and in Langston 33.7.

The arithmetic mean is computed by adding together all of the measures attained by the members of a group, then dividing the sum by the number of members. Groups can then be conveniently compared in regard to how well they performed in general or in the main or on the average.

Median (50th percentile). The research questions: What score separated the upper half of the group from the lower half? Which score fell in the exact middle of the group's distribution of scores?

Whereas the mean for a group of students that took a test is computed by totaling their scores and dividing by the number of students, the median is determined by listing the students' test scores from the highest to the lowest, then counting through this list to find the halfway point. The median is that halfway score (if there is an odd number of students) or the midpoint between the two scores that lie just above and below the middle (if there is an even number of students). The median is, by definition, the same as the 50th percentile.
 The median household income in Preston is $41,000 and in Marline $57,000.
 The median times spent on homework assignments by sophomores over the past month were: mathematics 25 hours, science 16 hours, foreign language 22 hours, and history 19 hours.

Mode. The research question: Which is the most popular amount in a collection of amounts?

In an array of test results, the mode is the one score that the greatest number of students earned. In a survey of the family incomes in a town, the mode is the particular amount of income received by the greatest number of people. As a measure of central tendency, the mode is usually less helpful than either the mean or the median, because the mode can occur anywhere in the distribution and is not necessarily near the middle.
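The three central-tendency measures can be compared side by side on one small distribution. The following sketch uses Python's standard statistics module; the eight test scores are hypothetical, and an even number of scores was chosen so that the median falls between the two middle values.

```python
from statistics import mean, median, mode

# Hypothetical test scores for eight students (not from the text).
scores = [68, 72, 77, 80, 85, 85, 90, 94]

average = mean(scores)       # sum of the scores divided by how many there are
middle = median(scores)      # midpoint of the two central scores (80 and 85)
most_common = mode(scores)   # the score earned by the most students

print(f"mean = {average}, median = {middle}, mode = {most_common}")
```

Here the three measures give three different answers, which is why a report should say which one it is using.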

Measures of variability

For many research interests, it is not sufficient to learn only the average of an array of measurements. It's also important to learn how much the measurements are bunched together or spread out. For this purpose, we need statistics that summarize the extent of variability or dispersion in a distribution. Several kinds of variability measures are available. Those described in the following pages include the total range, distance between percentiles, interquartile range, standard deviation, and variance.

Range. The research question: What is the distance between the highest score and the lowest score?

At first glance, it might appear that the range is a desirable measure of dispersion, since it's easy to compute and understand. However, the range is determined entirely by the two scores at the opposite ends of a distribution. Consequently, it fails to show whether the bulk of the scores between those extremes are bunched together or spread out. When people ask for a report about the variability within a group's performance, they typically want to know about the group in general, not simply about the two extreme individuals at the opposite ends of the array. Thus, in research projects, the range is rarely useful. Unlike the range, the following statistics depict the variability of the bulk of the items in a distribution, not just the two at the extreme ends.

Distance between percentiles. The research question: How many units of measurement or of scores lie between a selected percentile in the upper half of a distribution and another selected percentile in the lower half?

As explained earlier, along a 100-point scale a percentile is the point below which a specified fraction of the measurements or scores are located. A girl who is taller than 78% of her agemates is at the 78th percentile in height. A boy who runs faster than 43% of other 9-year-olds is at the 43rd percentile in speed.

The distance between selected percentiles can be used to describe the extent of variability among measurements in a distribution. To choose which percentiles to use, we need to estimate what portion of extreme scores at the opposite ends of the scale we wish to disregard in order to report how much the majority of the scores were spread out or clustered together. If we decide that eliminating 10% at each end would be sufficient to prevent extreme scores from affecting the impression of group variability, then our measure of dispersion will consist of reporting the distance between the 10th percentile and the 90th percentile, thereby encompassing the middle 80% of the scores in our report. Or if we think it best to disregard 15% at each end, we will report the distance between the 15th percentile and the 85th percentile, thus focusing on the middle 70% of the measurements. In choosing which pair of percentiles to adopt, we wish to (a) prevent extreme measurements--outliers or deviants--from distorting the picture of variability for the group in general, but at the same time (b) avoid cutting off so many measurements that we end up telling more about central tendency than about dispersion.

One popular version of distance-between-percentiles is the interquartile range, which reports the distance between the 25th percentile (first quartile) and the 75th percentile (third quartile). The interquartile range, therefore, reflects the extent of dispersion among the middle 50% of a distribution's measurements. Sometimes the interquartile range is divided by 2, producing the semi-interquartile range.
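The percentile-based measures of spread can be sketched with the standard library's statistics.quantiles function. The twenty scores below are hypothetical, and the "inclusive" interpolation method is one of several ways to locate a percentile in a small data set, so other software may report slightly different cut points.

```python
from statistics import quantiles

# Hypothetical distribution of 20 test scores, listed from low to high.
scores = [35, 48, 52, 55, 58, 60, 61, 63, 65, 66,
          68, 70, 71, 73, 75, 78, 80, 84, 89, 95]

# Total range: determined only by the two extreme scores.
total_range = max(scores) - min(scores)

# Distance between the 10th and 90th percentiles (middle 80% of scores).
deciles = quantiles(scores, n=10, method="inclusive")
spread_10_90 = deciles[8] - deciles[0]

# Interquartile range (middle 50%) and semi-interquartile range.
quartiles = quantiles(scores, n=4, method="inclusive")
iqr = quartiles[2] - quartiles[0]
semi_iqr = iqr / 2

print(total_range, spread_10_90, iqr, semi_iqr)
```

The narrower the band of percentiles, the less the result is influenced by the extreme low score of 35 and the extreme high score of 95.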

It is useful to note that the principal types of central-tendency and variability measures form two families of statistics. First, in the percentiles family the measure of central tendency is the median (50th percentile), while variability is determined by some version of distance between percentiles (including the interquartile and semi-interquartile ranges). Therefore, if the median is used to report the general success of a group of students, the accompanying measure of variability can reasonably be a version of distance between percentiles.

We turn now to the second family of variability measures.

Variance and standard deviation. The research question: How much do measurements or scores in a distribution stretch above and below the mean?

As noted above, calculating the median and allied percentiles consists of counting through the scores as they are arranged from the lowest to the highest. In contrast, computing the mean involves totaling all the measurements or scores, then dividing the sum by the number of measurements or scores. Two measures of dispersion related to the mean are ones determined by calculating how far a distribution's scores deviate from the mean. These measures are the variance and standard deviation.

Consider the example of a study of finger dexterity among employees of a clock manufacturer. The variance of a distribution of employees' scores on a finger dexterity test is computed by (a) calculating how far each score deviates from the mean, (b) squaring that deviation, (c) adding all the squared deviations together, then (d) dividing that sum by the number of scores. In brief, the variance is the average of the scores' squared distances from the mean. The act of squaring the deviations from the mean has the obvious effect of lending greater weight to deviations as they extend farther away from the mean--providing greater recognition of extreme high and low scores. Once the variance has been calculated, the standard deviation is easy to determine. You simply find the square root of the variance.
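The variance calculation can be traced step by step. The sketch below uses five hypothetical dexterity scores and divides by the number of scores, in keeping with the description of the variance as an average of squared deviations; many textbooks divide by one less than the number of scores when the group is treated as a sample, which gives a slightly larger value.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical finger-dexterity scores for five employees.
scores = [12, 15, 11, 18, 14]

mean_score = sum(scores) / len(scores)

# How far each score deviates from the mean, then squared.
squared_deviations = [(s - mean_score) ** 2 for s in scores]

# Average of the squared deviations (population form, dividing by N).
variance = sum(squared_deviations) / len(scores)

# The standard deviation is the square root of the variance.
std_dev = sqrt(variance)

print(f"mean = {mean_score}, variance = {variance}, sd = {std_dev:.2f}")

# Cross-check against the standard library's population variance.
library_variance = pvariance(scores)
```

The score of 18, which lies 4 points from the mean, contributes 16 of the 30 squared units, illustrating how squaring gives extra weight to extreme scores.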

Correlation techniques

The research question, stated in general form: When a change occurs in one variable, how much change--if any--occurs in another variable?

The general question, when recast in terms of particular variables, results in such queries as the following:

 Ethnic status and academic achievement. Are students from certain ethnic groups more successful at academic studies than students from other ethnic groups?
 Television viewing and violent behavior. What is the relationship between the number and kinds of television programs children watch and the number and types of violent acts in which children engage?
 Mothers' intelligence and daughters' intelligence. How closely do mothers' IQs (intelligence quotients) correspond to their daughters' IQs?
 Identical twins' confidence. In pairs of identical twin girls, how does the level of confidence of a typical girl compare with the level of confidence of her twin sister?
 Administrative style and employee satisfaction. Are the administrative styles of factory managers related to the degree of satisfaction expressed by workers who serve under those managers' supervision?

A variety of statistical procedures are available for answering such questions. Which procedure will be appropriate in a given case depends on the kind of data found in the variables being compared. Variables appear in two principal forms: (a) as a distribution of scores or measurements ranging from low to high, (b) as a series of rankings. Table 11-1 indicates which correlation process is suited to which pattern of data. Each of the correlation procedures listed in the right-hand column of Table 11-1 is described briefly in the following pages.

Table 11-1
Kinds of Data and Correlation Procedures

Variable A             Variable B             Correlation Technique
Measurement series     Measurement series     Pearson product-moment (r)
Ranks                  Ranks                  Spearman rank-order (rho or ρ)
Measurement series     Dichotomy              Biserial (r_b)
Separate categories    Separate categories    Phi coefficient (φ)

Pearson product-moment correlation. The research question: What is the degree of relationship between two variables if each variable consists of measurements along a scale that consists of a series of equal intervals?

In judging whether the Pearson product-moment method (symbolized by the letter r) is appropriate for a given study, the researcher needs to decide whether both of the variables being compared represent equal-interval scales. Strictly speaking, an equal-interval scale is one in which the distance from one step to the next step is precisely the same throughout the entire length of the scale. Measures of height fulfill this equal-interval requirement, since the distance from 25 cm to 30 cm is exactly the same as from 60 cm to 65 cm or from 105 cm to 110 cm, and so on. Measures of temperature, of weight, of time, of speed, of dollars, and of distance on a running track also involve equal-interval scales.

In contrast, some scales used in the social sciences may appear at first glance to be composed of equal intervals, but upon closer inspection it becomes clear that they are not. The scoring of intelligence test results in terms of IQ levels is a familiar example. The difference in intellectual ability between an IQ of 100 and one of 105 is not the same as the difference between 120 and 125, because the questions that make up the test are not all of equal difficulty. Test items that differentiate between IQs 120 and 125 are probably more demanding than those that distinguish between IQs 100 and 105. Likewise, a distribution of scores on a test of English usage or of science facts does not produce an equal-interval scale in the strict meaning of the term, because the items that differentiate between scores of 80 and 90 are likely more difficult than those that differentiate between scores of 30 and 40. In effect, a 10-point difference in one segment of the scale is not equal to a 10-point difference in another segment.

This distinction between truly equal-interval scales and scales that contain intervals that are only approximately equal has caused some critics to condemn the use of the Pearson method in a great many of the studies that have employed the procedure. (More than 90% of all correlation coefficients reported in the research literature within the behavioral sciences are Pearson r's.) However, Heermann and Braskamp's (1970, pp. 30-110) analysis of a host of investigations suggests that using Pearson's technique with variables involving intervals that are no more than approximately or partially equal is still warranted. Glass and Hopkins (1984, p. 9) have observed that the critics' "disenchantment with the classical methods was premature." Thus, it is generally acceptable to apply Pearson's r in studies whose variables consist of test scores or involve ratings of performance or of attitude.

Spearman rank correlation. The research question: What is the degree of relationship between two variables if each variable consists of ranks along a scale rather than measured intervals?

Frequently research data are not in the form needed to compute a Pearson r, because one or both of the variables consist of ranks instead of measured amounts. Sometimes data are originally collected as rankings, as when teachers are ranked in terms of popularity with students, basketball players ranked by overall ability, and nations ranked by the prestige of their higher-education systems. Other times the data are collected as quantities which are then converted into ranks for convenience of computation. Thus, provinces can be ranked by per-pupil expenditures, colleges by their graduation rates, and students by their grade point averages. Probably the most popular rank-order correlation statistic is Spearman's rho (ρ).
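The difference between correlating measured amounts and correlating ranks can be seen in a small sketch. The helper names pearson_r, average_ranks, and spearman_rho are illustrative, and the paired data are hypothetical. Spearman's rho is computed here as the Pearson r of the two sets of ranks, with tied values sharing the average of their rank positions.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 (smallest); ties receive the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho(x, y):
    """Rank-order correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical paired measurements: y always rises with x, but not evenly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the two series are in exactly the same order, rho reaches its maximum, while r falls slightly short because the increases in y are not uniform.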

Biserial correlation. The research question: What is the degree of relationship between two variables if one is measured in a graduated fashion so as to produce a sequence of quantities and the other variable is in the form of a dichotomy?

Two computational techniques for determining the association between such variables are the biserial and point-biserial methods. Each yields correlation coefficients that are estimates of what the Pearson r would be if both variables were normally distributed arrays rather than one of them being in the form of a dichotomy.

Deciding whether the biserial technique is appropriate in a given research situation depends on the researcher's assumption about the nature of the dichotomous variable. The biserial method is not appropriate in cases of true dichotomies, such as sex (male/female) or employees' attendance at work on a particular day (present/absent). However, it is applicable when the dichotomy appears to be an artifact of crude measurement. For instance, in a survey of parents' opinions about teaching birth control methods in high school, data may be collected in the form of a dichotomy (agree/disagree). But it is likely that parents' opinions are actually far more varied than the resulting data suggest--some parents will have strong objections to birth control instruction, some will disagree moderately, others will object mildly, some will agree but with serious reservations, and so on. If a more precise scaling approach had been used in gathering opinions, the results would have assumed the form of a distribution of graduated steps. Thus, the dichotomized variable in this instance was not truly discrete. It is this latter type of spurious, crude-measurement dichotomy for which the biserial correlation technique is designed.

On the other hand, if the dichotomous feature is truly discrete (male/female, citizen/alien, fourth-grade pupil/non-fourth-grade pupil), an estimate of r can still be obtained by applying the point-biserial method.
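For the truly discrete case, the point-biserial coefficient can be sketched as follows. The function name point_biserial and the data are hypothetical. The formula used is the standard one: the difference between the two group means, divided by the standard deviation of all scores (computed with N in the denominator), times the square root of the product of the two group proportions. The biserial coefficient itself requires an additional normal-curve term and is not shown.

```python
from math import sqrt

def point_biserial(groups, scores):
    """Point-biserial r for a true dichotomy coded 0/1 and a measured variable."""
    n = len(scores)
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / n    # proportion of cases in the group coded 1
    q = len(zeros) / n   # proportion of cases in the group coded 0
    overall_mean = sum(scores) / n
    sd = sqrt(sum((s - overall_mean) ** 2 for s in scores) / n)
    mean_difference = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_difference / sd * sqrt(p * q)

# Hypothetical data: group membership (0 or 1) and a measured score.
groups = [0, 0, 0, 1, 1, 1]
scores = [2, 4, 6, 6, 8, 10]

r_pb = point_biserial(groups, scores)
print(f"point-biserial r = {r_pb:.3f}")
```

The same value results from applying the Pearson product-moment formula directly to the 0/1 codes and the scores.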

Phi coefficient. The research question: What is the degree of relationship between two variables if both of them consist of sets of discrete categories?

The phi (φ) coefficient is the product-moment correlation between two variables when each variable is scored as discrete points rather than as a series of measured steps. For example, imagine that we wish to determine among political party activists the direction and degree of relationship on a given work day between two variables: (a) the activists' marital status and (b) promptness. We identify two types of marital status (married and single) and two levels of promptness in arriving at work (on time and late). We then construct a two-by-two table with the marital-status variable on the horizontal axis and the promptness variable on the vertical axis. We can now enter data about the relationship between an individual's marital condition and promptness into the four cells of our two-by-two table. This then permits us to compute a phi coefficient reflecting the degree of relationship between our two variables.

The data used in computing phi coefficients need not be restricted to two discrete positions on each of the variables. For instance, comparing college students' class levels (frosh, soph, junior, senior) and those students' use of alcohol during a given week (did drink vs. did not drink) would produce a 4-by-2 table.

In like manner, other variables represented in discrete types or steps could produce larger size tables--4-by-4, 3-by-6, and such.
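For the two-by-two case, phi can be computed directly from the four cell counts. The sketch below uses the common formula (ad - bc) divided by the square root of the product of the four marginal totals; the function name phi_coefficient and the cell counts are hypothetical. It handles only the two-by-two table, not the larger tables mentioned above.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as:
           on time   late
    row 1     a        b
    row 2     c        d
    """
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: married activists 30 on time and 10 late;
# single activists 20 on time and 20 late.
phi = phi_coefficient(30, 10, 20, 20)
print(f"phi = {phi:.3f}")
```

A positive value here indicates that, in these invented data, married activists were on time somewhat more often than single activists; reversing the row order would flip the sign.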

Other correlation options. The correlation methods described in the above paragraphs are only four of the more commonly used techniques. Numerous other approaches found in statistics textbooks and journal articles are designed to suit additional conditions of the data that a researcher has at hand. For instance, in some situations the relationship between two series of measures may not assume the shape of a straight diagonal line. As scores on one variable increase, the scores on the other do not increase regularly in a similar manner. Such a relationship results if people's ages over their life span are compared with their eye-hand coordination scores. Whereas age increases in regular steps, eye-hand skills do not; instead, such skills increase in early life, remain at a high level for much of adulthood, then decline in old age, thereby rendering their progression curvilinear. In such cases, an eta (η) coefficient can be computed to reflect the association between the variables. As a second example, under certain conditions a tetrachoric coefficient (r_t) rather than a phi coefficient can usefully be calculated to determine the magnitude of the relationship between two variables, each of which is a dichotomy.

The term factor analysis identifies several alternative procedures for estimating which features are common to a series of correlations that have been computed from a variety of measures of a group of individuals. For example, a large number of students can be administered tests intended to assess their mental abilities. Correlations can then be computed to determine which test items are highly related to each other and which ones appear to be mainly independent of each other. The assumption is that when certain items are closely associated (so that students who do well on one item in the cluster also do well on the others, and vice versa), a particular mental ability or mental factor underlies that group of items. Typically, a label is assigned to that cluster of closely related items, with the label intended to reflect the cognitive skill--or factor--that binds the cluster together. For example, the labels applied to factors found in such test batteries as the Primary Mental Abilities (Thurstone, 1938) and Differential Aptitude Tests (Bennett, Seashore, & Wesman, 1952) are number comprehension, verbal reasoning, verbal comprehension, abstract reasoning, clerical speed and accuracy, mechanical reasoning, space relations, language usage, and word fluency.

In applications in education, factor analytic studies have been undertaken in such diverse areas as prose style, administrative behavior, occupational classification, attitudes and belief systems, and the economics of education. The technique is still in extensive use in the exploration of abilities, in the refining of tests and scales, and in the development of composite variables for use in research studies. Its most promising applications in recent years, however, have been concerned with the testing of explicit hypotheses about the structure of sets of variables, as in the study of growth models. . . . It has also facilitated the comparison of the factorial structure of different subpopulations, allowing investigators to determine whether the factorial structure of a given set of variables varies, for example, with sex, age, ethnicity, socioeconomic status, or political affiliation. (Spearritt, 1985, pp. 1822-1823)

Drawing Inferences from Samples

As mentioned earlier, descriptive statistics summarize in a concise form the results of measurements of a group of individuals or events. Sometimes researchers are interested only in what such statistics tell about that group. However, other times they want to apply the group's results to a larger population. In other words, as described in Chapter 7, the measured group is considered to be a sample of a larger population that has not been measured. Hence, from testing the reading ability of 200 nine-year-olds, an investigator may intend to draw inferences about the reading skills of all of a city's or state's nine-year-olds. From a statistical summary of 350 religious workers' expressed attitudes about the use of marijuana, a researcher may hope to estimate the attitudes toward illicit drugs of all such religious workers. However, extending the conclusions about a tested group to a larger population always entails a risk of error, since the sample group may not truly represent the larger population. In effect, the sample may be biased. Therefore, it's important for researchers to have ways of judging how likely the statistics gathered about a sample will accurately portray the features of an intended population. Or, stated as a question, what is the probability of making an error when using descriptive statistics as the basis for drawing inferences about a population? The procedures for answering such a question are called inferential statistics.

It's useful at this point to consider the sources of errors that may distort the conclusions drawn from assessing people or events. In the case of descriptive statistics, inaccurate conclusions derive from measurement errors. For instance, the purpose of having students take a history test is to discover precisely their knowledge of historical facts, concepts, trends, theories, and the like. However, various kinds of errors can render the assessment inaccurate. The directions for taking the test may be unclear, some test items may be badly phrased, noises in the classroom may disrupt students' attempt to concentrate, the time to complete the test may be too short, the tester's method of correcting the students' answers may be faulty, and more. Such measurement errors can be reduced by careful attention to the preparation of the test, to the manner of administering it, and to the method of correcting it. However, if the results of testing a sample of students are used as the foundation for drawing inferences about the broader population of students from which the sample was drawn, another source of inaccuracy can distort the inferred picture of the population's knowledge of history. That source is sampling error, meaning the degree to which inferences about a population likely deviate from the true characteristics of that population. The following discussion identifies two popular statistical procedures for estimating the magnitude of sampling error.

As noted, researchers can never know for sure how accurately a sample drawn from a population reflects the characteristics of that population. For instance, assume that you conduct telephone interviews with 100 consumers to learn which TV programs they prefer, and you compute the percentages of your respondents who prefer various programs. What you now want to know is how likely those percentages are an accurate reflection of the preferences of all 500,000 residents of the city in which you are conducting your survey. The only way you can know for sure the accuracy of your results is to interview all 500,000. But since interviewing the entire population would be impractical, the best you can do is to estimate the probability that the sample percentages are close to the population's true percentages. Inferential statistics are designed to furnish that estimate. We will briefly inspect two of the ways to arrive at such estimates--the t-test and the analysis of variance.

The t-test

Researchers often compare two groups in terms of their means. If the means are found to differ, the question arises: Does each group represent a different population in relation to the characteristic that was measured, so the difference in these sample means reflects an actual difference in the means of the underlying populations? Or are the two groups simply two slightly biased samples from the same population, whose true mean we really don't know? To illustrate, imagine that 50 women and 50 men are enrolled in a college class entitled "Methods of Logic." On the final test at the end of the semester, the mean for the women is 83.6 and for the men 78.9. We may now ask whether these scores reflect a difference only between female and male members of that particular class, or is the population of the kind of college women who enrolled in the class generally more adept at learning the methods of logic taught in the class than is the population of the kind of college men who enrolled? The t-test provides an estimated answer to this query.

By applying the appropriate computation procedure (found in nearly any statistics textbook), we learn that there apparently is less than 1 chance in 1,000 that the two groups represent the same population and that the obtained means are different simply because of bias in drawing the samples. In other words, our results support the conclusion that the population of women (of the kind enrolled in the logic class) is on the average somewhat more skilled at learning the methods of logic taught in the class than is the population of men (of the kind that enrolled). There are more than 999 chances in 1,000 that this conclusion is warranted.

However, if the mean for the women in our hypothetical logic class had been 81.0 and for the men 83.6 (with the standard deviations σ = 6.7 and σ = 6.3), we would learn that there are likely 5 chances out of 100 that there is no real difference in the means of the populations from which these women and men were drawn. In effect, there are 5 chances in 100 that the difference between 81.0 and 83.6 is simply the result of sampling error--the men's sample just happened to include more adept logic learners--and that both the men and women represent the same population in terms of ability to master the logic techniques taught in the class. But there are 95 chances in 100 that the obtained differences actually do reflect a difference that would be found in the mean scores of the two populations of the kinds of women and men who took the test.
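The computation behind the second example can be sketched as follows. This is a minimal illustration, not the procedure from any particular textbook: it assumes the pooled-variance form of the t-test for two independent groups, assumes that σ = 6.7 belongs to the women and σ = 6.3 to the men, and obtains the two-tailed probability by numerically integrating the t distribution. All function names are hypothetical.

```python
import math

def t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, degrees of freedom) for two independent means, pooled variance."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

def t_density(x, df):
    """Probability density of Student's t distribution at x."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

# Men: mean 83.6, SD 6.3 (assumed); women: mean 81.0, SD 6.7 (assumed); 50 each.
t, df = t_statistic(83.6, 6.3, 50, 81.0, 6.7, 50)
p = two_tailed_p(t, df)
print(f"t = {t:.2f}, df = {df}, two-tailed p = {p:.3f}")
```

With these figures the t value is close to 2.0 on 98 degrees of freedom, and the resulting probability falls just under .05, in line with the 5 chances in 100 cited above.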

Thus, the t-test is designed to help researchers estimate the probability that measures of a sample of people or events accurately portray the broader populations of people or events from which the sample was apparently drawn. In addition to testing the representativeness of obtained means, there are t-tests for pairs of medians, percentages, standard deviations, and correlations.

In the above brief sketch of the t-test procedure, we have not taken the space to point out several important assumptions about the way samples are drawn from populations, assumptions that significantly affect the appropriateness of t-tests in particular studies. For explanations of those assumptions, readers are directed to the suggested readings at the end of this section.

Analysis of variance

As explained earlier, the variance (σ²) is a description of how much measurements spread away from the center of a distribution. Specifically, the variance is the average of the squared measurement deviations from the mean.
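That definition translates directly into a few lines of code. The scores below are hypothetical, and the function divides by the number of measurements, as the definition states. Many textbooks divide by one less than that number when estimating a population variance from a sample.

```python
import statistics

def variance(values):
    """Average of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean = 5
print(variance(scores))

# The standard library's population variance applies the same definition.
print(statistics.pvariance(scores))
```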

We have seen that the t-test is used to estimate whether the means from two samples represent the same population or two different populations. The analysis of variance (ANOVA) is a procedure for simultaneously testing how likely three or more means represent samples drawn from the same population or, in contrast, are means representing different populations. One example of comparing three or more means is found in attitudes toward birth control methods as expressed by parents, teachers, police officers, and teenagers. Another example is found in a study of mathematics test scores of high school students representing six ethnic groups--Anglos, Latinos, Afro-Americans, Asians, Native Americans, and Pacific Islanders.

Not only does ANOVA permit the simultaneous comparison of multiple means, but the results are more accurate than if t-tests were applied to each pairing of the multiple means being studied. Glass and Hopkins (1984, p. 324) point out that "ANOVA is the most common of all inferential statistical techniques in education and the behavioral sciences."
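The central quantity in a one-way ANOVA is the F ratio, which compares the variability among the group means with the variability of the measurements inside the groups. The sketch below computes it for three small groups. The attitude ratings and the function name are hypothetical, and the sketch stops at the F ratio rather than converting it to a probability.

```python
def one_way_f(groups):
    """Return (F ratio, df between, df within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variability of each measurement around its own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical attitude ratings from three groups of respondents.
parents = [4, 5, 6]
teachers = [6, 7, 8]
teenagers = [8, 9, 10]

f_ratio, df_b, df_w = one_way_f([parents, teachers, teenagers])
print(f"F({df_b}, {df_w}) = {f_ratio:.1f}")
```

The larger the F ratio, the less plausible it is that the differences among the sample means arose from sampling error alone. The ratio is then checked against the F distribution for the two degrees-of-freedom values to obtain a probability.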

ANOVA results are interpreted in much the same way as those of the t-test, that is, in terms of the probability that a difference between sample means is the result of sampling error rather than the result of a difference in the true means of the populations from which the compared samples were drawn. Thus, a difference among sample means that could occur by chance (by sampling error) at a probability level of only 1 time in 100 gives the researcher more confidence in believing that the means of the represented populations are truly different than does a difference among sample means that could occur by chance 5 times in 100 or 10 times in 100.

ANOVA can also be extended to test the likelihood of interactions among factors. For instance, one researcher used ANOVA to discover whether teachers' ethnic status affected their perceptions of how adaptable Anglo and Latino students were. The results showed that there was indeed interaction between teacher and student ethnic types. Latino teachers more often judged Latino students as more adaptive, whereas Anglo teachers more frequently considered Anglo students more adaptive (Glass & Hopkins, 1984, p. 404).

Additional options

In the foregoing pages we have introduced a few types of statistics commonly used in research and have suggested which of the types are most useful for answering various kinds of research questions. There are, however, far more statistical procedures than those reviewed in this chapter, procedures well suited to answering additional kinds of research questions. Such additional types of statistical treatments are identified by such titles as: chi square, partial correlation, one-way analysis of variance, two-way analysis of variance, the analysis of covariance, linear and nonlinear regression, Kendall's tau coefficient, Kendall's coefficient of concordance, the median test, the Mann-Whitney U test, the Wald-Wolfowitz runs test, and others.

Descriptions of a wide range of statistical procedures, as well as their computational steps, are found in such sources as the following.

 

Glass, G. V., & Hopkins, K. D. (1996). Statistical Methods in Education and Psychology (3rd ed.). Boston: Allyn & Bacon.
Gravetter, F. J. (1988). Statistics for the Behavioral Sciences. St. Paul, MN: West.
Hays, W. L. (1994). Statistics (5th ed.). Fort Worth, TX: Harcourt Brace.
Jaccard, J., & Becker, M. A. (1990). Statistics for the Behavioral Sciences (2nd ed.). Belmont, CA: Wadsworth.
Popham, W. J., & Sirotnik, K. A. (1992). Understanding Statistics in Education. Itasca, IL: Peacock.
Siegel, S., & Castellan, N. J., Jr. (1988). Nonparametric Statistics for the Behavioral Sciences (2nd ed.). New York: McGraw-Hill.
Sirkin, R. M. (1995). Statistics for the Social Sciences. Thousand Oaks, CA: Sage.
Sprinthall, R. C. (1997). Basic Statistical Analysis (5th ed.). Boston, MA: Allyn & Bacon.

Getting Help with Computation and Interpretation

A problem frequently faced by students whose thesis or dissertation research involves the analysis and summarization of quantitative data is that the students lack the statistical expertise needed for selecting the most appropriate modes of analysis, for organizing their data in a form well suited to the chosen mode, for carrying out the steps of computation, and for displaying and interpreting the results.

Your first step toward solving such statistical problems is that of stating the question--or series of questions--you hope to answer with the numerical data you plan to collect. Our saying "plan to collect" is intended to suggest that you are best prepared if you select your statistics at the time you devise your research design, that is, before you actually gather your data. By thus planning ahead, you better ensure that your data will be compiled in a form well suited to the statistical treatment you will ultimately apply. However, it is not uncommon for students to collect their quantitative data before they choose their type of statistics. It is also the case that additional research questions may arise during the data-gathering stage so that the statistical treatment needed for answering those further questions could not have been anticipated at the research-design stage. Under either of these post-data-collection circumstances, the statistical decisions will be made during or after the information has been collected.

So, with your questions in hand, you face the second step of the process--that of selecting modes of statistical analysis that will yield convincing answers to such questions. At this point you may find that you need help. Your feasible options include:
Inspecting scholarly journals to locate published studies that were guided by questions similar to your own questions. You can then analyze the statistical treatments used in those studies to decide whether you might profitably adopt the same approaches.
Seeking the aid of fellow graduate students skilled in statistical applications.
Asking the advice of faculty members who are experts in research design and statistical analysis.
Searching through statistics books, such as those listed above, to find methods suited to your questions.
These same sources of help can be useful in taking the subsequent steps of casting the data in an appropriate form, conducting the calculations, and displaying and interpreting the outcomes. At the stage of carrying out the calculations, you can profitably avail yourself of the statistical programs available for both personal and mainframe computers. Calculations which, in the past, were laboriously done by hand with the aid of a calculator can now be performed flawlessly and within a few seconds, simultaneously yielding a variety of types of statistics. The following are examples of statistical packages for use with personal computers.  
 

 

 

 

 

 

 

 
 

 

 

 

 
 