Interpreting Statistics - Differences


Differences Between Groups - Using the t-Test

If you have a two-group research study and your groups are fairly large (20 to 100), then you can use the independent means t-test. The t-test is called a parametric test because your data must come from populations that are normally distributed and must be measured on an interval scale. The t-test is used to answer this question: Is there any difference between the means of the two populations from which our data are random samples? The t-test is also called a test of inference because we are trying to discover whether populations are different by studying samples from those populations, i.e., what we find to be true about our samples we will assume to be true about the populations.
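If your data are already in a computer, the same question can be answered in a few lines of Python with SciPy; the sketch below is only an illustration, and the group names and scores in it are made up rather than taken from this handout.

    # Minimal sketch of an independent means t-test using SciPy.
    # The scores below are hypothetical and only illustrate the call.
    from scipy import stats

    group_1 = [120, 135, 128, 140, 133, 137, 125, 131]   # made-up scores
    group_2 = [139, 142, 135, 148, 141, 138, 145, 140]   # made-up scores

    result = stats.ttest_ind(group_1, group_2)   # pooled-variance t-test by default
    print(f"t = {result.statistic:.2f}, two-tailed p = {result.pvalue:.3f}")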

To compute a t-test we need to obtain the following:

Sample    Size    Degrees of Freedom    Mean    Sum of Squares
1         n1      n1 - 1                Ȳ1      SS1 = Σ(Y1i - Ȳ1)²
2         n2      n2 - 1                Ȳ2      SS2 = Σ(Y2j - Ȳ2)²
Total             n1 + n2 - 2                   Pooled Sum of Squares = SS1 + SS2

Note: Y1i and Y2j stand for each individual score of the variables.

  1. The standard deviation of the difference between the means is:

     $s_{\bar{Y}_2 - \bar{Y}_1} = \sqrt{\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

  2. The difference between the means is:

     $\bar{Y}_2 - \bar{Y}_1$

     Note: If mean Ȳ1 is larger than mean Ȳ2, then the numerator will be negative, and thus the t-value will be negative.

  3. The t-value is:

     $t = \dfrac{\bar{Y}_2 - \bar{Y}_1}{s_{\bar{Y}_2 - \bar{Y}_1}}$
If the difference between the means is large in comparison to the standard deviation of the difference between the means, then the t-value is large. The larger the t-value the smaller the probability that the means of the two populations are the same. It does not matter if the t-value is negative or positive.  Use the absolute value (disregard the sign) when interpreting the t-value.
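Carrying the steps above into code is straightforward. The sketch below is one way to do it in Python; the function and variable names are mine, not part of the handout.

    # Sketch: independent means t-test computed exactly as in the steps above.
    from math import sqrt

    def independent_t(y1, y2):
        """Return the t-value for two independent samples (pooled variance)."""
        n1, n2 = len(y1), len(y2)
        mean1, mean2 = sum(y1) / n1, sum(y2) / n2
        ss1 = sum((y - mean1) ** 2 for y in y1)        # sum of squares, sample 1
        ss2 = sum((y - mean2) ** 2 for y in y2)        # sum of squares, sample 2
        pooled = (ss1 + ss2) / (n1 + n2 - 2)           # pooled SS divided by total df
        s_diff = sqrt(pooled * (1 / n1 + 1 / n2))      # SD of the difference between means
        return (mean2 - mean1) / s_diff                # numerator ordered as in the formula above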

Let's look at an example: Suppose we are studying energy expenditure during ambulation (walking) with ortho crutches and axillary crutches. As a measure of energy expenditure we are using heart rate in beats per minute. Two groups of normal subjects (20 per group) walk at their own pace using crutches for 11.5 minutes. Heart rate is measured at the end of the 11.5 minutes. Group A uses the ortho crutches and Group B uses the axillary crutches. Our hypothesis is that the mean heart rate of Group B will be higher than that of Group A, because we are assuming that it requires more energy to use the axillary crutch than the ortho crutch.

DATA

Group A (Y1)  Group B (Y2)

120 139 139 130      134 141 155 142
140 141 128 137      123 124 134 140
135 121 134 134      149 150 135 137
137 122 136 132      138 142 148 145
138 140 141 129      127 138 129 147

Using the formulas above, we will solve for the t-value with this data:

1. $\bar{Y}_1 = \dfrac{\sum Y_{1i}}{n_1} = \dfrac{2673}{20} = 133.65$

2. $\bar{Y}_2 = \dfrac{\sum Y_{2j}}{n_2} = \dfrac{2778}{20} = 138.90$

3. $SS_1 = \sum(Y_{1i} - \bar{Y}_1)^2 = 846.55$

4. $SS_2 = \sum(Y_{2j} - \bar{Y}_2)^2 = 1477.80$

 

The standard deviations for each group are obtained by dividing SS1 by 19 (n-1) and SS2 by 19 and then taking the square roots. The standard deviation for Group A (Y1) = 6.67 and for Group B (Y2) = 8.82.

 
5. $s_{\bar{Y}_2 - \bar{Y}_1} = \sqrt{\dfrac{846.55 + 1477.80}{38}\left(\dfrac{1}{20} + \dfrac{1}{20}\right)} = 2.47$

6. $t = \dfrac{138.90 - 133.65}{2.47} = 2.13$
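The arithmetic in steps 5 and 6 is easy to check from the summary values alone; a small Python sketch (not part of the original handout) is shown below.

    # Check of steps 5 and 6 using only the summary statistics above.
    from math import sqrt

    ss1, ss2 = 846.55, 1477.80
    n1 = n2 = 20
    mean1, mean2 = 133.65, 138.90

    s_diff = sqrt((ss1 + ss2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))   # about 2.47
    t = (mean2 - mean1) / s_diff                                     # about 2.12
    print(round(s_diff, 2), round(t, 2))
    # The handout reports 2.13 because it divides by the rounded value 2.47.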

What does a t-value of 2.13 mean? The t-value is an indication of the probability that both populations from which we selected our samples have the same mean and that differences in our sample means are due to random fluctuation. As the t-value gets smaller (approaches zero) the probability that the population means are the same gets larger. As the t-value gets larger (in either the positive or negative direction) the probability that the population means are the same gets smaller.

We can use the t-value to decide between our two statistical hypotheses:

  1. Null hypothesis: The two populations have the same mean.

  2. Alternative hypothesis: The population that uses the axillary crutch has a higher mean heart rate than the population that uses the ortho crutch.

Generally, to feel comfortable in our decision that the means are not the same, we would like the t-value to correspond to a probability of 1 in 20 (.05) or smaller. To find out the probability associated with our t-value of 2.13 we use the table below, Critical Values for the t-distribution. First we must select the correct critical value from the table to compare with our calculated value. We do this by computing the degrees of freedom, where degrees of freedom (df) equals: 20 + 20 - 2 = 38. We go down the df column of the table until we get to the row closest to 38; the row labeled 40 is closest, so we will use that row. Go across the row and find the number in the column labeled α = .05. The value is 1.684.

If our computed t-value is the same as or smaller than the tabled t-value, we accept the null hypothesis and conclude that the populations have the same mean. If our t-value is larger, we can accept the alternative hypothesis. Since our t-value (t = 2.13) is larger than the tabled t-value (t = 1.684), there is only a small chance (less than 1 in 20) that the population means are the same, and so it is reasonable to conclude that the means are different.

 

Critical Values for the t-distribution

df      α = .05
 1      6.314
 2      2.920
 3      2.353
 4      2.132
 5      2.015
 6      1.943
 7      1.895
 8      1.860
 9      1.833
10      1.812
12      1.782
14      1.761
16      1.746
18      1.734
20      1.725
30      1.697
40      1.684
60      1.671

Abridged from Fisher and Yates, Statistical Tables for Biological, Agricultural, and Medical Research. Edinburgh: Oliver and Boyd Limited.
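If a printed table is not handy, the same critical values can be obtained from software. The sketch below, which is not part of the original handout, asks SciPy for one-tailed α = .05 critical values at a few degrees of freedom.

    # One-tailed critical t-values for alpha = .05, as in the table above.
    from scipy import stats

    for df in (10, 20, 40, 60):
        critical = stats.t.ppf(1 - 0.05, df)   # 95th percentile of the t-distribution
        print(df, round(critical, 3))          # df = 40 gives about 1.684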

Degrees of freedom are associated with the sums of squares, and can be defined as the number of values squared (n) minus the number of independent linear restrictions imposed on the data. In this case we have computed the sums of the squared deviations from the mean for each group. Only n-1 of these squared deviations are independent and when we have computed n-1 of them, the nth one is pre-determined because the sum of the deviations is zero.
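The claim that only n - 1 of the squared deviations are independent can be seen in a few lines of Python; the scores below are made up for illustration.

    # Deviations from the mean sum to zero, so the last one is fixed by the others.
    scores = [120, 135, 128, 141]                  # made-up scores
    mean = sum(scores) / len(scores)
    deviations = [y - mean for y in scores]
    print(sum(deviations))                         # 0.0 (up to floating point error)
    print(deviations[-1], -sum(deviations[:-1]))   # the last deviation equals minus the sum of the rest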

Another t-test is available when you have one group and observe the continuous, dependent variable twice. This is the dependent means t-test (or matched pairs t-test). The word dependent means that the second observation is related to the first, since the same group is being measured twice. This is different from the independent means t-test, where two different groups are each measured once. Use the equation below to compute the t-value. Interpretation is the same as for the independent means t-test.

$t = \dfrac{\bar{D}}{S_D / \sqrt{n}}$

where $\bar{D}$ is the mean difference between the two observations,

$S_D$ is the standard deviation of the differences, and

n is the number of subjects.
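As with the independent means test, the computation can be sketched in a few lines of Python; the before and after values below are made up only to show the steps.

    # Sketch of the dependent (matched pairs) t-test from the formula above.
    from math import sqrt

    before = [120, 135, 128, 141, 133]   # made-up first observations
    after  = [126, 138, 131, 149, 140]   # made-up second observations

    diffs = [b - a for a, b in zip(before, after)]                 # difference per subject
    n = len(diffs)
    d_bar = sum(diffs) / n                                         # mean difference
    s_d = sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))     # SD of the differences
    t = d_bar / (s_d / sqrt(n))
    print(round(t, 2))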

Statistical Tests and Probability

An understanding of probability is essential to an understanding of what statistical tests mean. A common situation with a statistical test is this: we want to know if the means from two groups on the same dependent variable are the same or different. The statistical test will tell us the probability that the two means are the same.

Suppose we draw random samples from a population and study the average value of a variable. We would find that most of the time the means of those samples would be almost the same, and sometimes they would be very different. This difference would be a random event. When we do research we expect the independent variable to cause the mean of the dependent variable to be different for each group. A statistical test tells us the probability that the difference we find is due to a random event rather than due to the independent variable.

The graph below shows the distribution of 80,788 t-tests run on 80,788 pairs of groups of twenty random numbers. The t-test values from our random distribution range from -3 to +3 and indicate the size of the difference between the two groups. The majority of the t-test values are near zero.

Distribution of t-values

 

Now let us gather some experimental data on two groups, and compute the t-value. If our experimental t-value is small, then the probability that it occurred by chance is large, because our random distribution of 80,788 t-tests has thousands of small t-values. If our t-value is large then the probability that it occurred by chance is small, because there are few large t-values in the random distribution.

By counting the number of t-values in the random distribution that are larger than a particular t-value, and dividing by the total number of t-values (80,788), we can compute the probability that our experimental t-value was due to chance. The table below does this when t = 1.7 and 2.5:

    t-value    Freq. Above    Prob.
    1.7        4363           .05
    2.5        793            .01
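A simulation like the one described above is easy to rerun. The sketch below (with a smaller run size and a random seed of my own choosing, not the 80,788 tests described in this handout) draws pairs of random groups of twenty, computes a t-value for each pair, and counts how often chance alone produces values above 1.7 and 2.5.

    # Sketch: t-values from pairs of random groups drawn from the same population.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_tests, n_per_group = 50_000, 20      # illustrative run size, not the 80,788 used above

    t_values = np.empty(n_tests)
    for i in range(n_tests):
        a = rng.normal(size=n_per_group)   # both groups come from the same population
        b = rng.normal(size=n_per_group)
        t_values[i] = stats.ttest_ind(a, b).statistic

    for cutoff in (1.7, 2.5):
        prob = np.mean(t_values > cutoff)  # fraction of chance t-values above the cutoff
        print(cutoff, round(prob, 3))      # roughly .05 and .01, as in the table above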

The various tables of critical values (t-test, correlation, chi-square) are summaries of many distributions like our t-value random distribution for particular probability levels (.10, .05, and .01).