When analyzing information, whether it’s how a group of students performs on an academic test, the quality of a batch of beer, or the health of patients before and after a medical treatment, simply glancing at numbers is rarely sufficient to understand your data.
Statistics help us move beyond our intuition and gut reaction, to test whether a pattern of information is truly useful and reliable, or just a bunch of numbers. One of the more basic, but very powerful, statistical tests is the t-test, or Student’s t-test.
One useful variation of that test is known as the paired t-test, also called the within-subjects t-test or the dependent t-test.
The paired t-test is useful when you compare data points from related, or paired samples, as opposed to two sets of data that are not related. One of the most common uses of a paired t-test is comparing measurements taken from the same subjects twice, with the before and after scores being the paired data.
This article will help you understand what a t-test is, what a paired t-test is, and some of the basic uses and concepts involved. While we will not get too far into the actual formulas or programs used to perform a paired t-test, this will be a helpful primer for getting a basic grasp of the tool, meant for those with limited knowledge of statistics. We’ll avoid the dizzying flurry of numbers and variables that wouldn’t be too helpful for a novice.
There are, however, a number of resources at the end of this article for diving deeper into the paired t-test, and some tutorials for how to run one.
A coin toss, better beer, and the t-test
To take a quick step back, a t-test is a statistics tool used to judge whether the difference between the averages of two groups or sets of data is reliable. The classic statistical problem we often start with is the coin toss example. If you and I each flip a coin 100 times, we may get heads a different number of times, say 47 versus 51. You could look at these numbers and surmise that one of us is better at flipping coins. But we know that’s not the case, and a t-test would show that the difference between our scores is not statistically significant, and therefore not reliable information for predicting further outcomes.
The origin of the t-test is actually pretty interesting. The story goes that an employee of the Guinness brewery, William Sealy Gosset, developed the method as a way to use sampling to test the quality of entire batches of beer, and to gauge the reliability of his results. Because Guinness back in the early 1900s forbade its employees from publishing their methods, he published under the pseudonym “Student,” which is why we call the tool the “Student’s t-test” today.
A common modern use of a t-test is comparing an experiment group to a control group. For example, say you have a new drug. Rather than giving it to a huge number of people without knowing its effects, you could administer it to a small group and give a placebo to another, comparably sized group. A t-test would then tell you whether the impact of the drug is reliable, and therefore whether it should be rolled out to a larger group.
P-value and t-value
Using a t-test, you can calculate both the difference between the groups of data, accounting for how scattered the data are within the groups, and the likelihood that the difference is reliable. The two values we look at to judge these figures are what we call the t-value and the p-value.
The t-value reflects the difference between the two samples of data, but it’s a bit more complicated than just comparing one average to the other. Because each sample is a relatively small set, you have to consider the randomness within each sample. So the t-value is actually defined by how different the two samples are, relative to how different the data points are within each sample. One way to think of this is the true difference compared to the “scatter” of the data, or how strong the signal is compared to the noise. The larger the t-value (ignoring its sign), the more clearly the two groups differ.
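To make the signal-versus-noise idea concrete, here is a minimal sketch in Python. It assumes Welch’s version of the independent-samples t-value, which is one common way to do the calculation and not necessarily the only one. The function name and all of the numbers are made up for illustration. Both pairs of samples have the same difference in averages, but one pair is far more scattered.

```python
import math
import statistics


def welch_t_value(sample_a, sample_b):
    """Signal-to-noise ratio for two independent samples (Welch's form)."""
    # Signal: the difference between the two averages.
    signal = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Noise: how scattered the points are inside each sample,
    # scaled down by how many points each sample has.
    noise = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return signal / noise


# Hypothetical data: both pairs have averages of 11 and 14.
tight_a, tight_b = [10, 11, 12], [13, 14, 15]    # little scatter
spread_a, spread_b = [5, 11, 17], [8, 14, 20]    # lots of scatter

t_tight = welch_t_value(tight_a, tight_b)
t_spread = welch_t_value(spread_a, spread_b)

print(f"tight samples:  t = {t_tight:.2f}")
print(f"spread samples: t = {t_spread:.2f}")
```

The averages differ by the same amount in both cases, yet the tightly clustered samples give a t-value several times larger in size, because the same signal is being compared to much less noise.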
The p-value represents the dependability of that difference. What it measures is this: if there were really no difference between the two groups, how likely is it that random sampling alone would produce a t-value as large as ours, or larger? The higher the p-value, the LESS reliable our result. Briefly, a p-value of .05 means that a difference this large would turn up only about 5 percent of the time if there were no real difference. A p-value of .01 means it would turn up only about 1 percent of the time. You will often see this shortened to “there is a 5 percent chance there is no real difference.” That shorthand is not strictly accurate, because these values get more complicated once you get into the nitty-gritty of the concepts, but it is a workable way to boil the idea down into layman’s terms.
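For the curious, the step from a t-value to a p-value can be sketched with nothing but Python’s standard library. This is an illustrative approximation, not how statistical packages do it: it assumes a two-sided test and estimates the area under the t-distribution curve numerically with Simpson’s rule. The function names are made up for this sketch, and the degrees-of-freedom argument `df` is explained in the next paragraph.

```python
import math


def t_pdf(x, df):
    """Height of the Student's t-distribution curve at x."""
    # Note: math.gamma overflows for very large df (a few hundred).
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_value, df, steps=10_000):
    """Chance of a t-value at least this extreme if there is no real difference."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of steps
        steps += 1
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Area between -|t| and +|t|; the curve is symmetric around zero.
    central_area = 2 * (h / 3) * total
    # Whatever lies outside that central area is the p-value.
    return max(0.0, 1.0 - central_area)


# With 9 degrees of freedom, a t-value near 2.262 sits right at the
# conventional .05 threshold.
print(round(two_sided_p_value(2.262, 9), 3))
```

In practice you would let a statistics package do this step; the sketch is only meant to show that the p-value is the area in the tails of a curve beyond your t-value.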
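There’s one other important value to know about in t-tests, and that’s the sample size. As you can imagine, the larger the sample size, generally the better our reliability. In statistics, sample size enters the t-test through a closely related number called degrees of freedom, or df. In a paired t-test, df equals the number of pairs minus 1. In an independent-samples t-test that assumes equal scatter in both groups, it equals the total number of data points minus 2.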
Paired t-test
OK, now you know the basics of a t-test and some of the key values in play. So what is a paired t-test? Well, in a t-test like the one we described in the control/experiment example above, the groups of data we’re comparing are unrelated, or independent. That’s why this is sometimes called an independent-samples t-test. So in the placebo group versus drug group example, the two groups of people we’re measuring have nothing to do with each other.
In a paired t-test, however, each score or data point we’re looking at is paired with another data point. This is often because the two scores are taken from the same subject, as in the case of a before and after test. For example, you could compare the blood pressure before a treatment in one set of patients, to the blood pressure after the treatment in the same set of patients. In this case, the paired data refers to the patients’ two blood pressure scores.
You would still find the t-value to tell you how different the before numbers are compared to the after numbers, and the p-value would again tell you how reliable that difference is.
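The paired calculation itself is short: subtract each subject’s “before” score from their “after” score, then ask whether the average of those differences stands out from their scatter. Here is a minimal sketch in Python using only the standard library. The function name and the blood pressure readings are invented for illustration.

```python
import math
import statistics


def paired_t_test(before, after):
    """Return (t_value, degrees_of_freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("each 'before' score needs a matching 'after' score")
    # One difference per subject: this is what makes the test "paired".
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference. If every difference is
    # identical, the scatter is zero and the division below fails.
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical systolic blood pressure for five patients.
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 145]

t_value, df = paired_t_test(before, after)
print(f"t = {t_value:.2f}, df = {df}")
```

The negative t-value here simply reflects that the “after” readings are lower than the “before” readings. Its size, together with the degrees of freedom, is what determines the p-value.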
To use a beer-related example, let’s say we collect a group of Guinness employees and first give them a manual dexterity test. Then we give them each one pint of Guinness and give them the same test. The paired t-test is perfect for this. Because each person is compared only to themselves, it is statistically powerful even though the difference between the two sets of scores will likely not be very large. It’s also useful when the differences between the subjects will be large. The more scattered the scores among your subjects, the harder it is for an independent t-test to detect a real difference, because that scatter between people drowns out the change within each person.
For another example, let’s say you have a ball and cup game where a small cup has a string attached with a ball hanging on the end. With one hand you have to swing the ball and get it to land and stay in the cup. You want to see how different the performance is using the right hand versus the left hand. It’s a pretty hard game, so if you have ten subjects, they will likely perform very differently. This is a good problem for a paired t-test, since you have two related sets of data (one score for each hand of each subject).
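A small sketch shows why pairing matters in a case like this. It runs both a paired calculation and an independent-samples calculation (Welch’s form, chosen here as an assumption) on the same invented scores for ten subjects. The subjects differ a lot from one another, but almost every one of them does slightly better with the right hand.

```python
import math
import statistics


def paired_t_value(first, second):
    """t-value based on each subject's own difference."""
    diffs = [a - b for a, b in zip(first, second)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error


def independent_t_value(first, second):
    """Welch's t-value, treating the two sets as unrelated groups."""
    noise = math.sqrt(
        statistics.variance(first) / len(first)
        + statistics.variance(second) / len(second)
    )
    return (statistics.mean(first) - statistics.mean(second)) / noise


# Hypothetical successful catches out of 20 tries for ten subjects.
right_hand = [2, 9, 4, 12, 6, 15, 3, 8, 11, 5]
left_hand = [1, 7, 3, 10, 5, 13, 2, 7, 9, 4]

paired_t = paired_t_value(right_hand, left_hand)
independent_t = independent_t_value(right_hand, left_hand)

print(f"paired t-value:      {paired_t:.2f}")
print(f"independent t-value: {independent_t:.2f}")
```

With these made-up numbers the paired t-value comes out roughly ten times larger than the independent one. The average gap between hands is identical in both calculations; the paired version simply ignores how different the subjects are from each other and looks only at the consistent gap within each subject.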