Chapter 2 Permutation Test
Use the permutation test when the normality assumption is violated and you’re interested in testing for a difference in some location parameter: mean, trimmed mean, or median. It works well for smaller sample sizes, where the full permutation distribution can be enumerated exactly.
Assumptions:
- Random sample from each population
- Both are sampled independently
- Both population distributions are continuous (not categorical / discrete)
Note: We no longer need the assumption of normality, nor equal variances
How It Works
We’ll use \(D_{obs}\) to denote the observed difference in sample means, \(\bar{X}_B - \bar{X}_A\). For the following samples, we find \(D_{obs} = 48.5 - 36 = 12.5\).
| Sample A | 31 | 32 | 34 | 47 | Mean: 36 |
|---|---|---|---|---|---|
| Sample B | 46 | 48 | 49 | 51 | Mean: 48.5 |
Under the null hypothesis, we’d expect that there is no difference in means. In other words, we could randomly switch around the values across the samples and generate test statistics \(D^*\) for each permutation. If the null hypothesis is false, and the difference we observe can’t be explained by random chance, we’d see that only a few \(D^*\)’s are more extreme than \(D_{obs}\).
The way we do this is fairly straightforward. We’ll first pool together our observed values.
| Pooled | 31 | 32 | 34 | 47 | 46 | 48 | 49 | 51 |
|---|---|---|---|---|---|---|---|---|
Now we’ll generate every possible permutation, reassigning the “Sample A” and “Sample B” labels across the pooled observations. With \(m = n = 4\) observations per sample, there are \(\binom {m+n}{m} = \binom {8}{4} = 70\) possible permutations. We’ll calculate a distribution of test statistics by finding \(D^*\) for each permutation:
| A* | B* | A** | B** | etc | A*** | B*** |
|---|---|---|---|---|---|---|
| 46 | 31 | 46 | 31 | … | 31 | 47 |
| 32 | 48 | 48 | 32 | … | 32 | 48 |
| 34 | 49 | 34 | 49 | … | 34 | 49 |
| 47 | 51 | 47 | 51 | … | 46 | 51 |
| \(D^* = 5\) | | \(D^* = -3\) | | … | \(D^* = 13\) | |

(Here each \(D^* = \bar{B}^* - \bar{A}^*\), matching the direction of \(D_{obs} = 48.5 - 36\).)
From our calculated \(D^*\)’s, we find that only 4 of the 70 permutations yield a test statistic at least as extreme as \(D_{obs}\), i.e. \(|D^*| \geq 12.5\): two with \(D^* \geq 12.5\) (one of which is the observed labeling itself) and two with \(D^* \leq -12.5\). So, our two-sided p-value is simply \(\frac{4}{70} \approx 0.057\).
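This count can be checked by brute force. The sketch below (variable names are our own) pools the chapter’s two samples, enumerates all \(\binom{8}{4} = 70\) label assignments with `itertools.combinations`, and counts how many \(D^*\)’s are at least as extreme as \(D_{obs}\):

```python
from itertools import combinations

pooled = [31, 32, 34, 47, 46, 48, 49, 51]  # Sample A then Sample B
m = 4                                      # size of Sample A
d_obs = 48.5 - 36                          # observed difference in means: 12.5

# Enumerate every way to choose which 4 pooled values get the "Sample A" label.
d_stats = []
for a_idx in combinations(range(len(pooled)), m):
    a = [pooled[i] for i in a_idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in a_idx]
    d_stats.append(sum(b) / len(b) - sum(a) / len(a))  # D* = mean(B*) - mean(A*)

extreme = sum(abs(d) >= abs(d_obs) for d in d_stats)
# 70 permutations, 4 of them with |D*| >= 12.5, so p = 4/70
print(len(d_stats), extreme, extreme / len(d_stats))
```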
Formal Definitions
For a two-sided test: \[ H_0: F_1(x) = F_2(x) \\ H_a: F_1(x) \neq F_2(x) \\ ~ \\ p\text{-value}_{two\ sided} = \frac{\text{# of } |D^*| ~\geq~ |D_{obs}|}{\binom {m+n}{m}} \]

For an upper tail test: \[ H_0: F_1(x) = F_2(x) \\ H_a: F_1(x) \leq F_2(x) \\ ~ \\ p\text{-value}_{upper} = \frac{\text{# of } D^* \geq D_{obs}}{\binom {m+n}{m}} \]
For a lower tail test: \[ H_0: F_1(x) = F_2(x) \\ H_a: F_1(x) \geq F_2(x) \\ ~ \\ p\text{-value}_{lower} = \frac{\text{# of } D^* \leq D_{obs}}{\binom {m+n}{m}} \]
Interpretation: Given a p-value of 0.057, there is a 5.7% chance of observing a difference in means at least as extreme as the one we observed, under the null hypothesis that both samples come from populations with the same distribution.
Code
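A general-purpose version might look like the following. This is a minimal sketch, not a library implementation: the function name `permutation_test`, its parameters, and the `alternative` keyword values are our own choices. It enumerates every relabeling exactly as described above, using \(D = \text{stat}(y) - \text{stat}(x)\):

```python
from itertools import combinations
from statistics import mean

def permutation_test(x, y, stat=mean, alternative="two-sided"):
    """Exact two-sample permutation test on D = stat(y) - stat(x).

    Enumerates all C(m+n, m) relabelings, so it is only practical for
    small samples. `alternative` is "two-sided", "upper", or "lower".
    """
    pooled = list(x) + list(y)
    m = len(x)
    d_obs = stat(y) - stat(x)
    count = total = 0
    for a_idx in combinations(range(len(pooled)), m):
        chosen = set(a_idx)
        a = [pooled[i] for i in a_idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        d = stat(b) - stat(a)
        total += 1
        if alternative == "two-sided":
            count += abs(d) >= abs(d_obs)
        elif alternative == "upper":
            count += d >= d_obs
        else:  # "lower"
            count += d <= d_obs
    return count / total

# The chapter's example: two-sided p-value of 4/70 ≈ 0.057
print(permutation_test([31, 32, 34, 47], [46, 48, 49, 51]))
```

Note that the observed labeling is always one of the enumerated relabelings, so the p-value can never be zero. For larger samples, sampling relabelings at random (a Monte Carlo approximation) is the usual workaround; SciPy also ships a `scipy.stats.permutation_test` that supports both modes.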
Variants
Instead of the difference in means, we could use (1) sums, (2) trimmed means, or (3) medians:
- Mean/Sum: Use when the population distribution is short-tailed (normal looking). Since group sizes are fixed across relabelings, the sum of a sample carries the same information as its mean.
- Trimmed Mean: Use when the population distribution is symmetric but heavy-tailed (some unusually extreme observations are likely)
- Median: Use when the population distribution is skewed
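Swapping in a different statistic only changes how each \(D^*\) is computed; the relabeling scheme is identical. A sketch on the chapter’s samples, using a hand-rolled `trimmed_mean` helper (our own; `scipy.stats.trim_mean` is a library alternative):

```python
from statistics import mean, median

def trimmed_mean(values, prop=0.25):
    """Mean after dropping a proportion `prop` of the sorted values from each tail."""
    s = sorted(values)
    k = int(len(s) * prop)
    return mean(s[k:len(s) - k])

a = [31, 32, 34, 47]
b = [46, 48, 49, 51]

# Only the location statistic changes between variants; each one
# would be applied to every relabeling exactly as before.
for stat in (mean, trimmed_mean, median):
    print(stat.__name__, stat(b) - stat(a))
```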