Hypothesis Testing and P-values

Lucy D’Agostino McGowan

Hypothesis testing

Null hypothesis: \(\beta_1 = 0\)
Alternative hypothesis: \(\beta_1 \neq 0\)

Under the null hypothesis

Under the null \((\beta_1 = 0)\), and assuming the usual linear regression conditions hold, the t-statistic \(\hat\beta_1/se_{\hat\beta_1}\) follows a t-distribution with \(n-2\) degrees of freedom.
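To see where this claim comes from, here is a small simulation sketch. It generates hypothetical data in which the response does not depend on the predictor (so \(\beta_1 = 0\) is true), refits the regression many times, and compares the resulting slope t-statistics with a t-distribution with \(n-2\) degrees of freedom. The sample size of 35 mirrors the example below. The intercept, noise level, and predictor range are arbitrary assumptions chosen only for illustration.

```r
set.seed(1)
n <- 35        # sample size, matching the example below
n_sims <- 2000 # number of simulated datasets

# Each replicate: hypothetical data where y does NOT depend on x (null is true)
t_stats <- replicate(n_sims, {
  x <- runif(n, 0, 500)         # hypothetical predictor values
  y <- 80 + rnorm(n, sd = 25)   # response is pure noise around a constant
  fit <- lm(y ~ x)
  coef(summary(fit))["x", "t value"]  # slope t-statistic
})

# Simulated quantiles should be close to the t(n - 2) quantiles
probs <- c(0.025, 0.5, 0.975)
rbind(simulated   = quantile(t_stats, probs),
      theoretical = qt(probs, df = n - 2))

# Share of |t| beyond the two-sided 5% cutoff; should be near 0.05
mean(abs(t_stats) > qt(0.975, df = n - 2))
```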


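library(tidyverse)  # provides tibble() and ggplot()

set.seed(1)  # make the simulated draws reproducible
# 10,000 draws from the null distribution: t with n - 2 = 35 - 2 = 33 df
null <- tibble(
  t = rt(10000, df = 33)
)

ggplot(null, aes(t)) +
  geom_histogram(bins = 30)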

Example

What t statistic did we observe?


mod <- lm(battery_percent ~ screen_time, data = data_sample)
summary(mod)


Call:
lm(formula = battery_percent ~ screen_time, data = data_sample)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.166 -16.846   1.795  18.387  51.194 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 86.23349   11.85438   7.274  2.4e-08 ***
screen_time -0.10986    0.04851  -2.265   0.0302 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.36 on 33 degrees of freedom
Multiple R-squared:  0.1345,    Adjusted R-squared:  0.1083 
F-statistic: 5.129 on 1 and 33 DF,  p-value: 0.03022
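As a quick arithmetic check, the t value in this table is just the estimate divided by its standard error. The sketch below copies the printed (rounded) numbers for `screen_time` and reproduces, up to rounding, the t value of -2.265, the p-value of 0.0302, and the F-statistic of 5.129 (in a simple linear regression with one predictor, the overall F-statistic equals the square of the slope's t-statistic). The sample size of 35 is inferred from the 33 residual degrees of freedom plus the 2 estimated coefficients.

```r
# Numbers copied from the summary(mod) output above
estimate  <- -0.10986  # slope for screen_time
std_error <-  0.04851  # its standard error
n <- 35                # 33 residual df + 2 estimated coefficients

# t-statistic: estimate divided by its standard error
t_stat <- estimate / std_error
t_stat

# Two-sided p-value from a t-distribution with n - 2 df
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
p_value

# With a single predictor, the overall F-statistic equals t^2
t_stat^2
```

With the fitted model in hand, the same values can be pulled out directly instead of read off the printout, e.g. `coef(summary(mod))["screen_time", c("t value", "Pr(>|t|)")]`.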

Under the Null Hypothesis


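library(geomtextpath)  # geom_textvline(): vertical reference lines with text labels
ggplot(null, aes(t)) +
  geom_histogram(bins = 30) + 
  # the observed t for screen_time is negative (-2.265); its mirror image is +2.265
  geom_textvline(xintercept = -2.265, label = "observed t statistic") + 
  geom_textvline(xintercept = 2.265, label = "observed t statistic (flipped)")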

How do we compare these to the distribution under the null?

p-value

The probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. For the two-sided alternative \(\beta_1 \neq 0\), "extreme" means far from 0 in either direction.
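Read literally, this definition suggests a Monte Carlo estimate: draw many statistics from the null distribution and count the share that are at least as far from 0 as the observed one. The sketch below assumes the null distribution is t with 33 degrees of freedom, as in the example, and compares the simulated share with the exact value from `pt()`.

```r
set.seed(2)
observed_t <- -2.265            # t value for screen_time in summary(mod)
null_t <- rt(100000, df = 33)   # draws from the null distribution, t(33)

# Monte Carlo p-value: share of null draws at least as extreme as observed
p_mc <- mean(abs(null_t) >= abs(observed_t))
p_mc

# Exact two-sided p-value for comparison
p_exact <- 2 * pt(-abs(observed_t), df = 33)
p_exact
```

The simulated share should agree with the exact value to about two decimal places; more draws shrink the gap.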

Under the Null Hypothesis


# Label each draw by whether it is at least as extreme as the observed statistic
null$color <- ifelse(abs(null$t) >= 2.265, "as or more extreme", "less extreme")
ggplot(null, aes(t, fill = color)) +
  geom_histogram(bins = 30) + 
  geom_textvline(xintercept = -2.265, label = "observed t statistic") + 
  geom_textvline(xintercept = 2.265, label = "observed t statistic (flipped)") + 
  theme(legend.position = "none")

Under the Null Hypothesis

The proportion of area greater than 2.265

pt(2.265, df = 35-2, lower.tail = FALSE)

[1] 0.01510181
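"Proportion of area" here is literally the area under the t density to the right of 2.265. As a sketch of that idea, the code below integrates the t(33) density numerically with base R's `integrate()` and compares the result with the tail probability from `pt()`.

```r
t_obs <- 2.265

# Area under the t(33) density to the right of t_obs, by numerical integration
# (df = 33 is passed through integrate()'s ... to dt())
area_integrate <- integrate(dt, lower = t_obs, upper = Inf, df = 33)$value

# The same area from the cumulative distribution function
area_pt <- pt(t_obs, df = 33, lower.tail = FALSE)

c(integrate = area_integrate, pt = area_pt)
```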

Under the Null Hypothesis

The proportion of area less than -2.265

pt(-2.265, df = 35-2)

[1] 0.01510181
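Both tails give the same area because the t-distribution is symmetric about 0: the area below \(-t\) equals the area above \(t\) for any \(t\). A quick check of that symmetry, using a few arbitrary t values in addition to 2.265:

```r
# Symmetry of the t-distribution: P(T <= -t) equals P(T >= t)
t_values <- c(0.5, 1, 2.265, 3)
lower_tail <- pt(-t_values, df = 33)
upper_tail <- pt(t_values, df = 33, lower.tail = FALSE)

cbind(t = t_values, lower_tail, upper_tail)
all.equal(lower_tail, upper_tail)
```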

Under the Null Hypothesis

The proportion of area greater than 2.265 or less than -2.265

pt(2.265, df = 35-2, lower.tail = FALSE) + pt(-2.265, df = 35-2)

[1] 0.03020362

Under the Null Hypothesis

Because the t-distribution is symmetric, the proportion of area greater than 2.265 or less than -2.265 is twice the area greater than 2.265

pt(2.265, df = 35-2, lower.tail = FALSE) * 2

[1] 0.03020362
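The doubling shortcut works here because it is applied to the positive value 2.265. The observed t for `screen_time` is actually -2.265, and doubling the upper tail of a negative statistic gives a number above 1. A sketch of a helper that avoids this by working with the absolute value (`two_sided_p` is a hypothetical name, not a base R function):

```r
# Hypothetical helper: two-sided p-value for a t-statistic with df degrees of freedom
two_sided_p <- function(t, df) {
  # -abs(t) places the statistic in the lower tail, so its sign does not matter
  2 * pt(-abs(t), df = df)
}

two_sided_p(-2.265, df = 33)
two_sided_p(2.265, df = 33)

# Pitfall: doubling the upper tail of the negative statistic is not a p-value
pt(-2.265, df = 33, lower.tail = FALSE) * 2
```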