Logistic Regression Inference

Lucy D’Agostino McGowan

Hypothesis test

  • null hypothesis \(H_0: \beta_1 = 0\)
  • alternative hypothesis \(H_A: \beta_1 \ne 0\)

Logistic regression test statistic



\[z = \frac{\hat\beta_1}{SE_{\hat\beta_1}}\]

Logistic regression test statistic

How is this different from the test statistic for linear regression?

\[z = \frac{\hat\beta_1}{SE_{\hat\beta_1}}\]

Logistic regression test statistic

How is this different from the test statistic for linear regression?

\[\color{red}z = \frac{\hat\beta_1}{SE_{\hat\beta_1}}\]

  • The \(z\) denotes that this is a \(z\)-statistic
  • What does this mean? Instead of using a \(t\) distribution, we use the normal distribution to calculate confidence intervals and p-values, as sketched below
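A minimal sketch of this calculation by hand (the estimate and standard error below are hypothetical, just for illustration):

beta_hat <- 0.044  # hypothetical estimated slope, on the log(odds) scale
se_hat <- 0.037    # hypothetical standard error
z <- beta_hat / se_hat
2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value from the normal distribution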

Logistic regression confidence interval

What do you think goes in this blank to calculate a confidence interval (instead of \(t^*\) as it was for linear regression)?

\[\hat\beta_1 \pm [\_^*] SE_{\hat\beta_1}\]

Logistic regression confidence interval

What do you think goes in this blank to calculate a confidence interval (instead of \(t^*\) as it was for linear regression)?

\[\hat\beta_1 \pm [\color{red}z^*] SE_{\hat\beta_1}\]

  • \(z^*\) is found using the normal distribution and the desired level of confidence
qnorm(0.975)
[1] 1.959964
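Putting this together, a by-hand version of the interval (reusing the hypothetical beta_hat and se_hat from the earlier sketch):

beta_hat + c(-1, 1) * qnorm(0.975) * se_hat  # hypothetical 95% CI on the log(odds) scale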

Logistic regression confidence interval

Where are my degrees of freedom when calculating \(z^*\)?

\[\hat\beta_1 \pm [\color{red}z^*] SE_{\hat\beta_1}\]

  • \(z^*\) is found using the normal distribution and the desired level of confidence
qnorm(0.975)
[1] 1.959964
  • The normal distribution doesn’t need to know your sample size, but it does rely on having a reasonably large sample; compare qt() and qnorm() in the sketch below
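A quick check of that point: the \(t\) multiplier depends on the degrees of freedom and approaches the normal multiplier as the sample grows, while \(z^*\) never changes.

qt(0.975, df = c(10, 30, 100, 1000))  # t* shrinks toward 1.96 as df grows
qnorm(0.975)                          # z* does not depend on sample size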

Logistic regression confidence interval

  • \(\hat\beta_1\) measures the change in log(odds) for every unit change in the predictor. What if I wanted a confidence interval for the odds ratio?

\[\hat\beta_1 \pm [\color{red}z^*] SE_{\hat\beta_1}\]

Logistic regression confidence interval

How do you convert log(odds) to odds?

  • \(\hat\beta_1\) measures the change in log(odds) for every unit change in the predictor. What if I wanted a confidence interval for the odds ratio?

\[\hat\beta_1 \pm [\color{red}z^*] SE_{\hat\beta_1}\]

Logistic regression confidence interval

How do you convert log(odds) to odds?

  • \(\hat\beta_1\) measures the change in log(odds) for every unit change in the predictor. What if I wanted a confidence interval for the odds ratio?

\[e^{\hat\beta_1 \pm [\color{red}z^*] SE_{\hat\beta_1}}\]

Let’s try it in R!

We are interested in the relationship between backpack weight and back problems.

library(Stat2Data)
data("Backpack")
mod <- glm(BackProblems ~ BackpackWeight, 
    data = Backpack, 
    family = "binomial")

exp(coef(mod))
   (Intercept) BackpackWeight 
     0.2805017      1.0444660 
  • How do you interpret the odds ratio?
    • A one-unit increase in backpack weight yields a 1.04-fold increase (about a 4.4% increase) in the odds of back problems

Let’s try it in R!

confint(mod)
                     2.5 %     97.5 %
(Intercept)    -2.28602740 -0.3214891
BackpackWeight -0.02912583  0.1180606
exp(confint(mod))
                   2.5 %    97.5 %
(Intercept)    0.1016696 0.7250685
BackpackWeight 0.9712942 1.1253124
  • How do you interpret the odds ratio?
    • A one-unit increase in backpack weight yields a 1.04-fold increase (about a 4.4% increase) in the odds of back problems
  • What is my null hypothesis?
    • \(H_0:\beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • What is the result of this hypothesis test at the \(\alpha = 0.05\) level?
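A side note not on the slides: for glm objects, confint() computes profile-likelihood intervals, so exponentiating its endpoints will not exactly match the Wald interval \(e^{\hat\beta_1 \pm z^* SE_{\hat\beta_1}}\). A Wald-style version of the odds ratio interval, for comparison:

est <- coef(mod)["BackpackWeight"]
se <- sqrt(diag(vcov(mod)))["BackpackWeight"]
exp(est + c(-1, 1) * qnorm(0.975) * se)  # Wald 95% CI for the odds ratio
exp(confint.default(mod))                # confint.default() gives the same Wald-style intervals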

Log Likelihood

  • “goodness of fit” measure
  • higher log likelihood is better
  • Both AIC and BIC are calculated using the log likelihood
    • \(f(k) - 2 \log\mathcal{L}\), where \(f(k)\) is a penalty based on the number of parameters (\(2k\) for AIC, \(k \log n\) for BIC)
  • \(\color{red}{- 2 \log\mathcal{L}}\): this is called the deviance
  • Similar to the nested F-test in linear regression, in logistic regression we can compare \(-2\log\mathcal{L}\) for models with and without certain predictors, as sketched in R below
  • \(-2\log\mathcal{L}\) follows a \(\chi^2\) distribution with \(n - k - 1\) degrees of freedom.
  • The difference \((-2\log\mathcal{L}_1)-(-2\log\mathcal{L}_2)\) follows a \(\chi^2\) distribution with \(p\) degrees of freedom (where \(p\) is the difference in the number of predictors between Model 1 and Model 2)
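In R, this comparison of two nested fits can be carried out with anova(); a minimal sketch, where y, x, and dat are placeholders for an outcome, a predictor, and a data frame:

mod1 <- glm(y ~ 1, data = dat, family = "binomial")  # Model 1: intercept only
mod2 <- glm(y ~ x, data = dat, family = "binomial")  # Model 2: adds the predictor
anova(mod1, mod2, test = "Chisq")                    # drop-in-deviance (likelihood ratio) test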

Likelihood ratio test

  • For example, if we wanted to test the following hypothesis:
    • \(H_0: \beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • We could compute the difference between the deviance for a model with \(\beta_1\) and without \(\beta_1\).
    • Model 1: \(\log(\text{odds}) = \beta_0\)
    • Model 2: \(\log(\text{odds}) = \beta_0 + \beta_1x\)

Likelihood ratio test

Are these models nested?

  • For example, if we wanted to test the following hypothesis:
    • \(H_0: \beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • We could compute the difference between the deviance for a model with \(\beta_1\) and without \(\beta_1\).
    • Model 1: \(\log(\text{odds}) = \beta_0\)
    • Model 2: \(\log(\text{odds}) = \beta_0 + \beta_1x\)

Likelihood ratio test

What are the degrees of freedom for the deviance for Model 1?

  • For example, if we wanted to test the following hypothesis:
    • \(H_0: \beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • We could compute the difference between the deviance for a model with \(\beta_1\) and without \(\beta_1\).
    • Model 1: \(\log(\text{odds}) = \beta_0\)
    • Model 2: \(\log(\text{odds}) = \beta_0 + \beta_1x\)

Likelihood ratio test

What are the degrees of freedom for the deviance for Model 1?

  • For example, if we wanted to test the following hypothesis:
    • \(H_0: \beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • We could compute the difference between the deviance for a model with \(\beta_1\) and without \(\beta_1\).
    • Model 1: \(\log(\text{odds}) = \beta_0\) ➡️ \(-2\log\mathcal{L}_1\), df = \(n-1\)
    • Model 2: \(\log(\text{odds}) = \beta_0 + \beta_1x\)

Likelihood ratio test

What are the degrees of freedom for the deviance for Model 2?

  • For example, if we wanted to test the following hypothesis:
    • \(H_0: \beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • We could compute the difference between the deviance for a model with \(\beta_1\) and without \(\beta_1\).
    • Model 1: \(\log(\text{odds}) = \beta_0\) ➡️ \(-2\log\mathcal{L}_1\), df = \(n-1\)
    • Model 2: \(\log(\text{odds}) = \beta_0 + \beta_1x\)

Likelihood ratio test

What are the degrees of freedom for the deviance for Model 2?

  • For example, if we wanted to test the following hypothesis:
    • \(H_0: \beta_1 = 0\)
    • \(H_A: \beta_1 \neq 0\)
  • We could compute the difference between the deviance for a model with \(\beta_1\) and without \(\beta_1\).
    • Model 1: \(\log(\text{odds}) = \beta_0\) ➡️ \(-2\log\mathcal{L}_1\), df = \(n-1\)
    • Model 2: \(\log(\text{odds}) = \beta_0 + \beta_1x\) ➡️ \(-2\log\mathcal{L}_2\), df = \(n-2\)

Likelihood ratio test

  • We are interested in the “drop in deviance”, the deviance in Model 1 minus the deviance in Model 2

\[(-2\log\mathcal{L}_1) - (-2\log\mathcal{L}_2)\]

Likelihood ratio test

What do you think the degrees of freedom are for this difference?

  • We are interested in the “drop in deviance”, the deviance in Model 1 minus the deviance in Model 2

\[(-2\log\mathcal{L}_1) - (-2\log\mathcal{L}_2)\]

  • df: \((n-1) - (n-2) = 1\)

Likelihood ratio test

What is the null hypothesis again?

  • We are interested in the “drop in deviance”, the deviance in Model 1 minus the deviance in Model 2

\[(-2\log\mathcal{L}_1) - (-2\log\mathcal{L}_2)\]

☝️ test statistic

  • df: \((n-1) - (n-2) = 1\)

Likelihood ratio test

How do you think we compute a p-value for this test?

  • We are interested in the “drop in deviance”, the deviance in Model 1 minus the deviance in Model 2

\[(-2\log\mathcal{L}_1) - (-2\log\mathcal{L}_2)\]

☝️ test statistic

  • df: \((n-1) - (n-2) = 1\)
# L_0 = deviance of the model without the predictor; L = deviance of the model with it
pchisq(L_0 - L, df = 1, lower.tail = FALSE)

Let’s try it in R!

data(MedGPA)
glm(Acceptance ~ GPA, data = MedGPA, family = "binomial") |>
  summary()

Call:
glm(formula = Acceptance ~ GPA, family = "binomial", data = MedGPA)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7805  -0.8522   0.4407   0.7819   2.0967  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -19.207      5.629  -3.412 0.000644 ***
GPA            5.454      1.579   3.454 0.000553 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.791  on 54  degrees of freedom
Residual deviance: 56.839  on 53  degrees of freedom
AIC: 60.839

Number of Fisher Scoring iterations: 4
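For reference, the two deviances in this output can also be pulled directly from the fitted model object (here stored under the hypothetical name mod_gpa):

mod_gpa <- glm(Acceptance ~ GPA, data = MedGPA, family = "binomial")
mod_gpa$null.deviance  # deviance of the intercept-only model (75.791)
mod_gpa$deviance       # deviance of the model with GPA (56.839)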

Let’s try it in R!

What is the “drop in deviance”?

data(MedGPA)
glm(Acceptance ~ GPA, data = MedGPA, family = "binomial") |>
  summary()

Call:
glm(formula = Acceptance ~ GPA, family = "binomial", data = MedGPA)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7805  -0.8522   0.4407   0.7819   2.0967  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -19.207      5.629  -3.412 0.000644 ***
GPA            5.454      1.579   3.454 0.000553 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.791  on 54  degrees of freedom
Residual deviance: 56.839  on 53  degrees of freedom
AIC: 60.839

Number of Fisher Scoring iterations: 4

  • 75.8 - 56.8 = 19

Let’s try it in R!

What are the degrees of freedom for this difference?

data(MedGPA)
glm(Acceptance ~ GPA, data = MedGPA, family = "binomial") |>
  summary()

Call:
glm(formula = Acceptance ~ GPA, family = "binomial", data = MedGPA)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7805  -0.8522   0.4407   0.7819   2.0967  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -19.207      5.629  -3.412 0.000644 ***
GPA            5.454      1.579   3.454 0.000553 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.791  on 54  degrees of freedom
Residual deviance: 56.839  on 53  degrees of freedom
AIC: 60.839

Number of Fisher Scoring iterations: 4

  • 75.8 - 56.8 = 19

Let’s try it in R!

What are the degrees of freedom for this difference?

data(MedGPA)
glm(Acceptance ~ GPA, data = MedGPA, family = "binomial") |>
  summary()

Call:
glm(formula = Acceptance ~ GPA, family = "binomial", data = MedGPA)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7805  -0.8522   0.4407   0.7819   2.0967  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -19.207      5.629  -3.412 0.000644 ***
GPA            5.454      1.579   3.454 0.000553 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.791  on 54  degrees of freedom
Residual deviance: 56.839  on 53  degrees of freedom
AIC: 60.839

Number of Fisher Scoring iterations: 4

  • 75.8 - 56.8 = 19
  • df: 1

Let’s try it in R!

What is the result of the hypothesis test? How do you interpret this?

data(MedGPA)
glm(Acceptance ~ GPA, data = MedGPA, family = "binomial") |>
  summary()

Call:
glm(formula = Acceptance ~ GPA, family = "binomial", data = MedGPA)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7805  -0.8522   0.4407   0.7819   2.0967  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -19.207      5.629  -3.412 0.000644 ***
GPA            5.454      1.579   3.454 0.000553 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.791  on 54  degrees of freedom
Residual deviance: 56.839  on 53  degrees of freedom
AIC: 60.839

Number of Fisher Scoring iterations: 4

  • 75.8 - 56.8 = 19
  • df: 1
pchisq(19, 1, lower.tail = FALSE)
[1] 1.307185e-05
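The same test can be reproduced without rounding by using the deviances stored in the fitted model object, or by letting anova() carry out the whole drop-in-deviance test (again using the hypothetical object name mod_gpa):

mod_gpa <- glm(Acceptance ~ GPA, data = MedGPA, family = "binomial")
pchisq(mod_gpa$null.deviance - mod_gpa$deviance, df = 1, lower.tail = FALSE)
anova(mod_gpa, test = "Chisq")  # sequential drop-in-deviance test for adding GPA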