Lab 06 - Logistic
Due: 2023-11-21 at 11:59pm. Turn in your .html file on Canvas.
Getting started
Go to RStudio Pro and click:
Step 1. File > New Project
Step 2. “Version Control”
Step 3. Git
Step 4. Copy the following into the “Repository URL”:
https://github.com/sta-112-f23/lab-06-logistic-regression.git
Set up
We will use two packages, tidyverse and Stat2Data. Additionally, if you need to create an empirical logit plot, below is example code to do so:
library(tidyverse)
library(Stat2Data)
data(MedGPA)
# you can change ngroups as you see necessary, this is kind of like setting the number of bins in a histogram
emplogitplot1(Acceptance ~ GPA, data = MedGPA, ngroups = 5)
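To see what an empirical logit plot summarizes, here is a minimal sketch that computes the grouped log odds by hand for the same MedGPA example. It assumes equal-size groups built with `ntile()` and a +0.5 adjustment to the counts so the log odds stay finite; both are illustrative choices and may differ in detail from what `emplogitplot1()` does internally. The names `gpa_group` and `emp_logits` are made up for this sketch.

```r
library(tidyverse)
library(Stat2Data)
data(MedGPA)

# Split GPA into 5 groups of roughly equal size, then compute an adjusted
# empirical logit in each group. The +0.5 adjustment is an assumption that
# keeps the log odds finite when a group is all 0s or all 1s.
emp_logits <- MedGPA |>
  mutate(gpa_group = ntile(GPA, 5)) |>
  group_by(gpa_group) |>
  summarise(
    mean_gpa  = mean(GPA),
    n         = n(),
    accepted  = sum(Acceptance),
    emp_logit = log((accepted + 0.5) / (n - accepted + 0.5))
  )

# Points that fall roughly along a line suggest the log odds are
# approximately linear in GPA
ggplot(emp_logits, aes(x = mean_gpa, y = emp_logit)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Mean GPA in group", y = "Empirical logit of Acceptance")
```

As with the number of bins in a histogram, changing the 5 in `ntile(GPA, 5)` trades smoothness for detail.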
Exercises
We are analyzing the Leukemia data from the Stat2Data package. Be sure to learn about the data and variables. We are doing inference on a model predicting whether the patient responded to treatment from Age.
Describe the data (number of observations, what the observations are, number of variables, what the variables are, and whether there is any missing data).
Run the following code to drop Time (the survival time) and Status (an indicator for whether the patient survived) from your dataset.
Leukemia <- Leukemia |>
  select(-Time, -Status)
We are interested in whether age at diagnosis influences the probability of responding to treatment. I propose that the differential percentage of blasts, the percentage of absolute marrow leukemia infiltrate and the highest temperature of the patient prior to treatment in degrees Fahrenheit are important to control for when assessing the relationship between age at diagnosis and the probability of responding to treatment. Write out the equation for a model we would fit to assess this relationship.
Fit the model proposed in Exercise 3. Check that this model fits the assumption(s) for logistic regression. Are the assumptions met? Be sure to include a figure to demonstrate the fit of the assumptions for the Age variable.
Report the coefficient for Age as well as the associated odds ratio with a 95% confidence interval. Interpret this in the context of this data.
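The mechanics of turning a logistic regression coefficient into an odds ratio with a confidence interval can be sketched on the MedGPA data from the Set up section, not on Leukemia. This sketch assumes a model of Acceptance on GPA that controls for MCAT (an illustrative choice of control) and a Wald interval from `confint.default()`; a profile-likelihood interval from `confint()` is another reasonable option and can give slightly different limits.

```r
library(Stat2Data)
data(MedGPA)

# Logistic regression of Acceptance on GPA, controlling for MCAT
# (MCAT is an illustrative control variable for this sketch)
fit <- glm(Acceptance ~ GPA + MCAT, data = MedGPA, family = binomial)

# Coefficient for GPA on the log-odds scale
beta_gpa <- coef(fit)["GPA"]

# Wald 95% confidence interval on the log-odds scale
ci_log_odds <- confint.default(fit, parm = "GPA", level = 0.95)

# Exponentiate to move from log odds to the odds-ratio scale
odds_ratio    <- exp(beta_gpa)
ci_odds_ratio <- exp(ci_log_odds)

odds_ratio
ci_odds_ratio
```

Here the odds ratio is the multiplicative change in the odds of acceptance for a one-unit increase in GPA, holding MCAT constant. An interval that excludes 1 indicates an association at the 5% level.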