Regression and Correlation

Lucy D’Agostino McGowan

Application Exercise

  1. Copy the following template into RStudio Pro: https://github.com/sta-112-f23/appex-10.git
  2. Load the packages and then examine the PorschePrice data frame
  3. Fit a linear model predicting a Porsche’s price from the mileage
  4. Examine the ANOVA table – what is the F statistic? What is the associated p-value? What hypothesis is it testing?

Partitioning variability

Why?

  • \(y - \bar{y} = (\hat{y} - \bar{y}) + (y - \hat{y})\)
  • \(\sum(y - \bar{y})^2 = \sum(\hat{y} - \bar{y})^2 + \sum(y - \hat{y})^2\)
  • SSTotal = SSModel + SSE
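
The partition above can be checked numerically. The sketch below is only an illustration: it uses R's built-in `cars` data set (an assumption, not the data from this lecture) and the variable names `ss_total`, `ss_model`, and `ss_error` are made up for the example.

```r
# Illustrative data: built-in `cars` (assumed here, not the lecture's data)
mod <- lm(dist ~ speed, data = cars)

y     <- cars$dist    # observed response
y_hat <- fitted(mod)  # fitted values from the model
y_bar <- mean(y)      # mean of the response

ss_total <- sum((y - y_bar)^2)      # SSTotal
ss_model <- sum((y_hat - y_bar)^2)  # SSModel
ss_error <- sum((y - y_hat)^2)      # SSE

# Check SSTotal = SSModel + SSE (up to floating-point tolerance)
all.equal(ss_total, ss_model + ss_error)
```

The identity relies on the model being fit by least squares with an intercept, which is what `lm()` does by default.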

Coefficient of determination

Often referred to as \(\color{#86a293}{r^2}\), it is the fraction of the response variability that is explained by the model.

Coefficient of determination

  • \(r^2 = \frac{\textrm{Variability explained by the model}}{\textrm{Total variability in } y}\)
  • \(r^2 = \frac{\textrm{SSModel}}{\textrm{SSTotal}}\)
  • \(r^2 = \frac{\sum(\hat{y} - \bar{y})^2}{\sum(y-\bar{y})^2}\)

Application Exercise

\[r^2 = \frac{\textrm{SSModel}}{\textrm{SSTotal}}\]

How could you calculate \(r^2\) if all you had was \(\textrm{SSTotal}\) and \(\textrm{SSE}\)?


Coefficient of determination

  • \(r^2 = \frac{\textrm{SSTotal} - \textrm{SSE}}{\textrm{SSTotal}}\)
  • \(r^2 = 1 - \frac{\textrm{SSE}}{\textrm{SSTotal}}\)
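
As a rough check that these forms of \(r^2\) agree, here is a minimal sketch in R. It assumes the built-in `cars` data set purely for illustration; the names `r2_from_model` and `r2_from_error` are hypothetical and not part of the course materials.

```r
# Illustrative data: built-in `cars` (assumed here, not the lecture's data)
mod <- lm(dist ~ speed, data = cars)
y   <- cars$dist

ss_total <- sum((y - mean(y))^2)            # SSTotal
ss_model <- sum((fitted(mod) - mean(y))^2)  # SSModel
ss_error <- sum(residuals(mod)^2)           # SSE

r2_from_model <- ss_model / ss_total      # SSModel / SSTotal
r2_from_error <- 1 - ss_error / ss_total  # 1 - SSE / SSTotal

# Both hand calculations next to the value R reports
c(r2_from_model, r2_from_error, summary(mod)$r.squared)
```

All three values should match, so knowing only SSTotal and SSE is enough to recover \(r^2\).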

Let’s do it in R!

mod <- lm(battery_percent ~ screen_time, data = data)
summary(mod)

Call:
lm(formula = battery_percent ~ screen_time, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-61.818 -17.353   2.546  19.108 115.720 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 68.150787   2.503928  27.218  < 2e-16 ***
screen_time -0.022447   0.008347  -2.689  0.00735 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.53 on 630 degrees of freedom
Multiple R-squared:  0.01135,   Adjusted R-squared:  0.009781 
F-statistic: 7.233 on 1 and 630 DF,  p-value: 0.007349

About 1.1% of the variation in battery percent is explained by screen time (Multiple R-squared: 0.01135).
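
Rather than reading the value off the printed output, \(r^2\) can also be pulled out of the fitted objects. The sketch below assumes the built-in `cars` data set as a stand-in (the lecture's `data` object is not available here), and `r2` and `r2_anova` are names chosen for the example.

```r
# Illustrative data: built-in `cars` (assumed stand-in for the lecture's data)
mod <- lm(dist ~ speed, data = cars)

# "Multiple R-squared" as stored in the summary object
r2 <- summary(mod)$r.squared

# The same quantity from the ANOVA table's sums of squares
tab <- anova(mod)
ss  <- tab[["Sum Sq"]]        # first entry: model (speed); second: residuals
r2_anova <- ss[1] / sum(ss)   # SSModel / SSTotal

# Express as a percentage for the interpretation sentence
round(100 * r2, 1)
```

The rounded percentage can be dropped straight into a sentence of the form "X% of the variation in the response is explained by the predictor."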

Application Exercise

  1. Open appex-10.qmd
  2. Run summary on your model predicting Porsche price from mileage
  3. What is the \(r^2\)? How can you interpret this?