What if I want to know the average daily screen time for Wake Forest Students?
How can we quantify how much we’d expect the mean to differ from one random sample to another?
We need a measure of uncertainty
How about the standard error of the mean?
The standard error is how much we expect the sample mean to vary from one random sample to another.
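One way to see what this definition means is to simulate it. The sketch below assumes a made-up population of screen times (the values 240 and 40 are illustrative, not the class survey data), draws many random samples, and compares the spread of the resulting sample means with the standard error estimated from a single sample.

```r
# Hypothetical population of daily screen times in minutes (not the class data)
set.seed(1)
population <- rnorm(10000, mean = 240, sd = 40)

n <- 33  # assumed sample size

# Draw many random samples and record each sample mean
sample_means <- replicate(5000, mean(sample(population, n)))

# Spread of the sample means across repeated samples
empirical_se <- sd(sample_means)

# Standard error estimated from one sample alone: s / sqrt(n)
one_sample <- sample(population, n)
estimated_se <- sd(one_sample) / sqrt(n)

c(empirical = empirical_se, estimated = estimated_se)
```

The two numbers should be of similar size: the single-sample estimate approximates how much the mean would vary if we could keep resampling.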
Survey data
How can we quantify how much we’d expect the mean to differ from one random sample to another?
mod <- lm(screen_time ~ 1, data = data_sample)
summary(mod)
Call:
lm(formula = screen_time ~ 1, data = data_sample)
Residuals:
Min 1Q Median 3Q Max
-59.091 -42.091 -2.091 38.909 63.909
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 235.09 7.08 33.21 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 40.67 on 32 degrees of freedom
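The standard error in the coefficient table can be recovered from the last line of the output. In an intercept-only model the residual standard error is the sample standard deviation, so dividing it by the square root of the sample size should reproduce the reported value. A short check, using only numbers copied from the output above:

```r
# Values copied from the summary() output above
resid_se <- 40.67   # residual standard error (the sample standard deviation here)
df       <- 32      # residual degrees of freedom
n        <- df + 1  # an intercept-only model uses one degree of freedom

# Standard error of the mean: s / sqrt(n)
se_mean <- resid_se / sqrt(n)
round(se_mean, 2)
```

This agrees with the 7.08 shown in the Std. Error column.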
Application Exercise
Fit an intercept only model to calculate the average screen time in your sample
Use the summary function on the linear model you fit
What is the standard error for the mean? Interpret this value.
Confidence intervals
If we used the same sampling method to select many different samples and computed an interval estimate for each one, we would expect about 95% of those intervals to contain the true population parameter (the average screen time for Wake Forest students).
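This repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a hypothetical normal population with a known mean (240 minutes, chosen only for illustration), builds a 95% interval from each of many samples, and counts how often the interval contains that mean.

```r
set.seed(2)
true_mean <- 240  # hypothetical population mean, assumed for the simulation
n <- 33           # assumed sample size

covered <- replicate(2000, {
  x <- rnorm(n, mean = true_mean, sd = 40)
  ci <- confint(lm(x ~ 1), level = 0.95)  # 1 x 2 matrix: lower, upper
  ci[1] < true_mean && true_mean < ci[2]
})

# Proportion of intervals that contain the true mean
mean(covered)
```

The proportion should land close to 0.95, though it will not be exactly 0.95 in any one run.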
Confidence interval
\[\bar{x} \pm t^* \times SE_{\bar{x}}\]
\(t^*\) is the critical value from the \(t_{n-1}\) density curve that gives the desired confidence level
Often we want a 95% confidence level.
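As a worked example of the formula, the sketch below plugs in the mean, standard error, and degrees of freedom reported in the summary() output on these slides. The only added ingredient is qt(), which returns the critical value for a t distribution.

```r
# Values taken from the summary() output shown on these slides
x_bar <- 235.09  # sample mean (intercept estimate)
se    <- 7.08    # standard error of the mean
df    <- 32      # n - 1

# Critical value leaving 2.5% in each tail for 95% confidence
t_star <- qt(0.975, df = df)

ci <- c(lower = x_bar - t_star * se, upper = x_bar + t_star * se)
round(t_star, 3)
round(ci, 1)
```

With these inputs the critical value is a little above 2, and the interval runs from roughly 220.7 to 249.5 minutes.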
Demo
Let’s do it in R!
mod <- lm(screen_time ~ 1, data = data_sample)
summary(mod)
Call:
lm(formula = screen_time ~ 1, data = data_sample)
Residuals:
Min 1Q Median 3Q Max
-59.091 -42.091 -2.091 38.909 63.909
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 235.09 7.08 33.21 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 40.67 on 32 degrees of freedom
If we used the same sampling method to select many different samples and computed an interval estimate for each one, we would expect about 95% of those intervals to contain the true population parameter (the mean).
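To connect the model output to an actual interval, the sketch below fits the same intercept-only model and compares confint() with the interval built from the estimate, its standard error, and a t critical value. The data frame here is a simulated stand-in for data_sample (an assumption, since the survey data is not reproduced on the slides), so the numbers will differ from the output above.

```r
# Hypothetical stand-in for data_sample; the real survey data is not reproduced here
set.seed(3)
data_sample <- data.frame(screen_time = rnorm(33, mean = 240, sd = 40))

mod <- lm(screen_time ~ 1, data = data_sample)

# Interval from confint()
ci_confint <- confint(mod, level = 0.95)

# Same interval by hand: estimate +/- t* x SE
est    <- coef(summary(mod))["(Intercept)", "Estimate"]
se     <- coef(summary(mod))["(Intercept)", "Std. Error"]
t_star <- qt(0.975, df = df.residual(mod))
ci_hand <- c(est - t_star * se, est + t_star * se)

ci_confint
ci_hand
```

Both approaches should give the same lower and upper bounds, since confint() applies the same t-based calculation to a linear model.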
Application Exercise
Open appex-07.qmd
Calculate the \(t^*\) value for your confidence interval
Calculate the confidence interval “by hand” using the \(t^*\) value from exercise 2 and the mean and standard error from the previous application exercise
Calculate the confidence interval using the confint function
Interpret this value
Confidence Intervals
Code
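library(dplyr)
library(ggplot2)

# One 95% confidence interval per id, flagged by whether it covers the overall mean
data |>
  group_by(id) |>
  # Recode the value 1210 as 132 (presumably a data-entry correction)
  mutate(screen_time = ifelse(screen_time == 1210, 132, screen_time)) |>
  summarise(
    m = mean(screen_time, na.rm = TRUE),
    sd = sd(screen_time, na.rm = TRUE),
    n = n(),
    t = qt(0.025, df = n - 1, lower.tail = FALSE),
    # Use the standard error sd / sqrt(n), matching x-bar +/- t* x SE
    lb = m - t * sd / sqrt(n),
    ub = m + t * sd / sqrt(n),
    yes = ifelse(lb < mean(data$screen_time) & ub > mean(data$screen_time), 1, 0)
  ) |>
  filter(n > 2) |>
  ggplot(aes(y = factor(id), xmin = lb, x = m, xmax = ub, color = yes)) +
  geom_pointrange() +
  geom_vline(xintercept = mean(data$screen_time), lty = 2) +
  theme(legend.position = "none") +
  ylab("id") +
  xlab("Average Daily Screen Time (Minutes)")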