# A tibble: 1 × 1
   mean
  <dbl>
1  174.
How can we visualize a single continuous variable?
Histogram
Density
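As a rough illustration of the two plot types named above, here is a minimal sketch using ggplot2. The data frame `dat`, its column `value`, and the simulated numbers are all hypothetical stand-ins, not the data behind these slides.

```r
library(ggplot2)

# Hypothetical data: 200 simulated values of one continuous variable
set.seed(1)
dat <- data.frame(value = rnorm(200, mean = 50, sd = 10))

# Histogram: counts of observations falling in each bin
p_hist <- ggplot(dat, aes(x = value)) +
  geom_histogram(bins = 20)

# Density: a smoothed estimate of the same distribution
p_dens <- ggplot(dat, aes(x = value)) +
  geom_density()

p_hist
p_dens
```

Both plots show the same variable; the histogram depends on the choice of `bins`, while the density curve depends on a smoothing bandwidth.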
How can we numerically summarize a single continuous variable?
Why do we calculate a mean?
To reduce n observations to 1 number.
Symmetric
Bimodal
Guess the mean for each of these variables.
Does this value represent a “typical” observation?
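One way to explore this question is with simulated data. The sketch below is only an illustration: the two variables, their parameters, and the helper `near_mean()` are hypothetical and are not the variables plotted on the slide.

```r
set.seed(1)

# Hypothetical symmetric variable: one bell-shaped cluster
symmetric <- rnorm(1000, mean = 10, sd = 2)

# Hypothetical bimodal variable: two well-separated clusters
bimodal <- c(rnorm(500, mean = 4, sd = 1),
             rnorm(500, mean = 16, sd = 1))

# Both means land near 10
mean(symmetric)
mean(bimodal)

# Proportion of observations within `width` units of the variable's mean
near_mean <- function(v, width = 1) {
  mean(abs(v - mean(v)) <= width)
}

near_mean(symmetric)  # a sizeable share of observations sit near the mean
near_mean(bimodal)    # almost no observations sit near the mean
```

In the bimodal case the mean falls in the gap between the two clusters, so it describes very few actual observations.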
\[\Large\bar{x} =\sum_{i=1}^n \frac{x_i}{n}\]
\[\Large{\require{color}\colorbox{#86a293}{$\bar{x}$}} =\sum_{i=1}^n \frac{x_i}{n}\]
the mean of the variable \(x\)
\[\Large\bar{x} ={\require{color}\colorbox{#86a293}{$\sum$}}_{i=1}^n \frac{x_i}{n}\]
add up the observations
\[\Large\bar{x} =\sum_{{\require{color}\colorbox{#86a293}{$i=1$}}}^n \frac{x_i}{n}\]
starting from the first observation
\[\Large\bar{x} =\sum_{i=1}^{\require{color}\colorbox{#86a293}{$n$}} \frac{x_i}{{\require{color}\colorbox{#86a293}{$n$}}}\]
total number of observations
\[\Large\bar{x} =\sum_{i=1}^n \frac{{\require{color}\colorbox{#86a293}{$x_i$}}}{n}\]
the value of the continuous variable for observation \(i\)
\[\Large\bar{x} =\sum_{i=1}^n \frac{x_i}{\require{color}\colorbox{#86a293}{${n}$}}\]
divide by the total number of observations
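The pieces of the formula map directly onto R code. This is a small sketch with hypothetical values (the vector `x` below is made up for illustration); it compares the formula written out by hand with the built-in `mean()`.

```r
# Hypothetical values for a continuous variable
x <- c(2, 4, 6, 8)

# n: the total number of observations
n <- length(x)

# Divide each x_i by n, then add up the results from i = 1 to n
x_bar <- sum(x / n)

# The built-in function gives the same answer
x_bar_builtin <- mean(x)

x_bar
x_bar_builtin
```

Writing `sum(x) / n` instead of `sum(x / n)` gives the same result, since dividing every term by \(n\) is the same as dividing the total by \(n\).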
Application Exercise
| Observation | Value |
|---|---|
| \(x_1\) | 3 |
| \(x_2\) | 5 |
| \(x_3\) | 1 |
| \(x_4\) | 7 |
| \(x_5\) | 8 |
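# Assumes d is a data frame with columns i (observation index) and x (values)
library(ggplot2)
library(geomtextpath)  # provides geom_texthline()

# Plot each observation, the mean as a labeled horizontal line,
# and a vertical segment from each point to the mean
ggplot(d, aes(x = i, y = x)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$x), label = "mean = 4.8") +
  geom_segment(aes(y = x, yend = mean(x), x = i, xend = i), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())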
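Assuming the data set `d` in the plot holds the five values from the exercise table (the slides do not show how `d` was built), the label "mean = 4.8" can be checked by hand with the formula for \(\bar{x}\):

\[\bar{x} = \sum_{i=1}^{5} \frac{x_i}{5} = \frac{3 + 5 + 1 + 7 + 8}{5} = \frac{24}{5} = 4.8\]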
\[\Large x = \beta_0 + \varepsilon\]
\[\Large {\require{color}\colorbox{#86a293}{$x$}} = \beta_0 + \varepsilon\]
This is the vector \(x=\{x_1,\dots,x_n\}\)
\[\Large x = {\require{color}\colorbox{#86a293}{$\beta_0$}} + \varepsilon\]
We call this the “intercept”; when there are no other variables, its estimate is just the mean, \(\bar{x}\)
\[\Large x = \beta_0 + {\require{color}\colorbox{#86a293}{$\varepsilon$}}\]
the error: how far each observation falls from \(\beta_0\)
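The model above can be fit in R with an intercept-only formula. The sketch below assumes a data frame `d` built from the five values in the exercise table; that construction is an assumption, since the slides do not show how `d` was created.

```r
# Assumed data: the five values from the exercise table
d <- data.frame(i = 1:5, x = c(3, 5, 1, 7, 8))

# Intercept-only linear model: x = beta_0 + error
fit <- lm(x ~ 1, data = d)

# The estimated intercept
beta_0 <- unname(coef(fit)["(Intercept)"])

# The error for each observation: x minus the mean of x
d$error <- d$x - mean(d$x)

# The intercept matches the mean, and the residuals match the errors
all.equal(beta_0, mean(d$x))
all.equal(unname(residuals(fit)), d$error)
```

The formula `x ~ 1` tells `lm()` to include only an intercept, so there are no other variables in the model.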
# Same plot, with the mean line labeled as beta[0] (parsed as a math expression)
ggplot(d, aes(x = i, y = x)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$x), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_segment(aes(y = x, yend = mean(x), x = i, xend = i), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
# Same plot, with each vertical segment also labeled as epsilon (the error)
ggplot(d, aes(x = i, y = x)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$x), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_textsegment(aes(y = x, yend = mean(x), x = i, xend = i), color = "blue",
                   label = as.character(expression(epsilon)), parse = TRUE,
                   lwd = 5) +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
# Collapse all observations onto a single x position (x = 1)
ggplot(d, aes(x = 1, y = x)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$x), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_segment(aes(y = x, yend = mean(x), x = 1, xend = 1), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
# Same collapsed plot, keeping the default panel grid lines
ggplot(d, aes(x = 1, y = x)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$x), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_segment(aes(y = x, yend = mean(x), x = 1, xend = 1), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank())
lm: linear model
Application Exercise
Open your 04-appex.qmd file. Load the packages by running the top R chunk of code.
What do you think this code does? Try typing ?tibble in the Console. What does this function do?
Calculate the mean of x. Do this two ways, using the summary function and using the lm function.
Add a variable called error to the data set d that is equal to x minus the mean of x.
When is the mean an appropriate summary measure to calculate?
What assumptions need to be true in order to use a mean to represent your single continuous variable?