https://github.com/sta-112-f23/lab-02-simple-linear-regression.git
Lab 02 - Simple Linear Regression
Due: 2023-09-19 at 11:59pm Turn your .html file in on Canvas
Introduction
This is a BB-8 droid built using the ggplot2 R package by Victor Perrier.
The main goal of this lab is to perform a descriptive analysis using simple linear regression.
Getting started
Go to RStudio Pro and click:
Step 1. File > New Project
Step 2. “Version Control”
Step 3. Git
Step 4. Copy the following into the “Repository URL”:
Warm up
Before we introduce the data, let’s warm up with some simple exercises.
The top portion of your Quarto file (between the three dashed lines) is called YAML. It stands for “YAML Ain’t Markup Language”. It is a human friendly data serialization standard for all programming languages. All you need to know is that this area is called the YAML (we will refer to it as such) and that it contains meta information about your document.
YAML:
Open the Quarto (qmd
) file in your project, change the author name to your name, and render the document.
Change the date in your YAML to today’s date, and render the document.
Packages
In this lab we will use the tidyverse
package. We can load it using the following:
library(tidyverse)
Data
The data frame we will be working with today is called starwars
and it’s in the tidyverse
package.
To find out more about the dataset, type the following in your Console: ?starwars
. A question mark before the name of an object will always bring up its help file. This command must be run in the Console.
Based on the help file, how many rows and how many columns does the
starwars
data set have? What are the variables included in the data frame? Add your responses to your lab report.We are interested in the relationship between the weight of a Star Wars character and their height. Create a visual summary using the
starwars
data of the relationship between these variables. What do you notice?Fit a linear model on the
starwars
data predicting a character’s weight from their height. What is the intercept? Interpret this value. What is the slope? Interpret this value.Using the values in Exercise 3, write out the equation for the predicted weight \((\hat{y})\).
\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1x\]
In your qmd
file, you can include math using LaTeX
equations. These math equations are denoted using the $
. To include an equation that will be centered on a line, you can wrap it in two $$
. For example, you can add the equation above to your qmd
file by coping the following text:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1$$
You can also click Insert > LaTeX Math > Display Math
.
Using the format above, replace \hat{\beta}_0
and \hat{\beta}_1
with the values you found in Exercise 3.
Using the equation from Exercise 4, if you knew a character was 100 centimeters tall, what would you guess their weight was?
Create a new data set called
starwars_nojabba
where you drop “Jabba Desilijic Tiure” from the data. You can edit the code below to do this.
<- starwars |>
starwars_nojabba filter(name == "----")
How many rows does this new data set have?
Recreate the plot from Exercise 2 on
starwars_nojabba
. What do you notice? How do these plots compare?Refit the linear model from Exercise 3 on this reduced data set. How do the coefficients \((\hat\beta_0, \hat\beta_1)\) compare? Which is a better representation of the average character?
Using the values in Exercise 8, write out the equation for the predicted weight \((\hat{y})\).
Using the equation from Exercise 9, if you knew a character was 100 centimeters tall, what would you guess their weight was? How does this compare to your guess from Exercise 5?
Which data set was better suited for using a simple linear model to summarize the relationship between a character’s weight and height? Why?