Fatskills
Study Guide: College Math: Statistics Regression-Analysis - Linear Regression Least Squares Line and Residuals
Source: https://www.fatskills.com/restaurants/chapter/collegemath-statistics-regression-analysis-linear-regression-least-squares-line-and-residuals

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

Linear Regression – Least Squares Line and Residuals

What Is This?

A linear regression model uses a linear equation to predict the value of a dependent variable from one or more independent variables. The least squares line is the particular line that minimizes the sum of the squared residuals, where each residual is the vertical distance between an observed data point and the value the line predicts for it.

Why It Matters

Linear regression is a fundamental technique in data analysis, widely used in various fields, including:

  • Economics: To model the relationship between economic variables, such as GDP and inflation rate.
  • Engineering: To predict the behavior of complex systems, like the relationship between speed and fuel consumption.
  • Medicine: To identify risk factors for diseases, such as the relationship between blood pressure and cardiovascular disease.

Core Concepts

1. Least Squares Line

The least squares line is a linear equation of the form:

$$y = \beta_0 + \beta_1x$$

where:

  • $\beta_0$ is the y-intercept
  • $\beta_1$ is the slope
  • $x$ is the independent variable
  • $y$ is the dependent variable

The least squares line is the choice of $\beta_0$ and $\beta_1$ that minimizes the sum of the squared residuals over all observed data points.
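As a minimal sketch of what "least squares" computes, the Python function below applies the standard closed-form formulas for the slope and intercept (the same formulas the solved example in this guide uses). The function name `fit_least_squares` and the sample points are illustrative assumptions, not part of the original guide.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # Intercept: the fitted line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical points that lie exactly on y = 1 + 2x.
b0, b1 = fit_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Because the sample points sit exactly on a line, the fit recovers that line; with noisy data the same function returns the line with the smallest sum of squared residuals.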

2. Residuals

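A residual is the difference between an observed value and the value the line predicts for the same $x$. A positive residual means the point lies above the line; a negative residual means it lies below. Residuals are calculated as: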

$$e_i = y_i - (\beta_0 + \beta_1x_i)$$

where:

  • $e_i$ is the residual for the $i^{th}$ data point
  • $y_i$ is the observed value for the $i^{th}$ data point
  • $x_i$ is the value of the independent variable for the $i^{th}$ data point
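The residual formula translates directly into code. The following is a small sketch under stated assumptions: the helper name `residuals`, the line $y = 1 + 2x$, and the three observations are all made up for illustration.

```python
def residuals(xs, ys, intercept, slope):
    """Observed value minus predicted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical line y = 1 + 2x and three made-up observations.
e = residuals([0, 1, 2], [1.5, 2.5, 5.0], intercept=1, slope=2)
print(e)  # [0.5, -0.5, 0.0]
```

The first point lies above the line (positive residual), the second below it (negative residual), and the third exactly on it.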

3. Coefficient of Determination (R-Squared)

R-squared measures the proportion of the variance in the dependent variable that is explained by the independent variable(s). It is calculated as:

$$R^2 = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2}$$

where:

  • $\bar{y}$ is the mean of the dependent variable
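A short sketch of the R-squared formula, assuming the residuals have already been computed. The function name `r_squared` is hypothetical, and the sample values come from fitting the line $y = 3x$ to the made-up points (0, 1), (1, 1), (2, 7).

```python
def r_squared(ys, residuals):
    """1 minus (sum of squared residuals / total sum of squares)."""
    y_bar = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals)       # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)    # total variation around the mean
    return 1 - sse / sst


# Made-up observations and their residuals from the line y = 3x.
r2 = r_squared([1, 1, 7], [1, -2, 1])
print(r2)  # 0.75
```

Here the squared residuals sum to 6 and the total variation is 24, so the line accounts for 75% of the variance.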

Step-by-Step: How to Approach Problems

To solve a linear regression problem, follow these steps:

  1. Identify the dependent and independent variables: Determine which variable is being predicted (dependent variable) and which variable is being used to make the prediction (independent variable).
  2. Plot the data: Visualize the relationship between the dependent and independent variables to identify any patterns or outliers.
  3. Calculate the least squares line: Use the least squares method to calculate the values of $\beta_0$ and $\beta_1$.
  4. Calculate the residuals: Calculate the residuals between observed data points and the predicted line.
  5. Calculate R-squared: Calculate the coefficient of determination (R-squared) to measure how much of the variation in the dependent variable the line explains.

Solved Examples

Problem 1

A company wants to predict the price of a new product based on the number of features it has. The data is as follows:

Features (x)   Price (y)
2              100
3              120
4              150
5              180
6              200

Find the least squares line and calculate R-squared.

Solution

First, we need to calculate the means of the independent variable (features) and the dependent variable (price).

$$\bar{x} = \frac{2 + 3 + 4 + 5 + 6}{5} = 4$$

$$\bar{y} = \frac{100 + 120 + 150 + 180 + 200}{5} = 150$$

Next, we need to calculate the deviations from the mean for both variables.

Features ($x_i$)   $x_i - \bar{x}$   Price ($y_i$)   $y_i - \bar{y}$
2                  -2                100             -50
3                  -1                120             -30
4                  0                 150             0
5                  1                 180             30
6                  2                 200             50

Then, we need to calculate the slope ($\beta_1$) and the y-intercept ($\beta_0$) of the least squares line.

$$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

$$\beta_1 = \frac{(-2)(-50) + (-1)(-30) + (0)(0) + (1)(30) + (2)(50)}{(-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2}$$

$$\beta_1 = \frac{100 + 30 + 0 + 30 + 100}{4 + 1 + 0 + 1 + 4}$$

$$\beta_1 = \frac{260}{10}$$

$$\beta_1 = 26$$

$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$

$$\beta_0 = 150 - 26(4)$$

$$\beta_0 = 150 - 104$$

$$\beta_0 = 46$$

The least squares line is:

$$y = 46 + 26x$$
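The slope and intercept for Problem 1 can be double-checked numerically. The sketch below repeats the hand calculation with the problem's data; the variable names are illustrative choices.

```python
features = [2, 3, 4, 5, 6]
prices = [100, 120, 150, 180, 200]

x_bar = sum(features) / len(features)   # 4.0
y_bar = sum(prices) / len(prices)       # 150.0

# Numerator and denominator of the slope formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(features, prices))  # 260.0
s_xx = sum((x - x_bar) ** 2 for x in features)                           # 10.0

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
print(slope, intercept)  # 26.0 46.0
```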

Finally, we need to calculate the residuals and R-squared.

$$e_i = y_i - (\beta_0 + \beta_1x_i)$$

$$e_i = y_i - (46 + 26x_i)$$

$$e_i = y_i - 46 - 26x_i$$

Evaluating this for each data point gives:

Features (x)   Price (y)   Predicted (46 + 26x)   Residual (e)
2              100         98                     2
3              120         124                    -4
4              150         150                    0
5              180         176                    4
6              200         202                    -2

The residuals sum to zero, as they always do for a least squares line with an intercept.

$$\sum e_i^2 = 2^2 + (-4)^2 + 0^2 + 4^2 + (-2)^2 = 40$$

$$\sum (y_i - \bar{y})^2 = (-50)^2 + (-30)^2 + 0^2 + 30^2 + 50^2 = 6800$$

$$R^2 = 1 - \frac{40}{6800} \approx 0.994$$

The least squares line therefore explains about 99.4% of the variance in price.
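The residuals and R-squared for Problem 1 can be checked with a few lines of Python. This sketch uses the problem's data and the fitted line $y = 46 + 26x$; the variable names are illustrative choices.

```python
features = [2, 3, 4, 5, 6]
prices = [100, 120, 150, 180, 200]
intercept, slope = 46, 26  # fitted line from Problem 1

# Residual = observed price - predicted price.
predicted = [intercept + slope * x for x in features]
resid = [y - p for y, p in zip(prices, predicted)]
print(resid)  # [2, -4, 0, 4, -2]

y_bar = sum(prices) / len(prices)
sse = sum(e ** 2 for e in resid)               # sum of squared residuals
sst = sum((y - y_bar) ** 2 for y in prices)    # total sum of squares
r2 = 1 - sse / sst
print(sse, sst, round(r2, 4))  # 40 6800.0 0.9941
```

An R-squared this close to 1 indicates that the five points lie very nearly on the fitted line.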