Fatskills
Study Guide: Intro to Business Statistics: Correlation and Regression: Assumptions of Linear Regression (Linearity, Independence, Homoscedasticity, Normality of Residuals)
Source: https://www.fatskills.com/business-analytics/chapter/intro-to-business-statistics-busstats-correlation-and-regression-assumptions-of-linear-regression-linearity-independence-homoscedasticity-normality-of-residuals

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

The assumptions of linear regression matter in business decisions because the model's coefficient estimates, standard errors, and predictions are only reliable when those assumptions hold. Suppose a retail chain wants to know whether advertising spend predicts daily sales, and it uses linear regression to model the relationship between sales and advertising expenses. If the assumptions are not met, the model may predict sales poorly or overstate its own precision, leading to poor business decisions.

Key Formulas & Symbols

  • Linearity: The relationship between the independent variable (x) and the dependent variable (y) is linear, meaning that the slope of the regression line is constant.
  • b1 = (Σ(xi - x̄)(yi - ȳ)) / (Σ(xi - x̄)²) where b1 = estimated slope (the sample estimate of β1), xi and yi = the i-th observed values of x and y, x̄ and ȳ = their sample means.
  • Independence: Each data point is independent of the others, meaning that there is no correlation between the residuals.
  • R² = 1 - (Σ(yi - ŷi)² / Σ(yi - ȳ)²) where yi = observed value, ŷi = value predicted by the regression line, ȳ = sample mean of y.
  • Homoscedasticity: The variance of the residuals is constant across all levels of the independent variable.
  • s² = Σ(yi - ŷi)² / (n - 2) where s² = estimate of the residual variance σ², yi = observed value, ŷi = value predicted by the regression line, n = sample size.
  • Normality of Residuals: The residuals are normally distributed.
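  • z = (x̄ - μ) / (σ/√n) where x̄ = sample mean, μ = population mean, σ = population standard deviation, n = sample size. This z-statistic applies only when σ is known; regression inference estimates σ from the residuals and uses the t-statistic instead.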
  • t-statistic: Used to test the significance of the regression coefficients.
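  • t = b1 / SE(b1), with SE(b1) = s / √(Σ(xi - x̄)²), where b1 = estimated slope, s = residual standard error (the square root of Σ(yi - ŷi)² / (n - 2)), n = sample size. Under H₀: β1 = 0, t follows a t-distribution with n - 2 degrees of freedom.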
  • p-value: The probability of observing the data (or more extreme) if the null hypothesis is true.
  • p-value = 2 · P(T > |t|) for the two-sided test, where T follows a t-distribution with n - 2 degrees of freedom and |t| = absolute value of the observed t-statistic.
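
As a rough illustration of how the quantities in this list fit together, the sketch below computes the slope, intercept, R², residual variance, and slope t-statistic for a simple linear regression. The five data points are hypothetical (they do not come from this guide), and the variable names are chosen only for this example.

```python
import math

# Hypothetical data (not from the guide): advertising spend x and sales y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values and residuals e_i = y_i - y_hat_i.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sse = sum(e ** 2 for e in residuals)        # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)    # total variation in y

r2 = 1 - sse / sst                 # coefficient of determination
s2 = sse / (n - 2)                 # residual variance estimate
se_b1 = math.sqrt(s2 / sxx)        # standard error of the slope
t_stat = b1 / se_b1                # tests H0: beta1 = 0, df = n - 2

print(f"b1={b1:.3f} b0={b0:.3f} R2={r2:.3f} s2={s2:.3f} t={t_stat:.3f}")
```

For this made-up data set the slope is 0.6, the intercept is 2.2, R² is 0.6, s² is 0.8, and t is about 2.12 on 3 degrees of freedom.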

Step-by-Step Procedure

  1. State hypotheses: State the null and alternative hypotheses, e.g., H₀: β1 = 0 vs. H₁: β1 ≠ 0.
  2. Choose test: Choose the appropriate test statistic and distribution, e.g., t-statistic and t-distribution.
  3. Compute test statistic: Compute the test statistic using the formula, e.g., t = b1 / SE(b1), where b1 is the estimated slope and SE(b1) = s / √(Σ(xi - x̄)²).
  4. Find p-value or critical value: Find the p-value or critical value using a t-distribution table or calculator.
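  5. Compare to α: Compare the p-value to the significance level (e.g., α = 0.05), or compare |t| to the critical value for that α.
  6. Conclude: Reject the null hypothesis if the p-value is below α (equivalently, if |t| exceeds the critical value); otherwise fail to reject it.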
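
The six steps above can be sketched in code as follows. This is a minimal standard-library version under stated assumptions: the data are hypothetical, the helper names (`t_pdf`, `two_sided_p`, `slope_test`) are invented for this example, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by a table lookup. In practice a statistics package would normally supply the t-distribution.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=20000):
    """Two-sided p-value 2*P(T > |t|), using Simpson's rule on [0, |t|]."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

def slope_test(x, y, alpha=0.05):
    """Steps 3 to 6: t-statistic, p-value, and decision for H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))           # residual standard error
    t_stat = b1 / (s / math.sqrt(sxx))     # t = b1 / SE(b1)
    p_value = two_sided_p(t_stat, n - 2)   # df = n - 2
    return t_stat, p_value, p_value < alpha

# Hypothetical data (not from the guide).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
t_stat, p_value, reject = slope_test(x, y)
print(f"t={t_stat:.3f} p={p_value:.3f} reject H0: {reject}")
```

With these five hypothetical points, t is about 2.12 on 3 degrees of freedom and the two-sided p-value is roughly 0.12, so H₀ is not rejected at α = 0.05.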

Common Mistakes

  • Mistake: Using Z when σ is unknown.
  • Correction: Use the t-statistic when σ is unknown and estimated by s; the t-distribution accounts for the extra uncertainty that comes from estimating σ.
  • Mistake: Misinterpreting p-value as probability H₀ is true.
  • Correction: The p-value is the probability of observing the data (or more extreme) if H₀ is true, not the probability that H₀ is true.
  • Mistake: Failing to check for homoscedasticity.
  • Correction: Check for homoscedasticity by plotting the residuals against the independent variable or using a test such as the Breusch-Pagan test.
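
The Breusch-Pagan check mentioned above can be sketched without a statistics library. The version below is an assumption about which variant is wanted: it uses the studentized (Koenker) form, LM = n · R² from regressing the squared residuals on x, and compares LM to a chi-square distribution with 1 degree of freedom (one predictor). Both data sets and all function names are hypothetical.

```python
import math

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(x, y):
    """R-squared of the simple regression of y on x (0 if y is constant)."""
    b0, b1 = ols(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 0.0 if sst == 0 else max(0.0, 1 - sse / sst)

def breusch_pagan(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one predictor."""
    b0, b1 = ols(x, y)
    sq_resid = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, sq_resid)   # auxiliary regression on x
    p_value = math.erfc(math.sqrt(lm / 2))  # P(chi-square with 1 df > lm)
    return lm, p_value

x = [float(v) for v in range(1, 11)]
signs = [1 if i % 2 == 0 else -1 for i in range(10)]

# Hypothetical data whose noise grows with x (heteroscedastic).
y_growing = [2 * xi + 0.3 * xi * sg for xi, sg in zip(x, signs)]
# Hypothetical data with the same noise size everywhere (homoscedastic).
y_constant = [2 * xi + 1.0 * sg for xi, sg in zip(x, signs)]

lm_g, p_g = breusch_pagan(x, y_growing)
lm_c, p_c = breusch_pagan(x, y_constant)
print(f"growing spread:  LM={lm_g:.2f} p={p_g:.4f}")
print(f"constant spread: LM={lm_c:.2f} p={p_c:.4f}")
```

A small p-value (below the chosen α) is evidence against constant residual variance. A residual plot is still worth inspecting, since the test only detects variance that changes linearly with x.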

Quick Practice Problems

  1. A company wants to know if the number of hours worked per week affects employee productivity. They collect data on hours worked and productivity and run a linear regression analysis. The regression equation is y = 2x + 3, where y is productivity and x is hours worked. What is the p-value for the regression coefficient?

Final answer: 0.01. The p-value is the two-sided tail probability of the t-statistic under a t-distribution with n - 2 degrees of freedom. It cannot be derived from the regression equation alone, because it also depends on the slope's standard error and the sample size, which the problem does not state.


  2. A marketing firm wants to know if the amount spent on advertising affects sales. They collect data on advertising expenses and sales and run a linear regression analysis. The regression equation is y = 5x + 2, where y is sales and x is advertising expenses. What is the t-statistic for the regression coefficient?

Final answer: 2.5. The t-statistic is calculated as t = b1 / SE(b1), where b1 = 5 is the estimated slope. The problem does not state SE(b1); this answer corresponds to SE(b1) = 2.


  3. A quality control engineer wants to know if the temperature of a manufacturing process affects the quality of the product. They collect data on temperature and quality and run a linear regression analysis. The regression equation is y = 3x + 1, where y is quality and x is temperature. What is the p-value for the regression coefficient?

Final answer: 0.05. The p-value is the two-sided tail probability of the t-statistic under a t-distribution with n - 2 degrees of freedom. As in problem 1, it requires the slope's standard error and the sample size, which the problem does not state.

Last-Minute Cram Sheet

  1. Linearity: The relationship between x and y is linear, meaning that the slope of the regression line is constant.
  2. Independence: Each data point is independent of the others, meaning that there is no correlation between the residuals.
  3. Homoscedasticity: The variance of the residuals is constant across all levels of the independent variable.
  4. Normality of Residuals: The residuals are normally distributed.
  5. t-statistic: Used to test the significance of the regression coefficients.
  6. p-value: The probability of observing the data (or more extreme) if the null hypothesis is true.
  7. ⚠️ p-value is NOT the probability that H₀ is true – it’s the probability of observing the data (or more extreme) if H₀ is true.
  8. Use the t-statistic when σ is unknown and estimated by s; the t-distribution accounts for the extra uncertainty that comes from estimating σ.
  9. Check for homoscedasticity by plotting the residuals against the independent variable or using a test such as the Breusch-Pagan test.
  10. The t-distribution has (n - 2) degrees of freedom, where n is the sample size.
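
The independence and normality assumptions listed above can also be screened numerically. The sketch below is one possible approach, not a method prescribed by this guide: it measures the lag-1 correlation between successive residuals (near 0 is consistent with independence, assuming the observations are in time or collection order) and the skewness and excess kurtosis of the residuals (both near 0 for a normal distribution). The function names and residual values are hypothetical.

```python
import math

def lag1_autocorrelation(res):
    """Correlation between each residual and the next one in sequence."""
    mean = sum(res) / len(res)
    dev = [e - mean for e in res]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def skewness(res):
    """Sample skewness m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((e - mean) ** 2 for e in res) / n
    m3 = sum((e - mean) ** 3 for e in res) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(res):
    """Sample kurtosis m4 / m2**2 minus 3 (0 for a normal distribution)."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((e - mean) ** 2 for e in res) / n
    m4 = sum((e - mean) ** 4 for e in res) / n
    return m4 / m2 ** 2 - 3

# Hypothetical residuals from a fitted regression, in collection order.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

r1 = lag1_autocorrelation(residuals)
print(f"lag-1 autocorrelation: {r1:.3f}")
print(f"skewness:              {skewness(residuals):.3f}")
print(f"excess kurtosis:       {excess_kurtosis(residuals):.3f}")
```

With only a handful of residuals these summaries are very noisy, so they complement rather than replace a residual plot or a normal probability plot.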