Fatskills
Practice. Master. Repeat.
Study Guide: Intro to Business Statistics: Correlation and Regression - Evaluating Regression Fit, Standard Error of Estimate, Coefficient of Determination R², Adjusted R²
Source: https://www.fatskills.com/business-analytics/chapter/intro-to-business-statistics-busstats-correlation-and-regression-evaluating-regression-fit-standard-error-of-estimate-coefficient-of-determination-r%C2%B2-adjusted-r%C2%B2

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Evaluating regression fit matters in business decisions because it shows how strong and how reliable the relationship between variables is. For example, a retail chain that uses regression analysis to relate sales to advertising expenses needs to know how well the fitted line describes its data. By evaluating the regression fit, the chain can judge whether the model is adequate and make informed decisions about future advertising strategies.

Key Formulas & Symbols

  • Standard Error of Estimate (s): measures the variability of the data points around the regression line, where s = √[Σ(yi - ŷi)² / (n - 2)].
  • Coefficient of Determination (r²): measures the proportion of the variance in the dependent variable that is predictable from the independent variable, where r² = 1 - [Σ(yi - ŷi)² / Σ(yi - ȳ)²].
  • Adjusted r² (r²_adj): adjusts for the number of predictors in the model, where r²_adj = 1 - [(n - 1) / (n - k - 1)] * (1 - r²).
  • F-statistic: used to test the overall significance of the regression model, where F = MSR / MSE, with MSR = Σ(ŷi - ȳ)² / k and MSE = Σ(yi - ŷi)² / (n - k - 1). Here n is the number of observations and k is the number of predictors.
  • p-value: the probability of observing the data (or more extreme) if the null hypothesis is true.
  • Critical F-value: the F-value that corresponds to a given significance level (α) and degrees of freedom.
  • R² change: measures the change in R² when a new predictor is added to the model.
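  • F-statistic for R² change: used to test the significance of the change in R² when one predictor is added to a model that already has k predictors, where F = (R²_new - R²_old) / [(1 - R²_new) / (n - k - 2)]. When q predictors are added at once, divide the numerator by q and use n - k - q - 1 in the denominator.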
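
The formulas above can be checked by hand on a tiny data set. The following Python sketch is one minimal way to do that for a single predictor (k = 1). The five data points and the function names are made up for illustration and do not come from this guide.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_metrics(x, y):
    """Standard error of estimate, R², adjusted R² and F for k = 1."""
    n, k = len(x), 1
    intercept, slope = fit_line(x, y)
    y_hat = [intercept + slope * xi for xi in x]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # Σ(yi - ŷi)²
    sst = sum((yi - y_bar) ** 2 for yi in y)               # Σ(yi - ȳ)²
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # Σ(ŷi - ȳ)²
    r2 = 1 - sse / sst
    return {
        "s": math.sqrt(sse / (n - k - 1)),
        "r2": r2,
        "adj_r2": 1 - (n - 1) / (n - k - 1) * (1 - r2),
        "F": (ssr / k) / (sse / (n - k - 1)),
    }

# Hypothetical data: advertising spend (x) and sales (y), arbitrary units.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
metrics = fit_metrics(x, y)
for name, value in metrics.items():
    print(name, round(value, 4))
```

For these five points the fitted line is ŷ = 2.2 + 0.6x, which gives r² = 0.6, adjusted r² ≈ 0.4667, s ≈ 0.8944 and F = 4.5. Note how much adjusted r² drops below r² when n is this small.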

Step-by-Step Procedure

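  1. State hypotheses: H₀: β = 0 (no relationship between variables) vs. H₁: β ≠ 0 (relationship exists). With k predictors, H₀ states that all k slope coefficients equal 0, and H₁ states that at least one does not.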
  2. Choose test: F-test for overall significance of the regression model.
  3. Compute test statistic: F = MSR / MSE, where MSR = Σ(ŷi - ȳ)² / k and MSE = Σ(yi - ŷi)² / (n - k - 1).
  4. Find p-value or critical value: using the F-distribution table or calculator, find the p-value or critical F-value corresponding to the calculated F-statistic, with k numerator and n - k - 1 denominator degrees of freedom.
  5. Compare to α: if p-value ≤ α or F-statistic > critical F-value, reject H₀.
  6. Conclude: if H₀ is rejected, conclude that the regression model is significant and the relationship between variables is statistically significant.
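
The six steps can be sketched in a few lines of Python. This is a minimal illustration rather than a full implementation: the function name, the sums of squares and the sample size are hypothetical, and the critical value is typed in from a standard F-table at α = 0.05 instead of being computed.

```python
def overall_f_test(ssr, sse, n, k, f_critical):
    """Overall F-test: compare MSR / MSE with a table critical value.

    ssr: regression sum of squares, Σ(ŷi - ȳ)²
    sse: error sum of squares, Σ(yi - ŷi)²
    n: number of observations, k: number of predictors
    f_critical: critical F for (k, n - k - 1) degrees of freedom at the chosen α
    """
    df_num = k
    df_den = n - k - 1
    msr = ssr / df_num
    mse = sse / df_den
    f_stat = msr / mse
    return {"F": f_stat, "df": (df_num, df_den), "reject_h0": f_stat > f_critical}

# Hypothetical example: n = 10 observations, k = 1 predictor, SSR = 80, SSE = 20.
# The table value for α = 0.05 with (1, 8) degrees of freedom is about 5.32.
result = overall_f_test(ssr=80.0, sse=20.0, n=10, k=1, f_critical=5.32)
print(result)
```

Here MSR = 80 and MSE = 2.5, so F = 32, which exceeds 5.32 and leads to rejecting H₀ at the 5% level.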

Common Mistakes

  • Mistake: Misinterpreting the p-value as the probability that the null hypothesis is true.
  • Correction: The p-value is the probability of observing the data (or more extreme) if the null hypothesis is true. It does not provide information about the probability of the null hypothesis being true.
  • Mistake: Failing to check for multicollinearity among predictors.
  • Correction: Multicollinearity can lead to unstable estimates of regression coefficients and inflated standard errors. Check for multicollinearity by examining the correlation matrix and variance inflation factor (VIF) values.
  • Mistake: Ignoring the assumption of normality of residuals.
  • Correction: The t-tests and F-tests in linear regression assume normally distributed residuals. Check for normality using plots of the residuals (e.g., a histogram or normal probability plot) and statistical tests (e.g., Shapiro-Wilk test).
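
As a small illustration of the multicollinearity check, the sketch below computes the VIF for the special case of exactly two predictors, where VIF = 1 / (1 - r²) and r is the correlation between the two predictors. The data and function names are hypothetical. With more than two predictors, each VIF comes from regressing one predictor on all the others, which this sketch does not cover.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors: advertising spend and number of promotions.
ad_spend = [1, 2, 3, 4, 5]
promotions = [2, 4, 5, 4, 5]
vif = vif_two_predictors(ad_spend, promotions)
print(round(pearson_r(ad_spend, promotions), 4), round(vif, 2))
```

For these values r ≈ 0.7746 and VIF = 2.5. A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, though the cutoff is a convention, not a formal test.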

Quick Practice Problems

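  1. A marketing firm wants to know if the number of social media followers is related to website traffic. They run a regression analysis with one predictor (k = 1) on 15 observations (n = 15; the sample size is assumed here, since adjusted R² cannot be computed without it) and get an R² of 0.75. What is the adjusted R²? Answer: 0.73, since r²_adj = 1 - [(n - 1) / (n - k - 1)] * (1 - r²) = 1 - (14 / 13)(0.25) ≈ 0.73.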
  2. A company wants to know if the price of a product is related to sales. They run a regression analysis and get an F-statistic of 12.5 with 5 and 20 degrees of freedom. What is the p-value? Answer: p < 0.001, roughly 0.00001. The 0.001 critical value for 5 and 20 degrees of freedom is about 6.46, far below 12.5 (using the F-distribution table or calculator).
  3. A retail chain wants to know if the number of employees is related to sales. They run a regression analysis and get an R² of 0.60. If they add a new predictor (number of customers), the R² increases to 0.80. What is the F-statistic for the R² change? Answer: it depends on the sample size n. With one original predictor (k = 1), F = (R²_new - R²_old) / [(1 - R²_new) / (n - k - 2)] = 0.20 / [0.20 / (n - 3)] = n - 3; for example, n = 13 observations gives F = 10.
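
The arithmetic behind these three problems can be reproduced with the Python sketch below. The sample sizes (n = 15 and n = 13) and the function names are assumptions made for illustration, because the answers cannot be computed without n. The p-value is approximated by numerically integrating the F-distribution's upper tail with Simpson's rule, which is just one of several possible approaches, and the helper assumes at least 2 denominator degrees of freedom and a positive F.

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k predictors."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

def f_for_r2_change(r2_old, r2_new, n, k_old, q=1):
    """F for adding q predictors to a model that already has k_old predictors."""
    df_den = n - k_old - q - 1
    return ((r2_new - r2_old) / q) / ((1 - r2_new) / df_den)

def f_upper_tail(f_stat, df_num, df_den, steps=20000):
    """Approximate P(F >= f_stat) with Simpson's rule (steps must be even)."""
    a, b = df_den / 2, df_num / 2
    x = df_den / (df_den + df_num * f_stat)  # upper limit of the beta integral
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3 / math.exp(log_beta)

# Problem 1, assuming n = 15 and k = 1.
adj = adjusted_r2(0.75, n=15, k=1)
# Problem 2: F = 12.5 with 5 and 20 degrees of freedom.
p_value = f_upper_tail(12.5, df_num=5, df_den=20)
# Problem 3, assuming n = 13 and one original predictor.
f_change = f_for_r2_change(0.60, 0.80, n=13, k_old=1)

print(round(adj, 2), p_value < 0.001, round(f_change, 2))
```

Changing the assumed n shows how sensitive adjusted R² and the R²-change F-statistic are to sample size, which is why n should always be reported alongside R².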

Last-Minute Cram Sheet

  • p-value is NOT the probability that H₀ is true; it is the probability of observing the data (or more extreme) if H₀ is true.
  • F-statistic = MSR / MSE, where MSR = Σ(ŷi - ȳ)² / k and MSE = Σ(yi - ŷi)² / (n - k - 1).
  • R² = 1 - [Σ(yi - ŷi)² / Σ(yi - ȳ)²].
  • Adjusted R² = 1 - [(n - 1) / (n - k - 1)] * (1 - R²).
  • F-distribution table or calculator is used to find p-value or critical F-value.
  • Degrees of freedom for F-test are (k, n - k - 1): k for the numerator and n - k - 1 for the denominator.
  • MSR and MSE are used to calculate F-statistic.
  • R² change is the increase in R² when a new predictor is added to the model.
  • F-statistic for R² change is used to test the significance of the change in R².
  • Multicollinearity can lead to unstable estimates of regression coefficients and inflated standard errors.
  • Normality of residuals is assumed in linear regression.