Fatskills
Study Guide: AP Statistics (AP Stats): Coefficient of Determination (r²) – Interpretation
Source: https://www.fatskills.com/ap-statistics/chapter/ap-stats-ap-statistics-coefficient-of-determination-r%C2%B2-interpretation

AP Statistics (AP Stats): Coefficient of Determination (r²) – Interpretation

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

The coefficient of determination (r²) measures the proportion of variability in the response variable (y) that can be explained by the linear relationship with the explanatory variable (x). It’s a key tool for assessing how well a least-squares regression line (LSRL) fits the data. On the AP exam, you’ll need to interpret r² in context, compare models, and explain its meaning in real-world scenarios (e.g., predicting house prices from square footage, explaining test scores based on study hours, or modeling crop yield from rainfall).
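To make "proportion of variability explained" concrete, the following Python sketch computes r² as 1 - SSE/SST for a small data set. The `hours` and `scores` values are hypothetical, chosen only so the arithmetic is easy to follow, and the function name `r_squared` is an assumption for illustration. SSE is the sum of squared residuals from the least-squares line, and SST is the sum of squared deviations of y from its mean.

```python
def r_squared(x, y):
    """Return r^2 = 1 - SSE/SST for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar

    # SSE: variation left over after fitting the line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: total variation of y around its mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [62, 64, 65, 64, 65]

r2 = r_squared(hours, scores)
print(f"r^2 = {r2:.2f}")
```

For these made-up numbers SSE = 2.4 and SST = 6, so r² = 0.60: 60% of the variability in the scores is explained by the linear relationship with hours studied, and the remaining 40% stays in the residuals.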


Key Terms & Formulas

  • Coefficient of determination (r²): \( r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{SSE}}{\text{SST}} \)
  • SSE (Sum of Squared Errors): Sum of squared residuals, \( \sum (y - \hat{y})^2 \).
  • SST (Total Sum of Squares): Sum of squared deviations from the mean, \( \sum (y - \bar{y})^2 \).

  • Interpretation of r²: "(r² × 100)% of the variability in [response variable] is explained by the linear relationship with [explanatory variable]."

  • Correlation coefficient (r): Measures strength and direction of a linear relationship. \( r^2 = (\text{correlation})^2 \).

  • Residual (e): \( e = y - \hat{y} \) (observed – predicted). Used to assess model fit.

  • LSRL (Least-Squares Regression Line): \( \hat{y} = a + bx \), where \( b = r \cdot \frac{s_y}{s_x} \) and \( a = \bar{y} - b\bar{x} \).

  • Calculator command (TI-84):

  • LinReg(a+bx): STAT → CALC → 8:LinReg(a+bx) (stores the equation in Y1 if Store RegEQ: Y1 is set).
  • r² value: Displayed in the output when diagnostics are on (if r² is missing, run DiagnosticOn from the CATALOG); also available under VARS → 5:Statistics → EQ → r².

  • Conditions for regression inference (LINER):

  • Linear: Scatterplot looks roughly linear.
  • Independent: Observations are independent (check 10% condition if sampling without replacement).
  • Normal: Residuals are approximately normal (check histogram or Normal probability plot).
  • Equal variance: Residuals have roughly equal spread (no fanning).
  • Random: Data comes from a random sample or experiment.

  • Hypothesis test for slope (β):

  • H₀: \( \beta = 0 \) (no linear relationship).
  • Hₐ: \( \beta \neq 0 \) (linear relationship exists).
  • Test statistic: \( t = \frac{b - 0}{SE_b} \), where \( SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} \) and \( s = \sqrt{\frac{SSE}{n-2}} \).
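The slope, intercept, and test-statistic formulas in this list can be traced by hand in a few lines of Python. This is a minimal sketch on hypothetical data, not a replacement for LinRegTTest; the function name `slope_t_statistic` and the data values are assumptions for illustration. It stops at the t statistic, because the P-value comes from a t distribution with n - 2 degrees of freedom, which the calculator supplies.

```python
import math
import statistics


def slope_t_statistic(x, y):
    """Return slope b, intercept a, SE_b, t, and df for testing H0: beta = 0."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (divide by n - 1)

    # Correlation r from its definition
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / ((n - 1) * s_x * s_y)

    # LSRL: b = r * s_y / s_x and a = y_bar - b * x_bar
    b = r * s_y / s_x
    a = y_bar - b * x_bar

    # s = sqrt(SSE / (n - 2)), the standard deviation of the residuals
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    # SE_b = s / sqrt(sum of (x - x_bar)^2)
    se_b = s / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    t = (b - 0) / se_b
    return {"b": b, "a": a, "se_b": se_b, "t": t, "df": n - 2}


# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [62, 64, 65, 64, 65]

result = slope_t_statistic(hours, scores)
print(f"b = {result['b']:.2f}, SE_b = {result['se_b']:.3f}, "
      f"t = {result['t']:.2f}, df = {result['df']}")
```

With these made-up values the slope is 0.60 and t is about 2.12 on 3 degrees of freedom. On the exam the same numbers would normally be read from calculator or computer output rather than computed by hand.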

Step-by-Step / Process Flow

How to interpret r² in an FRQ:
1. Identify variables:
  • Explanatory (x) and response (y) variables in context.
  • Example: x = hours studied, y = test score.

2. Compute r² (if not given):
  • Use LinReg(a+bx) on the TI-84 to find r².
  • Example output: r² = 0.72.

3. Interpret r² in context:
  • "72% of the variability in test scores is explained by the linear relationship with hours studied."
  • Avoid: "72% of the data fits the model" (incorrect).

4. Compare models (if asked):
  • Higher r² = better fit (but check residual plots for appropriateness).
  • Example: If r² = 0.85 for Model A and r² = 0.60 for Model B, Model A explains more variability.

5. Check conditions (if inference is required):
  • Verify LINER conditions (especially linearity and equal variance).

6. Conclude in context:
  • Example: "Since r² is high (0.72), hours studied is a strong predictor of test scores, but other factors may explain the remaining 28% of variability."
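The interpretation step is mechanical enough to write as a template. The Python sketch below is only a study aid under stated assumptions: the function name `interpret_r2` is hypothetical, and it rounds to the nearest whole percent. It turns an r² value and the two variable names into the sentence form used in this guide.

```python
def interpret_r2(r2, response, explanatory):
    """Build the in-context interpretation sentence for a given r^2."""
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    percent = r2 * 100  # convert the proportion to a percentage
    return (f"{percent:.0f}% of the variability in {response} is explained "
            f"by the linear relationship with {explanatory}.")


sentence = interpret_r2(0.72, "test scores", "hours studied")
print(sentence)
```

The sentence names the response variable first and the explanatory variable second, and it talks about variability rather than data points, which is the distinction graders look for.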

Common Mistakes

  • Mistake: Saying r² measures the strength of the relationship (like r). Correction: r² measures the proportion of variability explained by the model. Use r for strength/direction.

  • Mistake: Interpreting r² as a percentage of data points that fit the model. Correction: r² is about variability, not individual points. Say: "X% of the variability in y is explained by the linear relationship with x."

  • Mistake: Ignoring units or context in interpretation. Correction: Always name the response and explanatory variables. Weak: "r² = 0.64." Better: "64% of the variability in house prices is explained by the linear relationship with square footage."

  • Mistake: Assuming a high r² means causation. Correction: r² only measures association. Correlation ≠ causation (e.g., ice cream sales and drowning deaths are correlated but not causal).

  • Mistake: Forgetting to check LINER conditions before making inferences. Correction: Always verify conditions if the question involves hypothesis tests or confidence intervals for slope.


AP Exam Insights

  • Frequently tested:
  • Interpreting r² in context (FRQs often ask for this explicitly).
  • Comparing r² values between models (e.g., "Which model is better?").
  • Connecting r² to residual plots (e.g., "Does a high r² guarantee a good fit?").

  • Tricky distinctions:

  • r vs. r²: r measures strength/direction; r² measures explained variability.
  • r² vs. slope: A high r² doesn’t mean a steep slope (e.g., r² = 0.99 could go with a small slope if the points cluster tightly around a shallow line; the slope also depends on the units of x and y).
  • Residuals vs. r²: Even with a high r², residuals might show patterns (e.g., curvature), indicating a poor linear fit.

  • Calculator pitfalls:

  • Forgetting to clear Y= before running LinReg (old equations can interfere).
  • Not storing the regression equation in Y1 (needed for residual plots).
  • Misinterpreting the output: r² is labeled as "r²" in the TI-84 output, not "r."

  • Common FRQ setups:

  • Given a scatterplot and r², ask for interpretation.
  • Given two models (e.g., linear vs. quadratic), ask which has a higher r² and why.
  • Ask to explain why r² might be low despite a strong relationship between the variables (e.g., the relationship is nonlinear, so a straight line captures little of it).
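Two of the tricky distinctions above (r² vs. slope, and residuals vs. r²) can be checked numerically. The Python sketch below uses made-up data and a hypothetical helper `fit_line`; it is an illustration under those assumptions, not exam-required work. The first part changes the units of y, which rescales the slope but leaves r² alone. The second part fits a line to perfectly curved data (y = x²) and shows a high r² alongside residuals with an obvious pattern.

```python
def fit_line(x, y):
    """Return (a, b, r2) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1 - sse / sst


# r^2 vs. slope: changing the units of y rescales the slope but not r^2
x = [1, 2, 3, 4, 5]
y = [62, 64, 65, 64, 65]
y_rescaled = [yi / 100 for yi in y]
_, b_original, r2_original = fit_line(x, y)
_, b_rescaled, r2_rescaled = fit_line(x, y_rescaled)
print(f"slopes: {b_original:.3f} vs {b_rescaled:.3f}; "
      f"r^2: {r2_original:.2f} vs {r2_rescaled:.2f}")

# Residuals vs. r^2: curved data can still produce a high r^2
x_curved = list(range(1, 11))
y_curved = [xi ** 2 for xi in x_curved]
a_c, b_c, r2_curved = fit_line(x_curved, y_curved)
residuals = [yi - (a_c + b_c * xi) for xi, yi in zip(x_curved, y_curved)]
print(f"r^2 for curved data = {r2_curved:.2f}")
print("residuals:", [round(e, 1) for e in residuals])
```

In the curved example r² is about 0.95, yet the residuals are positive at both ends and negative in the middle. That U shape is the signal that a linear model is not appropriate, no matter how high r² is.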

Quick Check Questions

  1. Multiple Choice: A study finds that r² = 0.45 for the relationship between daily screen time (hours) and sleep duration (hours). Which interpretation is correct? A) 45% of people sleep less because of screen time. B) 45% of the variability in sleep duration is explained by screen time. C) Screen time causes 45% of sleep problems. D) The correlation between screen time and sleep duration is 0.45.

Answer: B. r² measures the proportion of variability in the response variable explained by the explanatory variable.
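Option D fails for a reason that is easy to check with arithmetic: since r² = (correlation)², the correlation equals the square root of r² in magnitude, not r² itself. A quick Python check, assuming only r² = 0.45 is known (the sign of r would have to come from the slope of the regression line, which the question does not give):

```python
import math

r2 = 0.45
r_magnitude = math.sqrt(r2)  # |r|; the sign matches the sign of the slope
print(f"|r| = {r_magnitude:.2f}")
```

So the correlation would be about 0.67 or -0.67, depending on the direction of the relationship, and not 0.45.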

  2. FRQ Part: A regression analysis of house prices (in $1000s) vs. square footage yields r² = 0.81. The mean house price is $300,000. a) Interpret r² in context. b) Does this mean 81% of houses are priced correctly by the model? Explain.

Answer: a) "81% of the variability in house prices is explained by the linear relationship with square footage." b) No. r² measures explained variability, not the accuracy of individual predictions. Individual houses can still sell for well above or below the price the model predicts.

  3. Multiple Choice: Which residual plot suggests that r² might be misleadingly high? A) Residuals randomly scattered around 0. B) Residuals forming a U-shaped pattern. C) Residuals with equal spread across x-values. D) Residuals normally distributed.

Answer: B. A U-shaped pattern indicates a nonlinear relationship, so a high r² would overstate how appropriate the linear model is.


Last-Minute Cram Sheet

  1. r² formula: \( r^2 = 1 - \frac{SSE}{SST} \) (proportion of variability explained).
  2. Interpretation template: "[r² × 100]% of the variability in [y] is explained by the linear relationship with [x]."
  3. r² vs. r: r² is always between 0 and 1; r is between -1 and 1.
  4. High r² ≠ good fit: Check residual plots for patterns (e.g., curvature).
  5. Calculator command: STAT → CALC → 8:LinReg(a+bx) (stores r² automatically).
  6. LINER conditions: Check before regression inference (Linear, Independent, Normal, Equal variance, Random).
  7. r² ≠ causation: Association ≠ causation (e.g., shark attacks and ice cream sales).
  8. Context matters: Always name both variables in the interpretation (e.g., "house prices" and "square footage").
  9. Residuals: \( e = y - \hat{y} \); used to assess model fit.
  10. Slope test: H₀: \( \beta = 0 \), Hₐ: \( \beta \neq 0 \); use LinRegTTest on TI-84.