Study Guide: AP Statistics (AP Stats): Correlation (r) – Properties, Interpretation, Limitations
Source: https://www.fatskills.com/ap-statistics/chapter/ap-stats-ap-statistics-correlation-r-properties-interpretation-limitations

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Correlation (r) measures the strength and direction of a linear relationship between two quantitative variables. It matters on the AP exam because it underlies regression, residual analysis, and interpreting relationships in data. For example, a researcher might study whether study hours (x) and exam scores (y) are positively correlated, meaning students who study more tend to have higher scores. However, correlation alone does not prove causation: ice cream sales and drowning deaths are correlated, but neither causes the other (hot weather drives both).


Key Terms & Formulas

  • Correlation coefficient (r): Measures the strength and direction of a linear relationship between two quantitative variables. Ranges from -1 to 1.
  • r = 1: Perfect positive linear relationship.
  • r = -1: Perfect negative linear relationship.
  • r = 0: No linear relationship (a nonlinear relationship may still exist).
  • Formula (for reference, not memorization): \[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]

    • n = number of data points
    • x̄, ȳ = sample means
    • s_x, s_y = sample standard deviations
  • Calculator command (TI-84): LinReg(a+bx) L1, L2, Y1 (stores the regression equation in Y1 and displays r and r² if diagnostics are turned on).

    • Enable diagnostics: 2nd → 0 (CATALOG) → DiagnosticOn → ENTER, then ENTER again to run the command.

  • Coefficient of determination (r²): The proportion of variation in y explained by the linear relationship with x.

    • Example: If r = 0.8, then r² = 0.64, so 64% of the variability in y is explained by the linear relationship with x.

  • Lurking variable: A hidden variable that influences both x and y, creating a false appearance of causation.

  • Example: Shoe size and reading ability in children are correlated, but age is the lurking variable.

  • Extrapolation: Using a regression line to predict y for x-values outside the range of the data. Dangerous! (AP loves testing this.)

  • Residual: Observed y − Predicted y (y − ŷ). A pattern in residuals suggests a nonlinear relationship.

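  • Outlier in regression: A point that falls outside the overall pattern, usually with a large residual. A high-leverage point is one whose x-value is far from the mean of x. Either kind can strongly influence r and the regression line.

  • Influential point: A point that, if removed, dramatically changes the regression line or r. High-leverage points are the usual candidates.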

  • Correlation vs. causation: Correlation does not imply causation! Just because two variables are correlated doesn’t mean one causes the other.

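  • Conditions to check (LINER). These are the conditions for inference about a regression line; for describing r alone, the key check is that the scatterplot looks roughly linear:

    • Linear: The relationship appears linear in a scatterplot.
    • Independent: Observations are independent of one another (when sampling without replacement, the sample should be less than 10% of the population).
    • Normal: Residuals show no strong skewness or outliers (check with a histogram or normal probability plot of the residuals).
    • Equal variance: Residuals have roughly equal spread for all x-values (check the residual plot).
    • Random: Data come from a random sample or a randomized experiment.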
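
The z-score formula for r listed above can be checked by hand on a tiny data set. The sketch below is only an illustration: the function name `correlation` and the five data points are made up for this example, and on the exam the calculator does this work for you.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs (divide by n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data: hours studied (x) and quiz score out of 5 (y)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
r_squared = r ** 2
print(f"r = {r:.4f}, r^2 = {r_squared:.2f}")  # r = 0.7746, r^2 = 0.60
```

For these made-up numbers, r is about 0.77 (a fairly strong positive linear relationship) and r² is 0.60, so 60% of the variation in the quiz scores is explained by the linear relationship with hours studied.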

Step-by-Step / Process Flow

How to Analyze Correlation in an FRQ

  1. Make a scatterplot (on calculator or by hand).
     • TI-84: 2nd → Y= (STAT PLOT) → Plot1 → On → Type: Scatter → Xlist: L1, Ylist: L2 → ZOOM → 9 (ZoomStat).
     • Describe the direction (positive/negative), form (linear/nonlinear), and strength (weak/moderate/strong).

  2. Calculate r and r².
     • TI-84: STAT → CALC → 8: LinReg(a+bx) → enter L1, L2 → Calculate.
     • Interpret r: "There is a [strong/moderate/weak] [positive/negative] linear relationship between x and y."
     • Interpret r²: "[r² × 100]% of the variation in y is explained by the linear relationship with x."

  3. Check LINER conditions.
     • Linear: Look at the scatterplot. Does it appear roughly linear?
     • Independent: Was the data collected randomly? (If not, mention it as a limitation.)
     • Normal: Check a histogram of residuals (TI-84: STAT → EDIT → set L3 = RESID → 2nd → Y= → Plot2 → Histogram → ZoomStat).
     • Equal variance: Check the residual plot (TI-84: STAT → EDIT → set L3 = RESID → 2nd → Y= → Plot1 → Scatter → Ylist: L3 → ZoomStat). Residuals should be randomly scattered with no pattern.
     • Random: Was the data collected via random sampling or a randomized experiment?

  4. Interpret the slope (if regression is involved).
     • Example: If the regression equation is ŷ = 2.5 + 0.8x (where x = study hours, y = exam score), the slope (0.8) means "For each additional hour studied, the exam score is predicted to increase by 0.8 points, on average."

  5. Discuss limitations.
     • Correlation ≠ causation: Even if r is strong, we cannot conclude x causes y without an experiment.
     • Lurking variables: Could another variable explain the relationship?
     • Extrapolation: Avoid predicting y for x-values outside the data range.
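
The calculate, residual, and extrapolation steps above can be sketched in a few lines. This is a minimal illustration, not part of the guide: it uses the standard least-squares formulas (the same line LinReg(a+bx) reports), and the helper names `fit_line` and `predict` plus the data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict(a, b, x, xs):
    """Return y-hat, refusing to extrapolate beyond the observed x-range."""
    if not min(xs) <= x <= max(xs):
        raise ValueError(f"x = {x} is outside the data range (extrapolation)")
    return a + b * x

# Hypothetical data: hours studied (x) and quiz score out of 5 (y)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

a, b = fit_line(hours, scores)
# Residual = observed y - predicted y, one per data point
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]

print(f"y-hat = {a:.1f} + {b:.1f}x")
print("residuals:", [round(e, 1) for e in residuals])
print("prediction at x = 3:", round(predict(a, b, 3, hours), 1))
```

Calling `predict(a, b, 10, hours)` raises an error on purpose, because x = 10 lies outside the observed range of 1 to 5 hours. Residuals from a least-squares line always sum to zero, which makes a quick sanity check.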

Common Mistakes

Mistake | Correction
Assuming correlation implies causation. | Correlation only shows an association, not causation. To claim causation, you need a well-designed experiment (random assignment, control group).
Ignoring LINER conditions. | Confirm the scatterplot is roughly linear before interpreting r, and verify the full LINER conditions before doing inference on the regression line. The AP exam will ask you to verify them.
Interpreting r as a percentage. | r is not a percentage! r² is the percentage of variation explained. Example: r = 0.6 → r² = 0.36, so 36% of variation is explained.
Extrapolating without warning. | Never predict y for x-values outside the data range. Example: If x ranges from 10–50, don’t predict y when x = 100.
Forgetting to mention direction in r. | Always specify whether r is positive or negative. Example: "r = -0.75 indicates a strong negative linear relationship."

AP Exam Insights

  • FRQs often ask you to:
    • Calculate r and r² (using LinReg on the calculator).
    • Interpret r and r² in context.
    • Check LINER conditions (especially linearity and residual plots).
    • Discuss limitations (e.g., lurking variables, causation vs. correlation).
  • Tricky distinctions:
    • Correlation vs. slope: r measures strength/direction, while the slope (b) measures rate of change.
    • Outliers vs. influential points: An outlier has a large residual; an influential point changes the regression line if removed.
    • Residual plots: If residuals show a pattern (e.g., curved), the relationship is not linear, even if r is strong!
  • Calculator pitfalls:
    • Forgetting to turn on DiagnosticOn, so r won’t display.
    • Mixing up r and r² in interpretations.
    • Not storing residuals in a list (e.g., L3) for residual plots.
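
The residual-plot point above (a strong r does not guarantee a linear relationship) is easy to see numerically. The sketch below uses deliberately curved, made-up data (y = x² for x = 1 to 10) and the standard least-squares formulas; the variable names are illustrative only.

```python
from statistics import mean

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # deliberately curved, not linear

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5      # correlation
b = sxy / sxx                     # least-squares slope
a = y_bar - b * x_bar             # least-squares intercept
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(f"r = {r:.3f}")   # r = 0.975
print(residuals)        # [12.0, 4.0, -2.0, -6.0, -8.0, -8.0, -6.0, -2.0, 4.0, 12.0]
print(signs)            # ++------++
```

Even though r is about 0.975, the residuals run positive, then negative, then positive again. That U-shaped pattern is exactly what a curved residual plot looks like, so a linear model is not appropriate here.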

Quick Check Questions

1. Multiple Choice

A study finds that the correlation between daily screen time (hours) and sleep duration (hours) is r = -0.45. Which of the following is the best interpretation of this value?
(A) Increasing screen time causes a decrease in sleep duration.
(B) There is a moderate negative linear relationship between screen time and sleep duration.
(C) 45% of the variation in sleep duration is explained by screen time.
(D) For each additional hour of screen time, sleep duration decreases by 0.45 hours.

Answer: (B)
Explanation: r measures strength/direction, not causation or slope. r² = 0.2025 (20.25% explained), so (C) is wrong. (D) describes the slope, not r.
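
The arithmetic behind ruling out (C) and (D) can be sketched as follows. The standard deviations `s_x` and `s_y` are hypothetical values chosen only to show that the slope, b = r · s_y / s_x, is generally not equal to r; they are not given in the question.

```python
r = -0.45

# (C) confuses r with r^2: the share of variation explained is r squared.
r_squared = r ** 2
print(f"r^2 = {r_squared:.4f} ({r_squared:.2%} of variation explained)")
# r^2 = 0.2025 (20.25% of variation explained)

# (D) confuses r with the slope. The slope depends on the spreads too.
s_x = 1.5  # hypothetical SD of screen time, in hours
s_y = 1.0  # hypothetical SD of sleep duration, in hours
slope = r * s_y / s_x
print(f"slope = {slope:.2f} hours of sleep per hour of screen time")
# slope = -0.30 hours of sleep per hour of screen time
```

The slope equals r only when s_x and s_y happen to be equal, which is why (D) is not a valid reading of r.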


2. FRQ (Interpretation)

A researcher collects data on the number of hours students spend studying for an exam (x) and their exam scores (y). The regression output is shown below:

Predictor      Coef    SE Coef    T       P
Constant       55.2    3.1        17.8    0.000
Study Hours    4.8     0.5        9.6     0.000

r = 0.82

(a) Interpret the value of r in context.
(b) Interpret the slope of the regression line in context.
(c) The researcher claims that studying more causes higher exam scores. Is this claim justified? Why or why not?

Answers:
(a) There is a strong positive linear relationship between study hours and exam scores.
(b) For each additional hour studied, the exam score is predicted to increase by 4.8 points, on average.
(c) No, the claim is not justified. Correlation does not imply causation. There may be lurking variables (e.g., prior knowledge, sleep, IQ) that explain the relationship.
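
As a quick sanity check on the regression output, the numbers in the table can be tied together with a few lines of arithmetic. This is a sketch only: the function name `predicted_score` is made up, and the 3-hour and 4-hour inputs are hypothetical values assumed to fall inside the (unstated) range of the data.

```python
# Values copied from the regression output
intercept, slope = 55.2, 4.8
se_intercept, se_slope = 3.1, 0.5
r = 0.82

# r^2: share of variation in exam scores explained by the linear model
r_squared = r ** 2
print(f"r^2 = {r_squared:.4f}")  # r^2 = 0.6724

# Each T value is the coefficient divided by its standard error
t_intercept = intercept / se_intercept
t_slope = slope / se_slope
print(f"T (Constant) = {t_intercept:.1f}, T (Study Hours) = {t_slope:.1f}")
# T (Constant) = 17.8, T (Study Hours) = 9.6

def predicted_score(hours):
    """Predicted exam score from the fitted line y-hat = 55.2 + 4.8x."""
    return intercept + slope * hours

# The slope is the change in predicted score per additional hour
gain = predicted_score(4) - predicted_score(3)
print(f"predicted gain for one more hour = {gain:.1f} points")  # 4.8 points
```

So about 67% of the variation in exam scores is explained by the linear relationship with study hours, and the one-hour gain matches the slope interpretation in part (b).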


Last-Minute Cram Sheet

  1. Correlation (r) measures strength/direction of a linear relationship (-1 ≤ r ≤ 1).
  2. Calculator: LinReg(a+bx) L1, L2; r appears if DiagnosticOn is enabled.
  3. Interpret r: "There is a [strong/moderate/weak] [positive/negative] linear relationship between x and y."
  4. Interpret r²: "[r² × 100]% of the variation in y is explained by the linear relationship with x."
  5. LINER conditions: Check Linear, Independent, Normal residuals, Equal variance, Random.
  6. Residual = observed y − predicted y (y − ŷ).
  7. Residual plot: Should show no pattern (if linear).
  8. Correlation ≠ causation! Need an experiment to claim causation.
  9. Extrapolation is dangerous! Don’t predict outside the data range.
  10. Outliers can strongly influence r and the regression line.
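
Item 10 can be seen numerically with a small experiment. The sketch below is illustrative only: the function name `correlation` and all data values are made up. It computes r for five scattered points, then again after adding one point far from the rest.

```python
from statistics import mean

def correlation(xs, ys):
    """Sample correlation r from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data with almost no linear relationship
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = correlation(xs, ys)

# Add a single high-leverage point far from the others
r_with = correlation(xs + [20], ys + [20])

print(f"r without the extra point: {r_without:.2f}")  # about 0.14
print(f"r with the extra point:    {r_with:.2f}")     # about 0.97
```

One point moves r from weak to very strong, which is why you should always look at the scatterplot before trusting r.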