By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Outliers and influential points are data points that can distort the results of a linear regression analysis. An outlier is a point with an unusually large residual (far from the regression line), while an influential point is a point that, if removed, significantly changes the slope, y-intercept, or correlation of the regression line. These concepts are critical on the AP exam because they test your ability to assess the reliability of a regression model—key in real-world scenarios like predicting house prices from square footage, analyzing the effect of study time on test scores, or evaluating the impact of a single extreme value (e.g., a billionaire’s income in a salary dataset).
STAT-CALC-8:LinReg(a+bx)
DIAGNOSTIC ON
STAT-EDIT-L3 = residuals
2nd-LIST-OPS-7:ΔList
STAT-CALC-8:LinReg(a+bx) L1, L2, Y1
STAT-EDIT-L3 = RESID
2nd-Y= (STAT PLOT)-Plot1-Xlist:L1, Ylist:L3-ZOOM-9:ZoomStat
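For readers who want to check the calculator's output by hand, here is a minimal Python sketch of what LinReg(a+bx) and L3 = RESID compute. The data values are hypothetical, and the helper name `lin_reg` is an illustration, not a calculator command.

```python
# Hypothetical data standing in for the calculator lists L1 (x) and L2 (y).
L1 = [1, 2, 3, 4, 5]
L2 = [2, 4, 5, 4, 5]

def lin_reg(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # y-intercept
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r

a, b, r = lin_reg(L1, L2)

# Residual = observed y minus predicted y; this plays the role of L3 = RESID.
L3 = [y - (a + b * x) for x, y in zip(L1, L2)]

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r * r:.3f}")
print("residuals:", [round(e, 2) for e in L3])
```

The residuals of a least-squares line always sum to zero (up to rounding), which is a quick sanity check on any residual list.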
How to analyze outliers and influential points in an FRQ:
1. Make a scatterplot (2nd-Y=-Plot1-L1, L2) and sketch the LSRL on it.
2. Identify potential outliers
   - Store the residuals: L3 = RESID.
   - Look for points far from the regression line in the scatterplot.
3. Check for influential points
   - Remove the point: Re-run regression without it. If the slope/intercept changes substantially, the point is influential.
4. Interpret the impact
   - Does it affect predictions? (Compare ŷ for key x-values.)
5. Draw conclusions
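The remove-and-refit check described above can be sketched in Python as follows. The data set is hypothetical: five points lying exactly on y = 2x plus one suspect point at (20, 5). The helper `lin_reg` and the index `suspect` are names introduced only for this illustration.

```python
def lin_reg(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r

# Hypothetical data: the last point has an x-value far from the others.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]
suspect = 5  # index of the point being tested

full = lin_reg(xs, ys)

# Refit with the suspect point left out.
xs_wo = xs[:suspect] + xs[suspect + 1:]
ys_wo = ys[:suspect] + ys[suspect + 1:]
reduced = lin_reg(xs_wo, ys_wo)

for label, (a, b, r) in [("with point", full), ("without point", reduced)]:
    y_hat_10 = a + b * 10  # compare predictions at a key x-value
    print(f"{label:>13}: a = {a:.2f}, b = {b:.2f}, r = {r:.3f}, y-hat(10) = {y_hat_10:.2f}")
```

In this made-up data set the slope, the correlation, and the prediction at x = 10 all change sharply when the point is removed, which is the pattern that marks a point as influential.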
Mistake: Assuming all outliers are influential. Correction: Not all outliers have high leverage. A point with a large residual but x near x̄ may not change the regression line much. Check leverage and Cook’s Distance!
Mistake: Ignoring the residual plot when assessing linearity. Correction: A scatterplot alone can hide nonlinear patterns. Always check the residual plot—a curved pattern means the linear model is inappropriate, even if r is high.
Mistake: Deleting influential points without justification. Correction: Only remove points if they’re data errors (e.g., typos) or not representative of the population. Never remove points just to improve r or r².
Mistake: Confusing r with r². Correction: r measures the strength and direction of the linear relationship; r² measures the proportion of the variation in y explained by the linear model. For example, if an influential point drops r from 0.8 to 0.3, r² falls from 0.64 to 0.09, so the share of variation explained shrinks far more than the change in r suggests.
Mistake: Forgetting to turn on DIAGNOSTIC for r and r² on the TI-84. Correction: Always run 2nd-0 (CATALOG)-DiagnosticOn-ENTER before regression. Otherwise, r and r² won’t display!
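To illustrate the residual-plot mistake listed above, here is a small Python sketch with hypothetical data that follow y = x² exactly. The correlation is high, yet the residuals show a clear curved pattern. The data and variable names are assumptions made for illustration only.

```python
# Hypothetical curved data: y = x^2 for x = 1..10.
xs = list(range(1, 11))
ys = [x * x for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx
a = y_bar - b * x_bar
r = sxy / (sxx * syy) ** 0.5

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"r = {r:.3f}")
# A run of same-sign residuals (+ + - - ... - - + +) signals curvature.
print("residual signs:", " ".join("+" if e > 0 else "-" for e in residuals))
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern that a linear model cannot capture even though r is close to 1.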
A point can be one, both, or neither!
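The distinction between a large residual and high leverage can be checked numerically. The sketch below uses the standard simple-linear-regression formulas for leverage, h = 1/n + (x − x̄)²/Sxx, and Cook's distance, D = e²/(p·MSE) · h/(1 − h)² with p = 2. These calculations go beyond what a TI-84 reports, and the data set and the function name `influence_table` are hypothetical.

```python
def influence_table(xs, ys):
    """Return a list of (residual, leverage, cooks_d) tuples, one per point."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    p = 2  # parameters estimated: intercept and slope
    mse = sum(e * e for e in resid) / (n - p)
    rows = []
    for x, e in zip(xs, resid):
        h = 1 / n + (x - x_bar) ** 2 / sxx          # leverage
        d = (e * e / (p * mse)) * h / (1 - h) ** 2  # Cook's distance
        rows.append((e, h, d))
    return rows

# Hypothetical data: roughly y = 2x, with one unusual y-value at x = 5 (the mean of x).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 7.8, 20.0, 12.1, 13.8, 16.2, 17.9]

table = influence_table(xs, ys)
for x, (e, h, d) in zip(xs, table):
    print(f"x = {x}: residual = {e:6.2f}, leverage = {h:.3f}, Cook's D = {d:.3f}")
```

Here the point at x = 5 has the largest residual but the smallest possible leverage (1/n), because its x-value equals x̄. It is an outlier in y that pulls the line up without tilting it much.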
Common FRQ Setup:
Asked to:
Calculator Pitfall: Students forget to store the regression equation in Y1, making it hard to calculate residuals or make predictions. Always use LinReg(a+bx) L1, L2, Y1!
Context Matters: The AP exam expects contextual explanations. For example:
A regression analysis of y = house price (in $1000s) vs. x = square footage yields the following:
- LSRL: ŷ = 50 + 0.1x
- r = 0.85
- Residual for a 2,000 sq. ft. house: −$150,000
Which of the following is true?
(A) The house is an outlier but not influential.
(B) The house is influential but not an outlier.
(C) The house is both an outlier and influential.
(D) The house is neither an outlier nor influential.
Answer: (A) The house is an outlier (large residual: −$150,000) but not necessarily influential (we’d need to check leverage/Cook’s Distance).
A researcher fits a LSRL to predict y = crop yield (kg) from x = fertilizer amount (g). The regression output is below:
a. Identify the potential outlier. Explain why it might be an outlier.
b. Without calculating, explain whether this point is likely influential. Justify your answer.
Answer:
a. The point (100, 20) is a potential outlier because its y-value (20 kg) is much lower than predicted (ŷ = 50 + 1×100 = 150 kg, so residual = −130 kg).
b. It is likely influential because its x-value (100 g) is far from the mean x (≈42 g), giving it high leverage. Removing it would likely increase the slope and r.
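The residual arithmetic in part (a) can be verified with a few lines of Python. This sketch assumes the intercept of 50 and slope of 1 used in the answer's own calculation; the function name `predict` is introduced only for illustration.

```python
def predict(x, intercept=50.0, slope=1.0):
    """Predicted crop yield (kg) for a fertilizer amount x (g), using the answer's line."""
    return intercept + slope * x

x_obs, y_obs = 100, 20          # the suspect point (100 g, 20 kg)
y_hat = predict(x_obs)          # predicted yield at x = 100
residual = y_obs - y_hat        # residual = observed - predicted

print(f"predicted = {y_hat:.0f} kg, residual = {residual:.0f} kg")
```

A negative residual means the observed yield falls below the line, which matches the answer's description of the point as much lower than predicted.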