By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
The Chi-Square Goodness-of-Fit Test assesses whether the observed distribution of a single categorical variable is consistent with a hypothesized (expected) distribution. It's essential for the AP exam because it tests hypotheses about several proportions of one categorical variable at once (e.g., "Do M&M colors appear in equal proportions in a bag?" or "Is a die fair?"). A one-proportion z-test handles only a single proportion. This test instead compares all categories at once using a one-way table.
Real-world example: A factory claims its gummy bears are produced in the following color distribution: 30% red, 20% yellow, 20% green, 15% orange, and 15% purple. A quality control inspector samples 500 gummy bears and records the observed counts. Does the sample provide convincing evidence that the factory’s claimed distribution is incorrect?
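H₀: The population distribution of the categorical variable matches the hypothesized distribution (every proportion equals its claimed value).
Hₐ: The population distribution differs from the hypothesized distribution (at least one proportion differs).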
Test Statistic (χ²): \[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
O = Observed count for a category
E = Expected count for a category (calculated as \( n \times p_i \), where \( p_i \) is the expected proportion for category \( i \))
Degrees of Freedom (df): \( \text{df} = \text{number of categories} - 1 \)
Large Sample Size Condition: All expected counts must be ≥ 5 (check this before running the test!).
P-value: The probability of getting a χ² statistic at least as large as the one calculated, assuming H₀ is true. Found using χ²cdf(lower, upper, df) on the TI-84.
For a goodness-of-fit test, lower = test statistic, upper = 1E99, df = categories - 1.
TI-84 Command for χ² Test: STAT → TESTS → D: χ²GOF-Test
The calculator returns the χ² statistic and p-value.
Expected Count (E): \( E = n \times p_i \), where \( n \) = total sample size and \( p_i \) = expected proportion for category \( i \).
Interpretation of p-value: If p-value < α (significance level, usually 0.05), reject H₀: there is convincing evidence that the population distribution differs from the hypothesized distribution. If p-value ≥ α, fail to reject H₀.
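As a worked check of these formulas, here is the gummy bear claim from the opening example with the observed counts used in the FRQ example (160 red, 90 yellow, 110 green, 70 orange, 70 purple). The last line uses the fact that for df = 4 the right-tail area has the closed form \( e^{-x/2}(1 + x/2) \), which is the quantity χ²cdf(x, 1E99, 4) computes.

```latex
\begin{aligned}
E &= n p_i:\quad 500(0.30) = 150,\ 500(0.20) = 100,\ 500(0.20) = 100,\ 500(0.15) = 75,\ 500(0.15) = 75 \\
\chi^2 &= \frac{(160-150)^2}{150} + \frac{(90-100)^2}{100} + \frac{(110-100)^2}{100} + \frac{(70-75)^2}{75} + \frac{(70-75)^2}{75} \\
&= \tfrac{2}{3} + 1 + 1 + \tfrac{1}{3} + \tfrac{1}{3} = \tfrac{10}{3} \approx 3.33 \\
\text{df} &= 5 - 1 = 4 \\
P\left(\chi^2_4 \ge \tfrac{10}{3}\right) &= e^{-5/3}\left(1 + \tfrac{5}{3}\right) \approx 0.504
\end{aligned}
```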
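Follow these steps for every goodness-of-fit FRQ:
State Hypotheses
Example: H₀: The gummy bear colors follow the distribution 30% red, 20% yellow, 20% green, 15% orange, 15% purple. Hₐ: At least one of these proportions is incorrect.
Check Conditions
Random: The data come from a random sample or a randomized process.
Large Sample Size: All expected counts ≥ 5.
Example: The expected counts are 500 × 0.30 = 150 red, 100 yellow, 100 green, 75 orange, and 75 purple, all ≥ 5.
Compute Test Statistic
Use χ²GOF-Test on the TI-84, or compute \( \sum \frac{(O - E)^2}{E} \) by hand.
Example: If observed counts are 160 red, 90 yellow, 110 green, 70 orange, 70 purple, then χ² = 100/150 + 100/100 + 100/100 + 25/75 + 25/75 ≈ 3.33.
Find P-value
χ²cdf(test statistic, 1E99, df)
Example: df = 5 - 1 = 4. P-value = χ²cdf(3.33, 1E99, 4) ≈ 0.504.
Make a Conclusion in Context
Example: Because the p-value (≈ 0.504) is greater than α = 0.05, we fail to reject H₀. There is not convincing evidence that the factory's claimed color distribution is incorrect.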
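The FRQ steps above can be sketched in Python for the gummy bear example. This is a minimal illustration, not what the AP exam expects you to write: the names `gof_test` and `chi2_right_tail` are hypothetical, and the right-tail formula \( e^{-x/2} \sum_{k=0}^{\text{df}/2-1} (x/2)^k / k! \) used in place of χ²cdf(x, 1E99, df) is exact only for even df (here df = 4).

```python
import math


def chi2_right_tail(x: float, df: int) -> float:
    """Area to the right of x under a chi-square curve, like chi2cdf(x, 1E99, df).

    Sketch assumption: closed form valid for EVEN df only,
    P = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2
    term, total = 1.0, 1.0          # k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k            # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def gof_test(observed, proportions, alpha=0.05):
    """Hypothetical helper: chi-square goodness-of-fit test on counts."""
    n = sum(observed)
    # Check conditions: expected counts E = n * p_i must all be >= 5
    # (the Random condition cannot be checked in code; it comes from the study design)
    expected = [n * p for p in proportions]
    if min(expected) < 5:
        raise ValueError("an expected count is below 5; the test is not valid")
    # Compute test statistic: sum of (O - E)^2 / E over all categories
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1          # number of categories - 1, not n - 1
    # Find p-value: right-tail area beyond the statistic
    p_value = chi2_right_tail(stat, df)
    # Conclusion: reject H0 only when p-value < alpha
    return expected, stat, df, p_value, p_value < alpha


# Gummy bear example: red, yellow, green, orange, purple
observed = [160, 90, 110, 70, 70]
claimed = [0.30, 0.20, 0.20, 0.15, 0.15]
expected, stat, df, p, reject = gof_test(observed, claimed)
print(expected)               # [150.0, 100.0, 100.0, 75.0, 75.0]
print(round(stat, 2), df)     # 3.33 4
print(round(p, 3), reject)    # 0.504 False
```

Since `reject` is False, the sketch fails to reject H₀ at α = 0.05.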
Mistake: Running the test without checking the Large Sample Size condition.
Correction: Always calculate expected counts and verify this condition before running the test. If any expected count is < 5, the test is invalid.
Mistake: Plugging proportions (observed or hypothesized) into the χ² formula instead of counts.
Correction: The test compares observed counts to expected counts (not proportions). Expected counts = \( n \times p_i \), where \( p_i \) is the hypothesized proportion (from H₀).
Mistake: Miscalculating degrees of freedom as \( n - 1 \) instead of \( \text{categories} - 1 \).
Correction: df = number of categories - 1. For 5 gummy bear colors, df = 4.
Mistake: Interpreting a "fail to reject H₀" conclusion as "H₀ is true."
Correction: Failing to reject H₀ means there is not enough evidence to conclude the distribution differs. It does not prove H₀ is true.
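Mistake: Using χ²pdf instead of χ²cdf to find the p-value.
Correction: χ²pdf gives the height of the density curve at a single value, not a probability. The p-value is the area to the right of the test statistic: χ²cdf(test statistic, 1E99, df).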
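The proportions-versus-counts and χ²pdf-versus-χ²cdf mistakes above are easy to see numerically. This minimal Python sketch reuses the gummy bear data (observed 160, 90, 110, 70, 70; claimed 30/20/20/15/15%). It shows that plugging in proportions shrinks the statistic by a factor of n = 500, and that the density height at the statistic is not the right-tail area. The helper names `chi2_pdf` and `chi2_right_tail_even` are hypothetical, and the right-tail formula is valid only for even df.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Height of the chi-square density curve at x (the quantity chi2pdf reports)."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))


def chi2_right_tail_even(x: float, df: int) -> float:
    """Area to the right of x (like chi2cdf(x, 1E99, df)); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


observed = [160, 90, 110, 70, 70]
claimed = [0.30, 0.20, 0.20, 0.15, 0.15]
n = sum(observed)

# Correct: observed COUNTS compared with expected COUNTS (E = n * p_i)
expected = [n * p for p in claimed]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Mistake: proportions plugged into the formula; the result equals stat / n
observed_props = [o / n for o in observed]
wrong_stat = sum((q - p) ** 2 / p for q, p in zip(observed_props, claimed))
print(round(stat, 4), round(wrong_stat, 4))      # 3.3333 0.0067

# Mistake: the density height at the statistic is not the p-value
print(round(chi2_pdf(stat, 4), 3))               # 0.157 (not a probability)
print(round(chi2_right_tail_even(stat, 4), 3))   # 0.504 (the p-value)
```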
Write a conclusion in context.
Tricky Distinction: The goodness-of-fit test is not for comparing two samples (use a χ² test for homogeneity or a two-proportion z-test for that). It tests one sample against a hypothesized distribution.
Calculator Pitfall: The TI-84's χ²GOF-Test requires observed counts in L1 and expected counts in L2 (not proportions!). Multiply each hypothesized proportion by n first and store the resulting expected counts in L2.
Common α Levels: The exam often uses α = 0.05, but watch for α = 0.01 or 0.10 in the problem statement.
Answer: A. Explanation: The test statistic uses observed counts (O) and expected counts (E = n × p), computing \( \frac{(O - E)^2}{E} \) for each category and summing over all categories.
Answer:
a) H₀: The roulette wheel is fair (proportions: 18/38 red, 18/38 black, 2/38 green). Hₐ: The roulette wheel is unfair (at least one proportion differs).
b) Random: Assume the spins are random. Large Sample Size: Expected counts = 100 × (18/38) ≈ 47.37 red, 100 × (18/38) ≈ 47.37 black, and 100 × (2/38) ≈ 5.26 green. All are ≥ 5, so the condition is met.
c) χ² = \( \frac{(40 - 47.37)^2}{47.37} + \frac{(50 - 47.37)^2}{47.37} + \frac{(10 - 5.26)^2}{5.26} \) ≈ 5.56. df = 3 - 1 = 2. P-value = χ²cdf(5.56, 1E99, 2) ≈ 0.062.
d) Since the p-value (0.062) > 0.05, we fail to reject H₀. There is not convincing evidence that the roulette wheel is unfair.
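To double-check the arithmetic in the roulette answer, here is a minimal Python sketch that redoes parts (b) and (c) with exact fractions, using the observed spins from part (c) (40 red, 50 black, 10 green out of 100). It relies on the fact that for df = 2 the χ² right-tail area is exactly \( e^{-x/2} \), so no statistics library is needed; that shortcut does not hold for other df.

```python
import math
from fractions import Fraction

observed = [40, 50, 10]                        # red, black, green spins
claimed = [Fraction(18, 38), Fraction(18, 38), Fraction(2, 38)]
n = sum(observed)

# Expected counts as exact fractions: 900/19, 900/19, 100/19
expected = [n * p for p in claimed]
assert all(e >= 5 for e in expected)           # Large Sample Size condition

# Chi-square statistic computed exactly, then df = categories - 1
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For df = 2 only: right-tail area = exp(-x/2)
p_value = math.exp(-float(stat) / 2)

print([round(float(e), 2) for e in expected])  # [47.37, 47.37, 5.26]
print(stat, round(float(stat), 2), df)         # 50/9 5.56 2
print(round(p_value, 3))                       # 0.062
```

Working in fractions shows the statistic is exactly 50/9, so the rounded value 5.56 (and p ≈ 0.062) is not an artifact of rounding the expected counts.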