Fatskills
Practice. Master. Repeat.
Study Guide: AP Statistics (AP Stats): Chi-Square Goodness-of-Fit Test (One-Way Table)
Source: https://www.fatskills.com/ap-statistics/chapter/ap-stats-ap-statistics-chisquare-goodnessoffit-test-oneway-table

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

AP Statistics: Chi-Square Goodness-of-Fit Test (One-Way Table) – Exam-Ready Study Guide


What This Is

The Chi-Square Goodness-of-Fit Test determines whether a categorical variable’s observed distribution matches an expected distribution. It’s essential for the AP exam because it tests hypotheses about proportions in a single categorical variable (e.g., "Do M&M colors appear in equal proportions in a bag?" or "Is a die fair?"). Unlike z-tests for proportions, this test compares multiple categories at once using a one-way table.

Real-world example: A factory claims its gummy bears are produced in the following color distribution: 30% red, 20% yellow, 20% green, 15% orange, and 15% purple. A quality control inspector samples 500 gummy bears and records the observed counts. Does the sample provide convincing evidence that the factory’s claimed distribution is incorrect?


Key Terms & Formulas

  • Chi-Square Goodness-of-Fit Test: A hypothesis test for a single categorical variable with multiple categories. Compares observed counts to expected counts under a null hypothesis.
  • H₀: The variable's true distribution matches the expected distribution (the specified proportions).
  • Hₐ: The variable's true distribution does not match the expected distribution (at least one proportion differs).

  • Test Statistic (χ²): \[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

    • O = Observed count for a category
    • E = Expected count for a category (calculated as \( n \times p_i \), where \( p_i \) is the expected proportion for category \( i \))

  • Degrees of Freedom (df): \( \text{df} = \text{number of categories} - 1 \)

  • Large Sample Size Condition: All expected counts must be ≥ 5 (check this before running the test!).

  • P-value: The probability of observing a χ² statistic as extreme as (or more extreme than) the one calculated, assuming H₀ is true. Found using χ²cdf(lower, upper, df) on the TI-84.

    • For a goodness-of-fit test, lower = test statistic, upper = 1E99, df = categories - 1.

  • TI-84 Command for χ² Test: STAT → TESTS → D: χ²GOF-Test

    • Enter observed counts in L1, expected counts in L2, and the degrees of freedom.
    • The calculator returns the χ² statistic and p-value.

  • Expected Count (E): \( E = n \times p_i \), where \( n \) = total sample size and \( p_i \) = expected proportion for category \( i \).

  • Interpretation of p-value: If p-value < α (significance level, usually 0.05), reject H₀. There is convincing evidence that the distribution differs from the expected.
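As a minimal Python sketch of the expected-count formula and the Large Sample Size check, the snippet below uses the claimed gummy bear proportions and n = 500 from the real-world example. The function names `expected_counts` and `large_counts_ok` are illustrative choices, not part of any calculator or library.

```python
# Claimed color proportions from the gummy bear example (the null distribution).
claimed = {"red": 0.30, "yellow": 0.20, "green": 0.20, "orange": 0.15, "purple": 0.15}
n = 500  # sample size from the example


def expected_counts(proportions, n):
    """Return E = n * p_i for every category."""
    return {category: n * p for category, p in proportions.items()}


def large_counts_ok(expected, minimum=5):
    """Large Sample Size condition: every expected count is at least `minimum`."""
    return all(count >= minimum for count in expected.values())


expected = expected_counts(claimed, n)
df = len(claimed) - 1  # number of categories - 1

for color, count in expected.items():
    print(f"{color:>6}: E = {count:.0f}")
print("All expected counts >= 5:", large_counts_ok(expected))
print("df =", df)
```

For the gummy bear claim this prints expected counts of 150, 100, 100, 75, and 75, confirms that all are at least 5, and reports df = 4.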


Step-by-Step / Process Flow

Follow these steps for every goodness-of-fit FRQ:

  1. State Hypotheses
    • H₀: The variable's true distribution matches the expected distribution (specify the proportions).
    • Hₐ: The variable's true distribution does not match the expected distribution (at least one proportion differs).
    • Example: H₀: The gummy bear colors follow the distribution 30% red, 20% yellow, 20% green, 15% orange, 15% purple. Hₐ: At least one of these proportions is incorrect.

  2. Check Conditions
    • Random: The data come from a random sample or randomized experiment.
    • Large Sample Size: All expected counts ≥ 5.
      • Calculate expected counts: \( E = n \times p_i \).
      • Example: For 500 gummy bears, expected counts are 150 red, 100 yellow, 100 green, 75 orange, 75 purple. All are ≥ 5, so the condition is met.

  3. Compute Test Statistic
    • Calculate χ² using the formula or the TI-84 (χ²GOF-Test).
    • Example: If observed counts are 160 red, 90 yellow, 110 green, 70 orange, 70 purple, then χ² = 10²/150 + 10²/100 + 10²/100 + 5²/75 + 5²/75 ≈ 3.33.

  4. Find P-value
    • Use χ²cdf(test statistic, 1E99, df) or the TI-84 test output.
    • Example: df = 5 - 1 = 4. P-value = χ²cdf(3.33, 1E99, 4) ≈ 0.504.

  5. Make a Conclusion in Context
    • Compare the p-value to α (usually 0.05).
    • If p-value < α: Reject H₀. There is convincing evidence that the distribution differs from the expected.
    • If p-value ≥ α: Fail to reject H₀. There is not convincing evidence that the distribution differs.
    • Example: Since p-value (0.504) > 0.05, we fail to reject H₀. There is not convincing evidence that the gummy bear color distribution differs from the factory’s claim.
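The five steps can be sketched end to end in Python using only the standard library. This is an illustration under stated assumptions rather than a replacement for the TI-84 workflow: the helper names are hypothetical, and `chi_square_upper_tail` approximates the same upper-tail area that χ²cdf(test statistic, 1E99, df) reports by summing a power series for the incomplete gamma function.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_upper_tail(x, df, max_terms=500):
    """Approximate P(chi-square with `df` degrees of freedom >= x).

    Uses 1 - P(a, x/2) with a = df/2, where P is the regularized lower
    incomplete gamma function evaluated by its power series. Adequate for
    classroom-sized statistics; not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, half = df / 2, x / 2
    term = 1.0 / math.gamma(a + 1)  # k = 0 term of sum half^k / Gamma(a + k + 1)
    total = term
    for k in range(1, max_terms):
        term *= half / (a + k)
        total += term
        if term < 1e-17 * total:  # remaining terms no longer change the sum
            break
    return 1.0 - math.exp(-half) * half ** a * total


# Gummy bear example: observed counts and claimed proportions
# (red, yellow, green, orange, purple).
observed = [160, 90, 110, 70, 70]
proportions = [0.30, 0.20, 0.20, 0.15, 0.15]
alpha = 0.05

n = sum(observed)
expected = [n * p for p in proportions]
assert all(e >= 5 for e in expected), "Large Sample Size condition fails"

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1
p_value = chi_square_upper_tail(stat, df)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Run on the gummy bear counts, the sketch prints a statistic of about 3.33 with df = 4, a p-value of about 0.504, and the decision to fail to reject H0.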

Common Mistakes

  • Mistake: Forgetting to check the large sample size condition (all expected counts ≥ 5).
  • Correction: Always calculate expected counts and verify this condition before running the test. If any expected count is < 5, the test is invalid.

  • Mistake: Using observed proportions instead of expected proportions to calculate χ².

  • Correction: The test compares observed counts to expected counts (not proportions). Expected counts = \( n \times p_i \), where \( p_i \) is the hypothesized proportion (from H₀).

  • Mistake: Miscalculating degrees of freedom as \( n - 1 \) instead of \( \text{categories} - 1 \).

  • Correction: df = number of categories - 1. For 5 gummy bear colors, df = 4.

  • Mistake: Interpreting a "fail to reject H₀" conclusion as "H₀ is true."

  • Correction: Failing to reject H₀ means there is not enough evidence to conclude the distribution differs. It does not prove H₀ is true.

  • Mistake: Using χ²pdf instead of χ²cdf to find the p-value.

  • Correction: The p-value is the area above the test statistic, so use χ²cdf(test statistic, 1E99, df).
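To see why the density and the tail area are different quantities, the following Python sketch evaluates both for a hypothetical statistic of 4.0 with df = 4 (an illustrative value, not taken from the examples in this guide). The function names are assumptions made for the sketch; the area is approximated with a simple trapezoid rule, with 200 standing in for the calculator's 1E99 upper bound.

```python
import math


def chi_square_density(x, df):
    """Height of the chi-square density curve at x (the kind of value a pdf reports)."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))


def upper_tail_area(lower, df, upper=200.0, steps=200_000):
    """Trapezoid-rule area under the density from `lower` to `upper`."""
    width = (upper - lower) / steps
    total = 0.5 * (chi_square_density(lower, df) + chi_square_density(upper, df))
    for i in range(1, steps):
        total += chi_square_density(lower + i * width, df)
    return total * width


stat, df = 4.0, 4  # hypothetical statistic chosen only for illustration

height = chi_square_density(stat, df)  # curve height at the statistic
area = upper_tail_area(stat, df)       # area to the right: the p-value

print(f"density height at {stat}: {height:.4f}")
print(f"upper-tail area (p-value): {area:.4f}")
```

The height comes out near 0.135 while the area to the right is near 0.406, so reading the density value as a p-value would badly understate it here.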

AP Exam Insights

  • FRQ Setup: Expect a one-way table with observed counts and a claim about expected proportions. You’ll need to:
    • State hypotheses in context.
    • Check conditions (especially expected counts ≥ 5).
    • Calculate χ² and p-value (usually via TI-84).
    • Write a conclusion in context.

  • Tricky Distinction: The goodness-of-fit test is not for comparing two samples (use a χ² test for homogeneity or a two-sample z-test for that). It tests one sample against a hypothesized distribution.

  • Calculator Pitfall: The TI-84’s χ²GOF-Test requires observed counts in L1 and expected counts in L2 (not expected proportions!). The calculator does not convert proportions to counts, so compute each expected count as \( n \times p_i \) before running the test.

  • Common α Levels: The exam often uses α = 0.05, but watch for α = 0.01 or 0.10 in the problem statement.


Quick Check Questions

  1. Multiple Choice: A biologist claims that 40% of butterflies in a region are monarchs, 35% are swallowtails, and 25% are painted ladies. A sample of 200 butterflies yields 90 monarchs, 60 swallowtails, and 50 painted ladies. Which of the following is the correct test statistic for a goodness-of-fit test?
    A) \( \frac{(90 - 80)^2}{80} + \frac{(60 - 70)^2}{70} + \frac{(50 - 50)^2}{50} \)
    B) \( \frac{(0.45 - 0.40)^2}{0.40} + \frac{(0.30 - 0.35)^2}{0.35} + \frac{(0.25 - 0.25)^2}{0.25} \)
    C) \( \frac{(90 - 80)^2}{90} + \frac{(60 - 70)^2}{60} + \frac{(50 - 50)^2}{50} \)
    D) \( \frac{(90 - 80)^2}{200} + \frac{(60 - 70)^2}{200} + \frac{(50 - 50)^2}{200} \)

Answer: A
Explanation: The test statistic uses observed counts (O) and expected counts (E = n × p), summing \( \frac{(O - E)^2}{E} \) over the categories. Here the expected counts are 200 × 0.40 = 80, 200 × 0.35 = 70, and 200 × 0.25 = 50.
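As a quick numerical check, the Python sketch below evaluates all four answer choices for the butterfly data. The variable names are illustrative. It also shows that choice B, built from proportions, is just choice A divided by the sample size, which is why working in proportions gives the wrong scale.

```python
observed = [90, 60, 50]       # monarchs, swallowtails, painted ladies
claimed = [0.40, 0.35, 0.25]  # hypothesized proportions
n = sum(observed)             # 200 butterflies
expected = [n * p for p in claimed]  # 80, 70, 50

# A) counts, divided by expected counts (the chi-square statistic)
option_a = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# B) proportions instead of counts
option_b = sum((o / n - p) ** 2 / p for o, p in zip(observed, claimed))
# C) counts, but divided by observed counts
option_c = sum((o - e) ** 2 / o for o, e in zip(observed, expected))
# D) counts, but divided by the sample size
option_d = sum((o - e) ** 2 / n for o, e in zip(observed, expected))

for label, value in zip("ABCD", (option_a, option_b, option_c, option_d)):
    print(f"{label}) {value:.4f}")
print(f"B times n = {option_b * n:.4f}")
```

Only choice A, about 2.68, is the χ² statistic; multiplying choice B by n = 200 recovers the same value.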


  2. FRQ Part: A casino claims its roulette wheel is fair, with 18 red slots, 18 black slots, and 2 green slots (total 38 slots). A gambler records 100 spins and observes 40 red, 50 black, and 10 green. Do these data provide convincing evidence that the wheel is unfair? Use α = 0.05.
    a) State the hypotheses.
    b) Check the conditions for inference.
    c) Calculate the test statistic and p-value.
    d) State your conclusion in context.

Answer:
    a) H₀: The roulette wheel is fair (proportions: 18/38 red, 18/38 black, 2/38 green). Hₐ: The roulette wheel is unfair (at least one proportion differs).
    b) Random: Assume spins are random. Large Sample Size: Expected counts = 100 × (18/38) ≈ 47.37 red, 47.37 black, and 100 × (2/38) ≈ 5.26 green. All are ≥ 5, so the condition is met.
    c) χ² = \( \frac{(40 - 47.37)^2}{47.37} + \frac{(50 - 47.37)^2}{47.37} + \frac{(10 - 5.26)^2}{5.26} \) ≈ 5.56. df = 3 - 1 = 2. P-value = χ²cdf(5.56, 1E99, 2) ≈ 0.062.
    d) Since p-value (0.062) > 0.05, we fail to reject H₀. There is not convincing evidence that the roulette wheel is unfair.
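The roulette arithmetic can be checked with a short Python sketch. It keeps the expected counts as exact fractions to avoid rounding in intermediate steps, and it relies on a shortcut that holds only when df = 2: the upper-tail area of the χ² distribution is exp(-x/2). The variable names are illustrative.

```python
import math
from fractions import Fraction

observed = {"red": 40, "black": 50, "green": 10}
slots = {"red": 18, "black": 18, "green": 2}  # a fair wheel has 38 slots
alpha = 0.05

n = sum(observed.values())          # 100 spins
total_slots = sum(slots.values())   # 38

# Expected counts E = n * (slots of that color / 38), kept exact.
expected = {color: Fraction(n * count, total_slots) for color, count in slots.items()}
assert all(e >= 5 for e in expected.values()), "Large Sample Size condition fails"

stat = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# Closed form valid for df = 2 only: P(chi-square >= x) = exp(-x / 2).
p_value = math.exp(-float(stat) / 2)

print(f"chi-square = {float(stat):.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The statistic works out to exactly 50/9, about 5.56, and the p-value to about 0.062, so the decision at α = 0.05 is to fail to reject H0.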


Last-Minute Cram Sheet

  1. Purpose: Test if observed counts match expected proportions for one categorical variable.
  2. Hypotheses: H₀: Observed = Expected (specify proportions). Hₐ: Observed ≠ Expected (at least one differs).
  3. Test Statistic: \( \chi^2 = \sum \frac{(O - E)^2}{E} \).
  4. Expected Counts: \( E = n \times p_i \) (must be ≥ 5 for all categories!).
  5. df: Number of categories - 1.
  6. P-value: χ²cdf(test statistic, 1E99, df) or TI-84 χ²GOF-Test.
  7. Conditions: Random sample, all expected counts ≥ 5.
  8. TI-84: STAT → TESTS → D: χ²GOF-Test (L1 = observed counts, L2 = expected counts).
  9. Conclusion: If p-value < α, reject H₀. Otherwise, fail to reject.
  10. Trap: Expected counts are not the same as expected proportions! Multiply by \( n \).