Fatskills
Practice. Master. Repeat.
Study Guide: Intro to Business Statistics: Chi Square Tests - Chi-Square Goodness-of-Fit Test, Comparing Observed vs. Expected Frequencies
Source: https://www.fatskills.com/business-analytics/chapter/intro-to-business-statistics-busstats-chi-square-tests-chisquare-goodnessoffit-test-comparing-observed-vs-expected-frequencies

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

The Chi-Square Goodness-of-Fit Test is a statistical method used to compare the observed frequencies in a dataset with the frequencies expected under a specified distribution. It is useful in business decisions such as quality control, where manufacturers want to confirm that their products meet quality standards. For instance, a food manufacturer might test whether the proportion of defective products in a batch of 1,000 units differs from 5%, which corresponds to expected frequencies of 50 defective and 950 non-defective units.

Key Formulas & Symbols

  • χ² = Σ [(observed frequency - expected frequency)^2 / expected frequency], summed over all categories, where observed frequency = number of occurrences in the sample, expected frequency = number of occurrences expected under the null hypothesis.
  • χ² distribution: a theoretical distribution used to calculate the probability of observing the test statistic (χ²) under the null hypothesis.
  • Degrees of Freedom (df): the number of categories minus one (df = k - 1, where k is the number of categories).
  • Null Hypothesis (H₀): the statement that there is no significant difference between observed and expected frequencies.
  • Alternative Hypothesis (H₁): the statement that there is a significant difference between observed and expected frequencies.
  • p-value: the probability of obtaining a test statistic (χ²) at least as extreme as the one observed, assuming the null hypothesis is true.
  • Critical Value (CV): the χ² value that separates the rejection region from the non-rejection region.
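
As a rough illustration of the χ² formula and the df rule, here is a minimal Python sketch. The function name chi_square_statistic and the weekday sales counts are hypothetical and chosen only for illustration; they do not come from the guide.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 sales spread over five weekdays,
# with H0 saying every day is equally likely (20 expected per day).
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1, where k is the number of categories

print(f"chi-square = {chi_sq:.2f}, df = {df}")  # chi-square = 2.90, df = 4
```

Each category contributes its own squared gap divided by its expected count, so categories with small expected counts weigh more heavily for the same gap.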

Step-by-Step Procedure

  1. State Hypotheses: Define the null and alternative hypotheses. For example, H₀: p = 0.05 (proportion of defective products is 5%) vs. H₁: p ≠ 0.05.
  2. Choose Test: Select the Chi-Square Goodness-of-Fit Test as the appropriate statistical method.
  3. Compute Test Statistic: Calculate the χ² value using the observed and expected frequencies.
  4. Find p-value or Critical Value: Determine the p-value or critical value using the χ² distribution and degrees of freedom.
  5. Compare to α: Compare the p-value to the significance level (α = 0.05), or compare the χ² statistic to the critical value.
  6. Conclude: Reject the null hypothesis if the p-value is less than α or, equivalently, if the χ² statistic exceeds the critical value.
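
The steps above can be sketched end to end in Python using only the standard library. Everything named here is an assumption made for illustration: the helper chi2_sf, the function goodness_of_fit, and the three-plan counts are hypothetical. The upper-tail probability is built from the closed forms for df = 1 and df = 2 plus a standard recurrence for the incomplete gamma function; in practice a statistics package would normally supply it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Step up with Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q


def goodness_of_fit(observed, null_proportions, alpha=0.05):
    """Return (chi_sq, df, p_value, reject_h0) for observed counts under H0."""
    n = sum(observed)
    expected = [n * p for p in null_proportions]  # counts implied by H0
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    p_value = chi2_sf(chi_sq, df)
    return chi_sq, df, p_value, p_value < alpha


# Hypothetical example: H0 says three plans are chosen 50% / 30% / 20% of the time.
chi_sq, df, p_value, reject = goodness_of_fit([120, 50, 30], [0.5, 0.3, 0.2])
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject}")
```

For these made-up counts the statistic is about 8.17 with df = 2, and the p-value of roughly 0.017 falls below α = 0.05, so H₀ would be rejected.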

Common Mistakes

  • Mistake: Misinterpreting the p-value as the probability that the null hypothesis is true.
  • Correction: The p-value is the probability of observing the data (or more extreme) if the null hypothesis is true. It does not provide information about the probability of the null hypothesis being true.
  • Mistake: Failing to check the assumptions of the Chi-Square Goodness-of-Fit Test (e.g., expected frequencies should be at least 5).
  • Correction: Verify that the expected frequencies meet the assumptions before conducting the test.
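  • Mistake: Assuming the Chi-Square Goodness-of-Fit Test applies only to two categories, or confusing it with the Chi-Square Test of Independence.
  • Correction: The Goodness-of-Fit Test handles one categorical variable with any number of categories (k ≥ 2, with df = k - 1). Use the Chi-Square Test of Independence when the question is whether two categorical variables are related.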
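
The expected-frequency assumption can be checked before running the test. The helper below is a hypothetical sketch (its name, its minimum parameter, and the sample sizes are illustrative), using the "at least 5" rule of thumb stated above.

```python
def small_expected_categories(n, null_proportions, minimum=5):
    """Return (index, expected_count) for categories whose expected count is below `minimum`."""
    return [
        (i, n * p)
        for i, p in enumerate(null_proportions)
        if n * p < minimum
    ]


# Hypothetical check: only 40 observations against a 5% / 95% null split.
flagged = small_expected_categories(40, [0.05, 0.95])
print(flagged)  # the 5% category expects only about 2 units, so it is flagged

# With 1,000 observations the same split expects 50 and 950, so nothing is flagged.
print(small_expected_categories(1000, [0.05, 0.95]))
```

If a category is flagged, common remedies are collecting more data or merging sparse categories before testing.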

Quick Practice Problems

  1. A company wants to know if the proportion of customers who prefer product A differs from 30% (the expected proportion). The observed frequency is 45 out of 100 customers. What is the χ² value?

χ² = Σ [(observed frequency - expected frequency)^2 / expected frequency] = (45 - 30)^2 / 30 + (55 - 70)^2 / 70 = 15^2 / 30 + (-15)^2 / 70 = 225 / 30 + 225 / 70 ≈ 7.5 + 3.21 = 10.71

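  2. A manufacturer wants to know if the proportion of defective products differs from 5% (the expected proportion) in a batch of 1,000 units. The observed frequency is 60 defective units. What is the p-value?

There are two categories (defective and non-defective), so df = 2 - 1 = 1. The expected frequencies are 50 defective and 950 non-defective, so χ² = (60 - 50)^2 / 50 + (940 - 950)^2 / 950 = 100 / 50 + 100 / 950 ≈ 2 + 0.105 = 2.105. Using a χ² distribution table or a calculator with df = 1, we find the p-value ≈ 0.147. Because 0.147 is greater than α = 0.05, we fail to reject the null hypothesis.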

  3. A company wants to know if the proportion of customers who prefer product B differs from 20% (the expected proportion). The observed frequency is 25 out of 100 customers. What is the χ² value?

χ² = Σ [(observed frequency - expected frequency)^2 / expected frequency] = (25 - 20)^2 / 20 + (75 - 80)^2 / 80 = 5^2 / 20 + (-5)^2 / 80 = 25 / 20 + 25 / 80 = 1.25 + 0.3125 = 1.5625
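
The hand calculations in these practice problems can be double-checked with a short script. This is a sketch under stated assumptions: the function name two_category_test is hypothetical, and it treats each problem as a two-category test (df = 1), where the upper-tail χ² probability reduces to erfc(sqrt(χ² / 2)).

```python
import math


def two_category_test(successes, n, p0):
    """Chi-square statistic and p-value (df = 1) for a two-category goodness-of-fit test."""
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With df = 1, the upper-tail chi-square probability equals erfc(sqrt(chi_sq / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return chi_sq, p_value


# (observed count, sample size, proportion under H0) for the three problems above
problems = [(45, 100, 0.30), (60, 1000, 0.05), (25, 100, 0.20)]
results = [two_category_test(*args) for args in problems]

for number, (chi_sq, p_value) in enumerate(results, start=1):
    print(f"Problem {number}: chi-square = {chi_sq:.4f}, p-value = {p_value:.4f}")
```

Printing both values for every problem also shows which results would cross the α = 0.05 threshold.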

Last-Minute Cram Sheet

  • χ² = Σ [(observed frequency - expected frequency)^2 / expected frequency].
  • χ² distribution: used to calculate the probability of observing the test statistic (χ²) under the null hypothesis.
  • df = k - 1, where k is the number of categories.
  • Null Hypothesis (H₀): the statement that there is no significant difference between observed and expected frequencies.
  • Alternative Hypothesis (H₁): the statement that there is a significant difference between observed and expected frequencies.
  • p-value: the probability of obtaining a test statistic (χ²) at least as extreme as the one observed, assuming the null hypothesis is true.
  • Critical Value (CV): the χ² value that separates the rejection region from the non-rejection region.
  • Assumptions: expected frequencies should be at least 5, and the data should be categorical.
  • p-value is NOT the probability that H₀ is true; it is the probability of observing the data (or more extreme) if H₀ is true.
  • The Goodness-of-Fit Test works for one categorical variable with any number of categories; use the Chi-Square Test of Independence to test whether two categorical variables are related.
  • Verify that the expected frequencies meet the assumptions before conducting the test.