Fatskills
Practice. Master. Repeat.
Study Guide: Intro to Business Statistics: Chi Square Tests - Chi-Square Test for Homogeneity of Proportions
Source: https://www.fatskills.com/business-analytics/chapter/intro-to-business-statistics-busstats-chi-square-tests-chisquare-test-for-homogeneity-of-proportions

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

The Chi-Square Test for Homogeneity of Proportions is a statistical method used to determine whether two or more populations (groups) share the same proportions of a categorical outcome. For example, a retail chain wants to know if the proportion of customers who prefer online shopping is the same across four age groups (18-24, 25-34, 35-44, and 45-54). It samples customers from each age group, counts how many do and do not prefer online shopping, and uses the Chi-Square Test to decide whether the differences between the groups' sample proportions are larger than chance alone would explain.

Key Formulas & Symbols

  • χ² = Σ [(observed frequency - expected frequency)^2 / expected frequency], summed over every cell of the table (each age group combined with each response: prefers or does not prefer online shopping). Observed frequency = number of customers counted in that cell; expected frequency = total number of customers in that age group multiplied by the overall proportion of customers giving that response.
  • df = (number of groups - 1) × (number of response categories - 1). With two response categories (prefer / do not prefer), this reduces to df = (number of age groups - 1).
  • p-value = P(χ² ≥ χ²_test) where χ²_test = calculated Chi-Square statistic.
  • α = 0.05 (default significance level).
  • χ²_critical = χ² table value for df and α.
  • α = P(χ² ≥ χ²_critical) when the null hypothesis is true, so χ²_test > χ²_critical is equivalent to p-value < α.
  • H₀: p₁ = p₂ = ... = p_k (null hypothesis: proportions are equal).
  • H_a: not all p_i are equal (alternative hypothesis: proportions are not equal).
  • p_i = population proportion in group i (estimated by the observed sample proportion in that group).
  • n_i = total number of observations in group i.
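
As a rough illustration of the expected-frequency and χ² formulas, the sketch below works through a small table in plain Python. The survey counts, the dictionary layout, and the function names `expected_counts` and `chi_square_statistic` are all hypothetical. The sketch assumes the statistic is summed over both response columns (prefer and do not prefer), which is how the test is normally set up.

```python
# Hypothetical survey counts (illustrative only, not data from this guide).
# Rows: age groups; columns: [prefer online, do not prefer online].
observed = {
    "18-24": [60, 40],
    "25-34": [55, 45],
    "35-44": [45, 55],
    "45-54": [40, 60],
}

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    return {
        group: [sum(row) * c / grand_total for c in col_totals]
        for group, row in table.items()
    }

def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for group in table
        for o, e in zip(table[group], expected[group])
    )

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
n_groups = len(observed)
n_responses = len(next(iter(observed.values())))
df = (n_groups - 1) * (n_responses - 1)

print(expected["18-24"])    # [50.0, 50.0]
print(round(chi2, 2), df)   # 10.0 3
```

Every age group here has 100 respondents and half of all respondents prefer online shopping, so each expected count is 50; the statistic then measures how far the observed counts stray from that common split.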

Step-by-Step Procedure

  1. State hypotheses: State the null and alternative hypotheses (H₀ and H_a).
  2. Choose test: Choose the Chi-Square Test for Homogeneity of Proportions.
  3. Compute test statistic: Calculate the Chi-Square statistic (χ²) using the observed frequencies and expected frequencies.
  4. Find p-value or critical value: Find the p-value associated with the calculated Chi-Square statistic or the critical value from the Chi-Square table for the given degrees of freedom and significance level.
  5. Compare to α: Compare the p-value to the significance level (α) or the calculated Chi-Square statistic to the critical value.
  6. Conclude: If the p-value is less than α, or equivalently the calculated Chi-Square statistic is greater than the critical value, reject the null hypothesis (H₀) and conclude that the proportions are not all equal. Otherwise, fail to reject H₀.
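
Steps 4 to 6 can be sketched in Python using only the standard library. The helper name `chi2_sf`, the example statistic of 10.0, and df = 3 are assumptions made for illustration. The helper computes the upper-tail probability for integer degrees of freedom from the standard recurrence for the regularized incomplete gamma function; in practice a statistics library would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1), starting from
    Q(1/2, h) = erfc(sqrt(h)) for odd df or Q(1, h) = exp(-h) for even df,
    where h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical inputs: a calculated statistic and its degrees of freedom.
chi2_test = 10.0
df = 3
alpha = 0.05

p_value = chi2_sf(chi2_test, df)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}, decision: {decision}")
```

Comparing the p-value with α and comparing the statistic with the table critical value give the same decision, so only one of the two comparisons is needed.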

Common Mistakes

  • Mistake: Misinterpreting the p-value as the probability that the null hypothesis (H₀) is true.
  • Correction: The p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis (H₀) is true. It does not give the probability that the null hypothesis is true.
  • Mistake: Failing to check the assumptions of the Chi-Square Test (independence and expected frequencies of at least 5).
  • Correction: The Chi-Square Test assumes that the observations are independent and that the expected frequency in every cell is at least 5. If these assumptions are not met, the test may not be valid.
  • Mistake: Using the Chi-Square Test for small sample sizes (n < 20).
  • Correction: The Chi-Square distribution is only a large-sample approximation to the distribution of the test statistic, so the test is not suitable for small sample sizes. In such cases, an exact test such as Fisher's exact test may be more appropriate.
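
The expected-frequency assumption is easy to check before running the test. The sketch below is one possible check: the function name `low_expected_cells`, the threshold argument, and the small production-line table are hypothetical, chosen so that some cells fall below the usual minimum of 5.

```python
def low_expected_cells(table, threshold=5.0):
    """Return (group, column index, expected count) for each cell below threshold."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    flagged = []
    for group, row in table.items():
        for j, col_total in enumerate(col_totals):
            # Expected count = row total * column total / grand total.
            expected = sum(row) * col_total / grand_total
            if expected < threshold:
                flagged.append((group, j, round(expected, 2)))
    return flagged

# Hypothetical small sample: rows are production lines,
# columns are [defective, not defective].
small_sample = {"A": [3, 7], "B": [6, 4], "C": [2, 8]}

flagged = low_expected_cells(small_sample)
for group, column, expected in flagged:
    print(f"line {group}, column {column}: expected count {expected} is below 5")
```

With only 30 observations, every "defective" cell has an expected count of about 3.67, so the Chi-Square approximation would be doubtful for this table.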

Quick Practice Problems

  1. A marketing firm wants to know if the proportion of customers who prefer a new product is the same across different age groups (18-24, 25-34, 35-44, and 45-54). They collect data on the number of customers in each age group who prefer the new product. The observed frequencies are: 15, 20, 25, and 30. The expected frequencies are: 12, 16, 20, and 24. What is the Chi-Square statistic?

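χ² = (15 - 12)^2/12 + (20 - 16)^2/16 + (25 - 20)^2/20 + (30 - 24)^2/24 = 0.75 + 1.00 + 1.25 + 1.50 = 4.50 (calculated from the observed and expected frequencies given).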

  2. A quality control team wants to know if the proportion of defective products is the same across different production lines (A, B, and C). They collect data on the number of defective products in each production line. The observed frequencies are: 10, 15, and 20. The expected frequencies are: 12, 16, and 20. What is the p-value?

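χ² = (10 - 12)^2/12 + (15 - 16)^2/16 + (20 - 20)^2/20 ≈ 0.333 + 0.063 + 0 ≈ 0.396. With df = 3 - 1 = 2, p-value = P(χ² ≥ 0.396) ≈ 0.82 (calculated using the Chi-Square statistic and the Chi-Square distribution).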

  3. A retail chain wants to know if the proportion of customers who prefer online shopping is the same across different age groups (18-24, 25-34, 35-44, and 45-54). They collect data on the number of customers in each age group who prefer online shopping. The observed frequencies are: 20, 25, 30, and 35. The expected frequencies are: 16, 20, 24, and 28. What is the decision?

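Fail to reject the null hypothesis (H₀). χ² = 1.00 + 1.25 + 1.50 + 1.75 = 5.50 with df = 4 - 1 = 3, which is below the critical value of 7.815; equivalently, the p-value (about 0.14) is greater than the significance level (α = 0.05).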
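
The arithmetic in these three problems can be checked with a few lines of Python. This is only a sketch: it sums (O - E)^2 / E over the cells listed in each problem and takes df = number of groups - 1. The helper name `chi_square_from_cells` is hypothetical, the closed-form tail probability exp(-x / 2) is valid for df = 2 only, and 7.815 is the standard table critical value for df = 3 at α = 0.05.

```python
import math

def chi_square_from_cells(observed, expected):
    """Sum (O - E)^2 / E over the listed cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Problem 1: statistic only.
stat1 = chi_square_from_cells([15, 20, 25, 30], [12, 16, 20, 24])

# Problem 2: three production lines, so df = 3 - 1 = 2.
# For df = 2 only, the upper-tail probability is exactly exp(-x / 2).
stat2 = chi_square_from_cells([10, 15, 20], [12, 16, 20])
p_value2 = math.exp(-stat2 / 2)

# Problem 3: four age groups, so df = 4 - 1 = 3.
# 7.815 is the table critical value for df = 3, alpha = 0.05.
stat3 = chi_square_from_cells([20, 25, 30, 35], [16, 20, 24, 28])
decision3 = "reject H0" if stat3 > 7.815 else "fail to reject H0"

print(f"Problem 1: chi-square = {stat1:.2f}")
print(f"Problem 2: chi-square = {stat2:.3f}, p-value = {p_value2:.2f}")
print(f"Problem 3: chi-square = {stat3:.2f}, decision: {decision3}")
```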

Last-Minute Cram Sheet

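  1. χ² = Σ [(observed frequency - expected frequency)^2 / expected frequency], summed over every cell.
  2. df = (number of groups - 1) × (number of response categories - 1); with two response categories, df = (number of groups - 1).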
  3. p-value = P(χ² ≥ χ²_test).
  4. α = 0.05 (default significance level).
  5. χ²_critical = χ² table value for df and α.
  6. α = P(χ² ≥ χ²_critical) when H₀ is true, so χ²_test > χ²_critical is equivalent to p-value < α.
  7. H₀: p₁ = p₂ = ... = p_k (null hypothesis: proportions are equal).
  8. H_a: not all p_i are equal (alternative hypothesis: proportions are not equal).
  9. p_i = population proportion in group i (estimated by the observed sample proportion).
  10. n_i = total number of observations in group i.
  11. p-value is NOT the probability that H₀ is true; it is the probability of observing a test statistic at least as extreme as the one calculated if H₀ is true.
  12. The Chi-Square Test assumes independence and expected frequencies of at least 5 in every cell.
  13. The Chi-Square Test is not suitable for small sample sizes (n < 20).