Fatskills
Practice. Master. Repeat.
Study Guide: Introductory Statistics: Inference Hypothesis Tests - Chi-Square Tests Goodness-of-Fit and Test of Independence Expected Counts
Source: https://www.fatskills.com/statistics-101/chapter/introductorystatistics-introductory-statistics-inference-hypothesis-tests-chi-square-tests-goodness-of-fit-and-test-of-independence-expected-counts

Introductory Statistics: Inference Hypothesis Tests - Chi-Square Tests Goodness-of-Fit and Test of Independence Expected Counts

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.


What Is This?

Chi-Square Tests are hypothesis tests for categorical data. They measure how far observed counts fall from the counts expected under a null hypothesis. The Goodness-of-Fit Test checks whether the distribution of one categorical variable matches a hypothesized distribution (for example, equal shares across categories). The Test of Independence checks whether two categorical variables measured on the same individuals are associated. Exams use this topic to test whether you can choose the right test, compute expected counts, and interpret the result for real-world data.

Why It Matters

Chi-Square Tests are commonly tested in statistics, psychology, sociology, and business exams. They appear frequently and can carry significant marks (10-20% of the total). These tests evaluate your ability to analyze categorical data, understand distributions, and make data-driven decisions.

Core Concepts

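  1. Chi-Square Statistic: Measures how far the observed frequencies are from the expected frequencies, summed over all categories or cells. Larger values mean a worse fit to the null hypothesis.
  2. Degrees of Freedom: The number of counts that are free to vary once the totals are fixed. It determines which chi-square distribution is used to find the p-value.
  3. Expected Counts: The frequencies you would expect if the null hypothesis were true.
  4. p-value: The probability, assuming the null hypothesis is true, of getting a chi-square statistic at least as large as the one observed.
  5. Null Hypothesis: For Goodness-of-Fit, the population follows the stated distribution. For the Test of Independence, the two variables are not associated.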

Prerequisites

  1. Basic Probability: Understanding of probability distributions and expected values.
  2. Hypothesis Testing: Knowledge of null and alternative hypotheses, p-values, and significance levels.
  3. Descriptive Statistics: Familiarity with frequency distributions and cross-tabulations.

The Rule-Book (How It Works)

Chi-Square Goodness-of-Fit Test

  • Primary Rule: Compare observed frequencies to expected frequencies under the null hypothesis.
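  • Formula: \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \] where \(O_i\) is the observed frequency and \(E_i = n p_i\) is the expected frequency for category \(i\), with \(n\) the sample size and \(p_i\) the proportion stated by the null hypothesis.
  • Degrees of Freedom: \(k - 1\), where \(k\) is the number of categories.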
  • Mnemonic: "O minus E, squared, over E, summed."
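The goodness-of-fit rule translates almost line for line into code. The sketch below is a minimal Python version, assuming the null hypothesis is supplied as a list of proportions \(p_i\) that sum to 1, so each expected count is \(E_i = n p_i\). The function name `gof_statistic` and the 45/35/20 counts are hypothetical, chosen only for illustration.

```python
def gof_statistic(observed, null_proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts).

    observed: counts O_i for each of the k categories.
    null_proportions: proportions p_i claimed by H0 (must sum to 1).
    """
    if len(observed) != len(null_proportions):
        raise ValueError("need one proportion per category")
    if abs(sum(null_proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    n = sum(observed)
    # Expected count for each category under H0: E_i = n * p_i
    expected = [n * p for p in null_proportions]
    # "O minus E, squared, over E, summed"
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1
    return stat, df, expected


# Hypothetical data: 100 observations, H0 claims a 50/30/20 split.
stat, df, expected = gof_statistic([45, 35, 20], [0.5, 0.3, 0.2])
print([round(e, 2) for e in expected])  # [50.0, 30.0, 20.0]
print(round(stat, 4))                   # 1.3333
print(df)                               # 2
```

Returning the expected counts alongside the statistic makes it easy to check that each one is at least 5 before reading a p-value.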

Chi-Square Test of Independence

  • Primary Rule: Determine if two categorical variables are independent.
  • Formula: \[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \] where \(O_{ij}\) is the observed frequency in row \(i\), column \(j\), and \(E_{ij}\) is the expected frequency for that cell: \[ E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}} \]
  • Degrees of Freedom: \((r - 1) \times (c - 1)\), where \(r\) is the number of rows and \(c\) is the number of columns.
  • Mnemonic: "Rows minus one, times columns minus one."
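For a two-way table, the only extra work is building each expected count from the margins, \(E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{grand total}}\). A minimal Python sketch follows; the function name `independence_statistic` and the 2×3 table are hypothetical.

```python
def independence_statistic(table):
    """Return (chi-square statistic, degrees of freedom, expected table).

    table: list of rows of observed counts O_ij (r rows, c columns).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: E_ij = (row i total)(column j total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return stat, df, expected


# Hypothetical 2 x 3 table of observed counts
stat, df, expected = independence_statistic([[10, 20, 30],
                                             [20, 20, 20]])
print(expected)         # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(stat, 4))   # 5.3333
print(df)               # 2
```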

Exam / Job / Audit Weighting

  • Frequency: High
  • Difficulty Rating: Intermediate
  • Question Type or Real-World Task Type: Multiple-choice, short answer, data analysis tasks

Must-Know Rules, Formulas, Standards, or Principles

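  1. Chi-Square Formula: \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
  2. Degrees of Freedom:
    • Goodness-of-Fit: \(k - 1\)
    • Test of Independence: \((r - 1) \times (c - 1)\)
  3. p-value Interpretation:
    • If \(p < \alpha\) (usually \(\alpha = 0.05\)), reject the null hypothesis; otherwise, fail to reject it.
  4. Expected Counts: Every expected count should be at least 5 for the chi-square approximation to be reliable.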
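Exam answers usually read the p-value (or a critical value) from a table. To check such a lookup, the sketch below computes the right-tail probability of the chi-square distribution using only Python's standard library. It assumes an integer number of degrees of freedom and uses the standard closed forms of the regularized upper incomplete gamma function \(Q(df/2, x/2)\); in practice a library routine such as SciPy's `scipy.stats.chi2.sf` does the same job. The names `chi2_sf` and `decide` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable X with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = e^{-y} * sum_{k=0}^{m-1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and add
    # y^(k - 1/2) * e^{-y} / Gamma(k + 1/2) for k = 1 .. (df - 1) / 2
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp((k - 0.5) * math.log(y) - y - math.lgamma(k + 0.5))
    return total


def decide(stat, df, alpha=0.05):
    """Return the p-value and the decision at significance level alpha."""
    p = chi2_sf(stat, df)
    return p, ("reject H0" if p < alpha else "fail to reject H0")


# Table critical values should give tail probabilities of about 0.05
print(round(chi2_sf(3.841, 1), 3))  # 0.05
print(round(chi2_sf(9.488, 4), 3))  # 0.05

p, verdict = decide(7.0, 2)
print(round(p, 4), verdict)         # 0.0302 reject H0
```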

Worked Examples (Step-by-Step)

Easy

Question: A company claims that their product has equal market share among three regions. The observed frequencies are 50, 70, and 80. Test this claim at a 5% significance level.

Step-by-Step:
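1. Null Hypothesis: Equal market share, \(\frac{1}{3}\) (about 33.33%) in each region.
2. Expected Frequencies: The total is \(n = 50 + 70 + 80 = 200\), so \[ E_1 = E_2 = E_3 = \frac{200}{3} \approx 66.67 \]
3. Chi-Square Calculation: \[ \chi^2 = \frac{(50 - 66.67)^2}{66.67} + \frac{(70 - 66.67)^2}{66.67} + \frac{(80 - 66.67)^2}{66.67} \approx 7.00 \]
4. Degrees of Freedom: \(3 - 1 = 2\).
5. p-value: From the Chi-Square table, the 5% critical value for 2 degrees of freedom is 5.991. Since \(7.00 > 5.991\), \(p < 0.05\) (more precisely, \(p \approx 0.030\)).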

Answer: Reject the null hypothesis.
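Working in exact fractions removes the rounding in 66.67: the deviations are \(-\frac{50}{3}\), \(\frac{10}{3}\) and \(\frac{40}{3}\), and the three terms add to exactly 7. The sketch below uses Python's standard `fractions` and `math` modules and, because \(df = 2\), the closed form \(p = e^{-\chi^2/2}\) for the right-tail probability. It is a check on the hand calculation, not a replacement for the table lookup an exam expects.

```python
import math
from fractions import Fraction

observed = [50, 70, 80]
n = sum(observed)                      # 200
k = len(observed)
# H0: equal shares, so every expected count is n / k = 200/3 (kept exact)
expected = [Fraction(n, k)] * k
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1                             # 2
# For df = 2 the chi-square right tail is exactly exp(-x / 2)
p_value = math.exp(-float(stat) / 2)

print(stat)                # 7
print(round(p_value, 4))   # 0.0302
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```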

Medium

Question: A survey of 100 people records each person's age group (Young or Old) and preferred brand (A or B). The observed frequencies are:

                Brand A    Brand B
Young              30         20
Old                20         30

Test if brand preference is independent of age group at a 5% significance level.

Step-by-Step:
1. Null Hypothesis: Brand preference is independent of age group.
2. Expected Frequencies: Every row total and every column total is 50, and \(n = 100\): \[ E_{11} = \frac{50 \times 50}{100} = 25, \quad E_{12} = \frac{50 \times 50}{100} = 25 \] \[ E_{21} = \frac{50 \times 50}{100} = 25, \quad E_{22} = \frac{50 \times 50}{100} = 25 \]
3. Chi-Square Calculation: \[ \chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(30 - 25)^2}{25} = 1 + 1 + 1 + 1 = 4 \]
4. Degrees of Freedom: \((2 - 1) \times (2 - 1) = 1\).
5. p-value: From the Chi-Square table, the 5% critical value for 1 degree of freedom is 3.841. Since \(4 > 3.841\), \(p < 0.05\) (more precisely, \(p \approx 0.046\)).

Answer: Reject the null hypothesis. Brand preference appears to depend on age group.
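The sketch below recomputes this 2×2 example and converts the statistic to a p-value with the \(df = 1\) closed form \(p = \operatorname{erfc}\left(\sqrt{\chi^2/2}\right)\). It also offers Yates' continuity correction as an option (the helper name `independence_stat` and the `yates` flag are hypothetical). Some courses and software apply this correction to 2×2 tables by default, for example SciPy's `chi2_contingency` with its default `correction=True`, and here it changes the conclusion.

```python
import math

# Observed counts from this example
table = [[30, 20],   # Young: Brand A, Brand B
         [20, 30]]   # Old:   Brand A, Brand B


def independence_stat(table, yates=False):
    """Chi-square statistic; yates=True applies the 2x2 continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n      # expected count E_ij
            diff = max(0.0, abs(o - e) - 0.5) if yates else o - e
            stat += diff ** 2 / e
    return stat


for yates in (False, True):
    stat = independence_stat(table, yates)
    p = math.erfc(math.sqrt(stat / 2))   # right tail for df = 1
    print(f"yates={yates}: chi2={stat:.2f}, p={p:.4f}")
# yates=False: chi2=4.00, p=0.0455
# yates=True: chi2=3.24, p=0.0719
```

Without the correction, the statistic matches the hand calculation and \(p < 0.05\); with it, \(p > 0.05\). On a borderline result like this, state which convention you used.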

Hard

Question: A study examines the relationship between education level (High School, College, Graduate) and job satisfaction (Satisfied, Neutral, Dissatisfied). The observed frequencies are:

                Satisfied    Neutral    Dissatisfied
High School         20          15           10
College             30          25           15
Graduate            40          30           20

Test if job satisfaction is independent of education level at a 5% significance level.

Step-by-Step:
1. Null Hypothesis: Job satisfaction is independent of education level.
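2. Expected Frequencies: The row totals are 45, 70, and 90; the column totals are 90, 70, and 45; the grand total is \(n = 205\). Using \(E_{ij} = \frac{(\text{row total})(\text{column total})}{n}\):
\[ E_{11} = \frac{45 \times 90}{205} \approx 19.76, \quad E_{12} = \frac{45 \times 70}{205} \approx 15.37, \quad E_{13} = \frac{45 \times 45}{205} \approx 9.88 \]
\[ E_{21} = \frac{70 \times 90}{205} \approx 30.73, \quad E_{22} = \frac{70 \times 70}{205} \approx 23.90, \quad E_{23} = \frac{70 \times 45}{205} \approx 15.37 \]
\[ E_{31} = \frac{90 \times 90}{205} \approx 39.51, \quad E_{32} = \frac{90 \times 70}{205} \approx 30.73, \quad E_{33} = \frac{90 \times 45}{205} \approx 19.76 \]
All nine expected counts are at least 5, so the test's conditions are met.
3. Chi-Square Calculation: \[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \approx 0.12 \] The observed counts are very close to the expected counts in every cell.
4. Degrees of Freedom: \((3 - 1) \times (3 - 1) = 4\).
5. p-value: From the Chi-Square table, the 5% critical value for 4 degrees of freedom is 9.488. Since \(0.12 < 9.488\), \(p > 0.05\) (in fact \(p \approx 0.998\)).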

Answer: Fail to reject the null hypothesis.
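With nine cells, slips in the totals are easy to make, so a short script is a useful cross-check. The sketch below recomputes the row totals (45, 70, 90), column totals (90, 70, 45), grand total (205), all nine expected counts, and the statistic. For \(df = 4\) it uses the closed form \(p = e^{-\chi^2/2}\left(1 + \frac{\chi^2}{2}\right)\). Variable names are illustrative only.

```python
import math

# Rows: High School, College, Graduate
# Columns: Satisfied, Neutral, Dissatisfied
table = [[20, 15, 10],
         [30, 25, 15],
         [40, 30, 20]]

row_totals = [sum(row) for row in table]          # [45, 70, 90]
col_totals = [sum(col) for col in zip(*table)]    # [90, 70, 45]
n = sum(row_totals)                               # 205

# E_ij = (row total)(column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # 4

# Condition check: every expected count should be at least 5
assert min(min(row) for row in expected) >= 5

# For df = 4 the right tail is exp(-x/2) * (1 + x/2)
p = math.exp(-stat / 2) * (1 + stat / 2)

for row in expected:
    print([round(e, 2) for e in row])
print(f"chi2={stat:.3f}, df={df}, p={p:.3f}")    # chi2=0.116, df=4, p=0.998
```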

Common Exam Traps & Mistakes

  1. Mistake: Forgetting to check expected frequencies.
    • Wrong Answer: Running the test without checking whether the expected counts are large enough, or checking the observed counts instead.
    • Correct Approach: Make sure every expected count is at least 5 before trusting the chi-square p-value.

  2. Mistake: Incorrect degrees of freedom calculation.
    • Wrong Answer: Using \(r \times c\) instead of \((r - 1) \times (c - 1)\).
    • Correct Approach: Always subtract 1 from both the number of rows and the number of columns before multiplying.

  3. Mistake: Misinterpreting the p-value.
    • Wrong Answer: Rejecting the null hypothesis when \(p > 0.05\).
    • Correct Approach: Reject the null hypothesis only if \(p < \alpha\) (usually 0.05).

  4. Mistake: Not summing chi-square values correctly.
    • Wrong Answer: Summing only a subset of the cell contributions.
    • Correct Approach: Add the contribution \(\frac{(O - E)^2}{E}\) from every category or cell.
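The first two traps are mechanical enough to automate. The sketch below (the function name `check_table` and the small table are hypothetical) computes \((r - 1) \times (c - 1)\) and lists every cell whose expected count falls below 5. The check runs on expected counts: in this example the cell with an observed count of 5 passes (its expected count is about 5.33), while the cell with an observed count of 3 is flagged because its expected count is only about 2.67.

```python
def check_table(table, minimum=5):
    """Return (df, low_cells) for an r x c table of observed counts.

    low_cells lists (row, column, expected) for every cell whose expected
    count is below `minimum`; the chi-square approximation is doubtful there.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # not r * c
    low_cells = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count for cell (i, j)
            if e < minimum:
                low_cells.append((i, j, round(e, 2)))
    return df, low_cells


# Hypothetical 2 x 2 table of observed counts
df, low = check_table([[3, 7], [5, 15]])
print(df, low)   # 1 [(0, 0, 2.67)]
```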

Shortcut Strategies & Exam Hacks

  1. Memory Aid: "O minus E, squared, over E, summed" for Chi-Square formula.
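  2. Elimination Strategy: In questions about whether a chi-square test is appropriate, rule out any answer that runs the test as usual when an expected count is below 5.
  3. Pattern Recognition: When the null hypothesis in a Goodness-of-Fit test states equal proportions, every expected count is \(n / k\), so compute it once.
  4. Formula Shortcut: Use \((r - 1) \times (c - 1)\) for degrees of freedom in Test of Independence.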

Question-Type Taxonomy

  1. Multiple-Choice:
    • Mini-Example: Which of the following is the correct formula for the Chi-Square Test?
    • Favored By: AP Statistics, introductory statistics courses
  2. Short Answer:
    • Mini-Example: Calculate the Chi-Square statistic for the given data.
    • Favored By: University exams, AP Statistics
  3. Data Analysis:
    • Mini-Example: Analyze the given dataset and determine if the variables are independent.
    • Favored By: Research methods courses, job interviews

Practice Set (MCQs)

Question 1

Question: What is the formula for the Chi-Square Test? Options: A) \(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\) B) \(\chi^2 = \sum \frac{(O_i - E_i)^2}{O_i}\) C) \(\chi^2 = \sum \frac{O_i - E_i}{E_i}\) D) \(\chi^2 = \sum \frac{O_i - E_i}{O_i}\)

Correct Answer: A) \(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\)

Explanation: The Chi-Square formula involves squaring the difference between observed and expected frequencies and dividing by the expected frequency.

Why the Distractors Are Tempting:
  • B) Incorrectly uses the observed frequency in the denominator.
  • C) Forgets to square the difference.
  • D) Incorrectly uses the observed frequency in the denominator and forgets to square the difference.
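Squaring is not a cosmetic step. Because the expected counts are built to have the same total as the observed counts, the raw deviations \(O_i - E_i\) always sum to zero, so positive and negative gaps cancel. The short check below reuses the counts from the Easy worked example (50, 70, 80 with equal expected counts of \(\frac{200}{3}\)) in exact fractions; since every \(E_i\) is equal here, option C's sum is zero as well, while option B gives a different number from the true statistic.

```python
from fractions import Fraction

# Counts from the Easy worked example; H0 gives equal expected counts of 200/3
observed = [50, 70, 80]
expected = [Fraction(200, 3)] * 3
pairs = list(zip(observed, expected))

raw_sum = sum(o - e for o, e in pairs)              # deviations alone
option_c = sum((o - e) / e for o, e in pairs)       # not squared
option_b = sum((o - e) ** 2 / o for o, e in pairs)  # divides by O
option_a = sum((o - e) ** 2 / e for o, e in pairs)  # the chi-square statistic

print(raw_sum, option_c, option_a)   # 0 0 7
print(round(float(option_b), 2))     # 7.94
```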

Question 2

Question: What are the degrees of freedom for a Chi-Square Goodness-of-Fit Test with 4 categories? Options: A) 3 B) 4 C) 5 D) 6

Correct Answer: A) 3

Explanation: Degrees of freedom for a Goodness-of-Fit Test are \(k - 1\), where \(k\) is the number of categories, so \(4 - 1 = 3\).

Why the Distractors Are Tempting:
  • B) Incorrectly assumes the degrees of freedom equal the number of categories.
  • C) Incorrectly adds 1 to the number of categories.
  • D) Incorrectly adds 2 to the number of categories.

Question 3

Question: In a Chi-Square Test of Independence with 3 rows and 4 columns, what are the degrees of freedom? Options: A) 6 B) 7 C) 9 D) 12

Correct Answer: A) 6

Explanation: Degrees of freedom for a Test of Independence are \((r - 1) \times (c - 1)\), where \(r\) is the number of rows and \(c\) is the number of columns, so \((3 - 1) \times (4 - 1) = 2 \times 3 = 6\).

Why the Distractors Are Tempting:
  • B) Adds the rows and columns: \(r + c = 3 + 4 = 7\).
  • C) Subtracts 1 from the columns only: \(r \times (c - 1) = 3 \times 3 = 9\).
  • D) Forgets to subtract 1 at all: \(r \times c = 3 \times 4 = 12\).

Question 4

Question: If the p-value in a Chi-Square Test is 0.03, what should you conclude at the 5% significance level? Options: A) Reject the null hypothesis B) Fail to reject the null hypothesis C) The test is inconclusive D) The null hypothesis is true

Correct Answer: A) Reject the null hypothesis

Explanation: Since \(p = 0.03 < 0.05\), you reject the null hypothesis.

Why the Distractors Are Tempting:
  • B) Reverses the rule: a small p-value is evidence against the null hypothesis, not for it.
  • C) A p-value below the significance level gives a clear decision, so the test is not inconclusive.
  • D) A hypothesis test never proves the null hypothesis true; at most you fail to reject it.

Question 5

Question: Which of the following is NOT a step in the Chi-Square Goodness-of-Fit Test? Options: A) Calculate observed frequencies B) Calculate expected frequencies C) Calculate the Chi-Square statistic D) Calculate the correlation coefficient

Correct Answer: D) Calculate the correlation coefficient

Explanation: The correlation coefficient is not part of the Chi-Square Goodness-of-Fit Test.

Why the Distractors Are Tempting:
  • A) Correct step in the test.
  • B) Correct step in the test.
  • C) Correct step in the test.

30-Second Cheat Sheet

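  • Chi-Square Formula: \(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\)
  • Degrees of Freedom:
    • Goodness-of-Fit: \(k - 1\)
    • Test of Independence: \((r - 1) \times (c - 1)\)
  • p-value Interpretation: If \(p < 0.05\), reject the null hypothesis; otherwise, fail to reject it.
  • Expected Frequencies: \(E_i = n p_i\) (Goodness-of-Fit) or \(E_{ij} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}\) (Independence). Each should be at least 5.
  • Mnemonic: "O minus E, squared, over E, summed."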

Learning Path

  1. Beginner Foundation:
    • Review basic probability and hypothesis testing.
    • Understand frequency distributions and cross-tabulations.

  2. Core Rules:
    • Learn the Chi-Square formula and degrees of freedom calculations.
    • Practice interpreting p-values.

  3. Practice:
    • Solve example problems step-by-step.
    • Work through multiple-choice questions.

  4. Timed Drills:
    • Complete practice tests under exam conditions.
    • Focus on speed and accuracy.

  5. Mock Tests:
    • Take full-length mock exams.
    • Review mistakes and reinforce correct approaches.

Related Topics

  1. ANOVA: Used to compare means across multiple groups; often appears alongside Chi-Square Tests in exams.
  2. t-Tests: Used to compare means between two groups; complements Chi-Square Tests in hypothesis testing.
  3. Correlation and Regression: Used to analyze relationships between continuous variables; often tested in conjunction with Chi-Square Tests for categorical data.