Fatskills
Study Guide: AP Exams: Statistics Unit 7, Chi-Square, Test of Independence and Homogeneity, Two-Way Tables
Source: https://www.fatskills.com/ap/chapter/ap-exams-statistics-unit-7-chi-square-chi-square-test-of-independence-and-homogeneity-two-way-tables

AP Exams: Statistics Unit 7, Chi-Square, Test of Independence and Homogeneity, Two-Way Tables

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What Is This?

The Chi-Square tests of independence and homogeneity are two closely related significance tests for categorical data arranged in a two-way (contingency) table. The test of independence asks whether two categorical variables measured on a single sample are associated. The test of homogeneity asks whether the distribution of one categorical variable is the same across several populations or treatment groups. Both tests use the same calculation: they compare the observed frequencies in the table with the frequencies expected if the null hypothesis were true.

This topic appears in exams to test your understanding of statistical analysis, data interpretation, and research methodology.

Why It Matters

The Chi-Square test is a crucial topic in statistics, and it is frequently tested in exams, particularly in social sciences, health sciences, and business studies. The marks allocated to this topic vary, but it can account for up to 20% of the total marks in a statistics exam. The examiner is testing your ability to apply statistical concepts to real-world problems, interpret results, and draw conclusions.

Core Concepts

To tackle Chi-Square questions, you must understand the following core concepts:

  • Independence: Two categorical variables are independent if knowing the category of one tells you nothing about the distribution of the other. In a test of independence this is the null hypothesis, tested with one sample classified on both variables.
  • Homogeneity: Several populations (or treatment groups) are homogeneous if a categorical variable has the same distribution in each of them. In a test of homogeneity this is the null hypothesis, tested with a separate sample from each population.
  • Expected frequency: The count a cell would be expected to contain if the null hypothesis were true.
  • Observed frequency: The actual count in a cell of the data table.
  • Chi-Square statistic: A single number that summarizes how far the observed frequencies are from the expected frequencies across all cells of the table.
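To make the idea of an expected frequency concrete, here is a minimal Python sketch that applies the standard formula E = (row total × column total) / grand total to every cell. The function name `expected_counts` and the 2 × 2 table are hypothetical and chosen only for illustration.

```python
def expected_counts(observed):
    """Return the expected count for every cell of a two-way table.

    `observed` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total * column total) / grand total, cell by cell
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical table: two groups (rows) answering yes/no (columns).
observed = [[30, 20],
            [10, 40]]
expected = expected_counts(observed)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

Both rows get the same expected counts here because both row totals are 50: under no association, each group would split yes/no in the overall 40:60 ratio.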

Prerequisites

Before tackling the Chi-Square test, you must already understand:

  • Basic statistical concepts, such as mean, median, and mode
  • Data analysis and interpretation
  • Research methodology and study design

If you are missing these prerequisites, you may struggle to understand the underlying logic of the Chi-Square test.

The Rule-Book (How It Works)

The primary rule of the Chi-Square test is:

  • If the observed frequencies in a two-way table differ significantly from the expected frequencies under the assumption of no association, the null hypothesis of independence or homogeneity is rejected.

Sub-rules and exceptions:

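  • The data must be counts (not percentages or means) of categorical variables, taken from a random sample or a randomized experiment, with each individual counted in exactly one cell. Independence or homogeneity is the null hypothesis being tested, not an assumption about the data.
  • The chi-square distribution is only an approximation, and it can give misleading results when counts are small. The usual check is that every expected frequency is at least 5.
  • Large samples do not invalidate the test; the approximation improves as counts grow. With very large samples, however, even a weak association can be statistically significant, so a small p-value does not by itself mean the association is strong.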
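The small-sample concern is usually checked by confirming that every expected frequency is at least 5. The sketch below lists any cells that fail that check; the helper names and the small example table are hypothetical, and the threshold of 5 is the common textbook rule rather than something derived here.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def small_expected_cells(observed, minimum=5):
    """Return (row index, column index, expected count) for cells below `minimum`."""
    expected = expected_counts(observed)
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


# Hypothetical small study: two of the four expected counts are only 2.5.
tiny_table = [[3, 7],
              [2, 8]]
print(small_expected_cells(tiny_table))  # [(0, 0, 2.5), (1, 0, 2.5)]
```

An empty list means the condition is met. A non-empty list is a warning that the chi-square approximation may not be trustworthy for that table.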

A simple visual pattern to remember:

| | Category 1 | Category 2 | ... | Category n |
| --- | --- | --- | --- | --- |
| Category 1 | | | ... | |
| Category 2 | | | ... | |
| ... | | | ... | |
| Category n | | | ... | |

Exam / Job / Audit Weighting

  • Frequency: 20-30%
  • Difficulty Rating: Intermediate
  • Question Type or Real-World Task Type: Multiple-choice questions, short-answer questions, and case studies

Difficulty Level

Intermediate

Must-Know Rules, Formulas, Standards, or Principles

The following three rules and formulas are essential for the Chi-Square test:

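  1. Chi-Square statistic: χ² = Σ [(observed frequency - expected frequency)² / expected frequency], summed over every cell of the table
  2. Null hypothesis: H0: The two variables are independent (test of independence), or the distribution of the variable is the same in every population (test of homogeneity)
  3. Alternative hypothesis: H1: The two variables are associated, or the distribution differs in at least one population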
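Written in symbols, the formulas look like this. The notation is our own, introduced for illustration: O_ij and E_ij are the observed and expected frequencies in row i and column j, R_i and C_j are the row and column totals, n is the grand total, and r and c are the numbers of rows and columns. The degrees of freedom are needed to find the p-value.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```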

Worked Examples (Step-by-Step)

Example 1: Easy

A researcher wants to determine whether there is a significant association between the type of exercise (running, swimming, or cycling) and the level of fitness (high, medium, or low). The data are presented in the following contingency table:

| Exercise | High Fitness | Medium Fitness | Low Fitness |
| --- | --- | --- | --- |
| Running | 20 | 15 | 5 |
| Swimming | 10 | 20 | 10 |
| Cycling | 5 | 15 | 20 |

Using the Chi-Square test, determine whether there is a significant association between the type of exercise and the level of fitness.

Solution

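  • Calculate the expected frequencies under the assumption of no association: E = (row total × column total) / grand total. Every row total is 40 and the column totals are 35, 50, and 35 (grand total 120), so each row has expected frequencies of about 11.67, 16.67, and 11.67. All are at least 5.
  • Calculate the Chi-Square statistic: χ² = Σ [(observed frequency - expected frequency)² / expected frequency] = 21.0, with df = (3 - 1)(3 - 1) = 4
  • Determine the p-value and compare it to the significance level (α = 0.05): p ≈ 0.0003
  • Reject the null hypothesis if the p-value is less than α. Here 0.0003 < 0.05, so reject H0: there is convincing evidence of an association between type of exercise and level of fitness.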
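The steps above can be sketched in Python for the Example 1 table. This is an illustrative sketch, not a general-purpose routine: the function names are hypothetical, and `p_value_df4` uses a closed-form upper-tail area that holds only for 4 degrees of freedom, which is what a 3 × 3 table has.

```python
import math


def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df4(stat):
    """Upper-tail area of the chi-square distribution, valid for df = 4 only."""
    return math.exp(-stat / 2) * (1 + stat / 2)


# Rows: running, swimming, cycling. Columns: high, medium, low fitness.
exercise = [[20, 15, 5],
            [10, 20, 10],
            [5, 15, 20]]
stat, df = chi_square_test(exercise)
p = p_value_df4(stat)
print(round(stat, 2), df, round(p, 4))  # 21.0 4 0.0003
```

Because 0.0003 is well below 0.05, the null hypothesis of no association is rejected for this table.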

Example 2: Medium

A researcher wants to determine whether there is a significant association between the type of medication (drug A, drug B, or placebo) and the response rate (positive, negative, or neutral). The data are presented in the following contingency table:

| Medication | Positive Response | Negative Response | Neutral Response |
| --- | --- | --- | --- |
| Drug A | 25 | 15 | 10 |
| Drug B | 20 | 20 | 20 |
| Placebo | 10 | 25 | 15 |

Using the Chi-Square test, determine whether there is a significant association between the type of medication and the response rate.

Solution

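  • Calculate the expected frequencies under the assumption of no association: E = (row total × column total) / grand total. The row totals are 50, 60, and 50 and the column totals are 55, 60, and 45 (grand total 160), so the expected frequencies are about 17.2, 18.8, and 14.1 for Drug A and for Placebo, and about 20.6, 22.5, and 16.9 for Drug B. All are at least 5.
  • Calculate the Chi-Square statistic: χ² = Σ [(observed frequency - expected frequency)² / expected frequency] ≈ 11.50, with df = (3 - 1)(3 - 1) = 4
  • Determine the p-value and compare it to the significance level (α = 0.05): p ≈ 0.021
  • Reject the null hypothesis if the p-value is less than α. Here 0.021 < 0.05, so reject H0: there is convincing evidence of an association between type of medication and response.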
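For the medication table it can also help to see which cells drive the statistic. The sketch below (hypothetical function name `contributions`) computes each cell's term (observed - expected)² / expected and adds them up. The comparison value 9.488 is the standard table critical value for 4 degrees of freedom at α = 0.05.

```python
def contributions(observed):
    """Return each cell's contribution (O - E)^2 / E to the chi-square statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(observed):
        result_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            result_row.append((o - e) ** 2 / e)
        result.append(result_row)
    return result


# Rows: Drug A, Drug B, Placebo. Columns: positive, negative, neutral response.
medication = [[25, 15, 10],
              [20, 20, 20],
              [10, 25, 15]]
contrib = contributions(medication)
stat = sum(sum(row) for row in contrib)
print(round(stat, 2))  # 11.5

for label, row in zip(["Drug A", "Drug B", "Placebo"], contrib):
    print(label, [round(c, 2) for c in row])

print(stat > 9.488)  # True: exceeds the df = 4 critical value at alpha = 0.05
```

The largest contributions come from the Positive Response column: Drug A has more positive responses than expected (about 3.55) and Placebo has fewer (about 3.01).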

Example 3: Hard

A researcher wants to determine whether there is a significant association between the type of exercise (running, swimming, or cycling) and the level of fitness (high, medium, or low) in a sample of 1,200 individuals. The data are presented in the following contingency table:

| Exercise | High Fitness | Medium Fitness | Low Fitness |
| --- | --- | --- | --- |
| Running | 200 | 150 | 50 |
| Swimming | 100 | 200 | 100 |
| Cycling | 50 | 150 | 200 |

Using the Chi-Square test, determine whether there is a significant association between the type of exercise and the level of fitness.

Solution

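  • Calculate the expected frequencies under the assumption of no association: E = (row total × column total) / grand total. Every row total is 400 and the column totals are 350, 500, and 350 (grand total 1,200), so each row has expected frequencies of about 116.67, 166.67, and 116.67. All are at least 5.
  • Calculate the Chi-Square statistic: χ² = Σ [(observed frequency - expected frequency)² / expected frequency] = 210.0, with df = (3 - 1)(3 - 1) = 4
  • Determine the p-value and compare it to the significance level (α = 0.05): p < 0.0001
  • Reject the null hypothesis if the p-value is less than α. Here the p-value is far below 0.05, so reject H0: there is convincing evidence of an association between type of exercise and level of fitness.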
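As a check on the larger table, the sketch below computes the statistic and makes the decision with a critical value instead of a p-value. The function name is hypothetical, and 9.488 is the standard table value for 4 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )


CRITICAL_VALUE = 9.488  # chi-square table value for df = 4 at alpha = 0.05

# Rows: running, swimming, cycling. Columns: high, medium, low fitness.
exercise_large = [[200, 150, 50],
                  [100, 200, 100],
                  [50, 150, 200]]
stat = chi_square_statistic(exercise_large)
df = (3 - 1) * (3 - 1)
print(round(stat, 1), df)  # 210.0 4
print("reject H0" if stat > CRITICAL_VALUE else "fail to reject H0")  # reject H0
```

The statistic is far beyond the critical value, so the conclusion matches the p-value approach.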

Common Exam Traps & Mistakes

Trap 1: Incorrect calculation of expected frequencies

  • Mistake: Calculating the expected frequencies using the wrong formula or incorrect values.
  • Wrong answer: χ² = 10 (instead of χ² = 5)
  • Correct approach: Use the formula E = (row total × column total) / grand total to calculate the expected frequencies.

Trap 2: Failure to reject the null hypothesis

  • Mistake: Failing to reject the null hypothesis even when the p-value is less than ?.
  • Wrong answer: χ² = 5, p-value = 0.01 → fail to reject H0
  • Correct approach: Reject the null hypothesis if the p-value is less than ?.

Trap 3: Incorrect interpretation of the Chi-Square statistic

  • Mistake: Interpreting the Chi-Square statistic as a measure of effect size.
  • Wrong answer: χ² = 10 → the variables are strongly associated.
  • Correct approach: The Chi-Square statistic measures how far the observed frequencies are from the expected frequencies. It grows with sample size, so it does not measure the strength of the association.
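To see why the statistic is not an effect size, the sketch below multiplies every count in a table by 10. The proportions, and so the pattern of association, stay the same, yet the statistic becomes ten times larger. For contrast it also computes Cramér's V, a common effect-size measure that this guide does not cover and that is included here only as an illustration. All function names are hypothetical.

```python
import math


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat


def cramers_v(observed):
    """Cramér's V = sqrt(chi-square / (n * (min(rows, columns) - 1)))."""
    n = sum(sum(row) for row in observed)
    k = min(len(observed), len(observed[0])) - 1
    return math.sqrt(chi_square_statistic(observed) / (n * k))


small = [[20, 15, 5], [10, 20, 10], [5, 15, 20]]
large = [[10 * count for count in row] for row in small]  # same proportions

print(round(chi_square_statistic(small), 1), round(chi_square_statistic(large), 1))
# 21.0 210.0
print(round(cramers_v(small), 3), round(cramers_v(large), 3))
# 0.296 0.296
```

The statistic changes with the sample size while the effect-size measure does not, which is why a large χ² alone says nothing about how strong the association is.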

Trap 4: Failure to check the assumptions of the Chi-Square test

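  • Mistake: Running the test without checking its conditions: the data come from a random sample or randomized experiment, each individual is counted in exactly one cell, and every expected frequency is at least 5.
  • Wrong answer: χ² = 5, p-value = 0.01 → reject H0 (with the conditions never checked)
  • Correct approach: Check the conditions of the Chi-Square test before interpreting the results.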

Trap 5: Incorrect calculation of the p-value

  • Mistake: Calculating the p-value using the wrong formula or incorrect values.
  • Wrong answer: p-value = 0.05 (instead of p-value = 0.01)
  • Correct approach: Use the formula p-value = P(χ² ≥ χ²_obs), the upper-tail area of the chi-square distribution with df = (rows - 1)(columns - 1).
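In symbols, the p-value is always an upper-tail area. For most degrees of freedom it has to come from a table or calculator, but for df = 4 (any 3 × 3 table) there is a simple closed form, shown here as an illustration. The symbol x stands for the observed statistic, and the numerical line uses x = 21 as an example value.

```latex
p = P\left(\chi^2_{\mathrm{df}} \ge \chi^2_{\mathrm{obs}}\right),
\qquad
\text{for } \mathrm{df} = 4:\quad
p = e^{-x/2}\left(1 + \frac{x}{2}\right),
\qquad
x = 21 \;\Rightarrow\; p = 11.5\,e^{-10.5} \approx 0.0003
```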

Shortcut Strategies & Exam Hacks

Hack 1: Use a calculator to calculate the Chi-Square statistic and p-value

  • Time-saving: Using a calculator to calculate the Chi-Square statistic and p-value can save you time and reduce errors.

Hack 2: Use a table to determine the critical value of the Chi-Square distribution

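  • Time-saving: Look up the critical value for df = (rows - 1)(columns - 1) at your significance level and reject H0 when χ² exceeds it. For a 3 × 3 table at α = 0.05, df = 4 and the critical value is 9.488, so no p-value calculation is needed.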
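The table-lookup approach can be sketched as a small dictionary of critical values. The values below are the standard chi-square table entries for α = 0.05 and 1 to 6 degrees of freedom; the function names are hypothetical, and the dictionary would need more entries for larger tables.

```python
# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_VALUES_ALPHA_05 = {
    1: 3.841,
    2: 5.991,
    3: 7.815,
    4: 9.488,
    5: 11.070,
    6: 12.592,
}


def degrees_of_freedom(n_rows, n_cols):
    """df for a two-way table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(stat, n_rows, n_cols):
    """True when the statistic exceeds the alpha = 0.05 critical value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return stat > CRITICAL_VALUES_ALPHA_05[df]


print(reject_null(21.0, 3, 3))  # True  (21.0 > 9.488)
print(reject_null(5.0, 3, 3))   # False (5.0 < 9.488)
```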

Hack 3: Use a formula to calculate the expected frequencies

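  • Time-saving: Apply E = (row total × column total) / grand total to each cell instead of reasoning out each expected frequency separately. Rows with equal totals share the same expected frequencies, so you only need to calculate them once.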

Question-Type Taxonomy

The Chi-Square test appears in the following question formats:

| Question Format | Example | Exams that favor it |
| --- | --- | --- |
| Multiple-choice question | What is the correct formula for the Chi-Square statistic? | Statistics exams |
| Short-answer question | Calculate the expected frequencies for the following contingency table: | Research methodology exams |
| Case study | A researcher wants to determine whether there is a significant association between the type of exercise and the level of fitness. | Health sciences exams |

Practice Set (MCQs)

Question 1

What is the correct formula for the Chi-Square statistic?

A) χ² = Σ [(observed frequency - expected frequency)² / expected frequency]
B) χ² = Σ [(observed frequency + expected frequency)² / expected frequency]
C) χ² = Σ [(observed frequency × expected frequency) / expected frequency]
D) χ² = Σ [(observed frequency - expected frequency) / expected frequency]

Correct answer: A) χ² = Σ [(observed frequency - expected frequency)² / expected frequency]

Question 2

A researcher wants to determine whether there is a significant association between the type of exercise and the level of fitness. The data are presented in the following contingency table:

| Exercise | High Fitness | Medium Fitness | Low Fitness |
| --- | --- | --- | --- |
| Running | 20 | 15 | 5 |
| Swimming | 10 | 20 | 10 |
| Cycling | 5 | 15 | 20 |

Using the Chi-Square test, determine whether there is a significant association between the type of exercise and the level of fitness.

A) Yes, there is a significant association
B) No, there is no significant association
C) The data are insufficient to determine the association
D) The association is not significant at the 5% level

Correct answer: A) Yes, there is a significant association

Question 3

A researcher wants to determine whether there is a significant association between the type of medication and the response rate. The data are presented in the following contingency table:

| Medication | Positive Response | Negative Response | Neutral Response |
| --- | --- | --- | --- |
| Drug A | 25 | 15 | 10 |
| Drug B | 20 | 20 | 20 |
| Placebo | 10 | 25 | 15 |

Using the Chi-Square test, determine whether there is a significant association between the type of medication and the response rate.

A) Yes, there is a significant association
B) No, there is no significant association
C) The data are insufficient to determine the association
D) The association is not significant at the 5% level

Correct answer: A) Yes, there is a significant association

Question 4

A researcher wants to determine whether there is a significant association between the type of exercise and the level of fitness in a sample of 1,200 individuals. The data are presented in the following contingency table:

| Exercise | High Fitness | Medium Fitness | Low Fitness |
| --- | --- | --- | --- |
| Running | 200 | 150 | 50 |
| Swimming | 100 | 200 | 100 |
| Cycling | 50 | 150 | 200 |

Using the Chi-Square test, determine whether there is a significant association between the type of exercise and the level of fitness.

A) Yes, there is a significant association
B) No, there is no significant association
C) The data are insufficient to determine the association
D) The association is not significant at the 5% level

Correct answer: A) Yes, there is a significant association

Question 5

What is the correct interpretation of the Chi-Square statistic?

A) The Chi-Square statistic measures the strength of the association between the variables.
B) The Chi-Square statistic measures the difference between the observed and expected frequencies.
C) The Chi-Square statistic measures the effect size of the association.
D) The Chi-Square statistic measures the p-value of the association.

Correct answer: B) The Chi-Square statistic measures the difference between the observed and expected frequencies.

30-Second Cheat Sheet

  • Chi-Square statistic: χ² = Σ [(observed frequency - expected frequency)² / expected frequency]
  • Expected frequency: E = (row total × column total) / grand total
  • Degrees of freedom: df = (rows - 1)(columns - 1)
  • Null hypothesis: H0: The variables are independent (independence) or the distributions are the same (homogeneity)
  • Alternative hypothesis: H1: The variables are associated, or at least one distribution differs
  • p-value: p-value = P(χ² ≥ χ²_obs)
  • Significance level: α = 0.05

Learning Path

  1. Beginner foundation: Understand the basics of statistics, including mean, median, and mode.
  2. Core rules: Learn the rules and formulas for the Chi-Square test, including the calculation of the Chi-Square statistic and p-value.
  3. Practice: Practice calculating the Chi-Square statistic and p-value using sample data.
  4. Timed drills: Practice solving Chi-Square questions under timed conditions.
  5. Mock tests: Take mock tests to assess your knowledge and identify areas for improvement.

Related Topics

  • Regression analysis: Regression analysis is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
  • Hypothesis testing: Hypothesis testing is a statistical method used to test a hypothesis about a population parameter.
  • Confidence intervals: Confidence intervals are a statistical method used to estimate a population parameter.