By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
For Data Scientists who need to validate assumptions, debug experiments, and ship statistically sound models.
Hypothesis testing is how you weigh the evidence for or against an assumption about your data. Think of it like a courtroom trial for your model’s predictions:
- Null Hypothesis (H₀): "The defendant (your model’s assumption) is innocent (correct)."
- Alternative Hypothesis (H₁): "The defendant is guilty (wrong)."
- p-value: The probability of seeing data at least as extreme as yours if H₀ were true. If p < 0.05, you "reject H₀" (guilty verdict). Otherwise you fail to reject H₀, which is not the same as proving innocence.
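To make the p-value definition concrete, here is a minimal permutation-test sketch: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The data, seed, group sizes, and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical binary outcomes (1 = converted), for illustration only
control = rng.binomial(1, 0.10, size=2000)
treatment = rng.binomial(1, 0.13, size=2000)

observed_diff = treatment.mean() - control.mean()

# Under H0 the group labels are interchangeable, so shuffle them
pooled = np.concatenate([control, treatment])
n_perm = 5000
count_extreme = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:2000].mean() - shuffled[2000:].mean()
    if abs(diff) >= abs(observed_diff):
        count_extreme += 1

# Share of shuffles at least as extreme as the observed difference
p_value = (count_extreme + 1) / (n_perm + 1)
print(f"Observed difference: {observed_diff:.4f}, permutation p-value: {p_value:.4f}")
```

The +1 in the numerator and denominator is a common convention that keeps the estimated p-value from being exactly zero.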
Why this matters in production:
- A/B tests: Did your new recommendation algorithm actually improve click-through rates, or was it luck?
- Feature selection: Does this new feature statistically improve model accuracy, or is it noise?
- Data drift: Is today’s customer behavior significantly different from last month’s? (If yes, retrain your model.)
- Regulatory compliance: If you’re in healthcare/finance, you must show that your model’s decisions aren’t biased (e.g., chi-square for fairness testing).
Real-world scenario: You’re a data scientist at an e-commerce company. Your team launches a new checkout UI, and conversion rates look higher. But your boss asks: "Is this a real improvement, or just random noise? Should we roll it out to all users?" Hypothesis testing gives you the answer.
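Significance level: α = 0.05
Degrees of freedom (independent two-sample t-test): df = n₁ + n₂ - 2
Degrees of freedom (chi-square test of independence): df = (rows-1)*(cols-1)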
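As a quick check on the two degrees-of-freedom formulas, the sketch below plugs in hypothetical group sizes and table dimensions and looks up the matching critical values at α = 0.05. The sizes (30 per group, a 2x3 table) are assumptions chosen only for illustration.

```python
from scipy import stats

alpha = 0.05

# Independent two-sample t-test with hypothetical group sizes
n1, n2 = 30, 30
df_t = n1 + n2 - 2
t_crit = stats.t.ppf(1 - alpha / 2, df_t)  # two-sided critical value

# Chi-square test of independence on a hypothetical 2x3 table
rows, cols = 2, 3
df_chi2 = (rows - 1) * (cols - 1)
chi2_crit = stats.chi2.ppf(1 - alpha, df_chi2)

print(f"t-test: df = {df_t}, reject H0 if |t| > {t_crit:.3f}")
print(f"chi-square: df = {df_chi2}, reject H0 if chi2 > {chi2_crit:.3f}")
```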
Prerequisites: Python (check with python --version) plus scipy and pandas. Install everything used below with:
    pip install scipy pandas numpy matplotlib
Goal: Determine if the new UI statistically improves conversion rates.
    import pandas as pd
    import numpy as np
    from scipy import stats

    # Load synthetic A/B test data (conversion = 1 if purchased, 0 otherwise)
    data = pd.read_csv("ab_test_data.csv")  # Columns: user_id, group (control/treatment), conversion
    print(data.head())
    print("\nGroup sizes:", data["group"].value_counts())
Expected output:
       user_id      group  conversion
    0        1  treatment           1
    1        2    control           0
    2        3  treatment           0
    3        4    control           1
    4        5  treatment           1

    Group sizes: treatment    5000
    control      5000
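The file ab_test_data.csv is not included here. If you want to follow along, one possible way to build a stand-in with the same columns is sketched below. The conversion rates (12% and 14%) and the seed are assumptions chosen only for illustration, so your test statistics will differ from the outputs shown in this guide.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_per_group = 5000

# Assumed true conversion rates, chosen only for illustration
rates = {"control": 0.12, "treatment": 0.14}

frames = []
for group, rate in rates.items():
    frames.append(pd.DataFrame({
        "group": group,
        "conversion": rng.binomial(1, rate, size=n_per_group),
    }))

# Shuffle rows so the groups are interleaved, then assign user IDs
data = pd.concat(frames).sample(frac=1, random_state=0).reset_index(drop=True)
data.insert(0, "user_id", np.arange(1, len(data) + 1))

# data.to_csv("ab_test_data.csv", index=False)  # uncomment to save the file
print(data.head())
print(data.groupby("group")["conversion"].mean())
```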
    # Split conversions by group (inputs for the assumption checks and tests below)
    control = data[data["group"] == "control"]["conversion"]
    treatment = data[data["group"] == "treatment"]["conversion"]

    # Plot distributions (optional visual check)
    import matplotlib.pyplot as plt
    plt.hist(control, alpha=0.5, label="Control")
    plt.hist(treatment, alpha=0.5, label="Treatment")
    plt.legend()
    plt.show()

    # Check variance equality (Levene's test)
    levene_stat, levene_p = stats.levene(control, treatment)
    print(f"Levene's test p-value: {levene_p:.4f}")  # If p > 0.05, no evidence of unequal variances
Output:
    Levene's test p-value: 0.1234  # No evidence of unequal variances (use standard t-test)
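When Levene's test does reject equal variances, the usual alternative is Welch's t-test (equal_var=False). A small sketch of that decision is below. The helper name t_test_with_variance_check and the simulated data are hypothetical, and some practitioners prefer to use Welch's test by default rather than branching on a pre-test.

```python
import numpy as np
from scipy import stats

def t_test_with_variance_check(a, b, alpha=0.05):
    """Run Levene's test, then a Student or Welch t-test accordingly."""
    _, levene_p = stats.levene(a, b)
    equal_var = bool(levene_p > alpha)  # False -> Welch's t-test
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
    return equal_var, t_stat, p_value

# Simulated groups with the same mean but very different spreads
rng = np.random.default_rng(1)
narrow = rng.normal(0.0, 1.0, size=200)
wide = rng.normal(0.0, 5.0, size=200)

equal_var, t_stat, p_value = t_test_with_variance_check(narrow, wide)
print(f"Equal variances assumed: {equal_var}, t = {t_stat:.3f}, p = {p_value:.4f}")
```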
Option A: t-test (if data is continuous)
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=True)
    print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")
t-statistic: 2.867, p-value: 0.0042 # Reject H₀: new UI improves conversions!
Option B: Chi-square (if data is categorical)
    # Create contingency table
    contingency_table = pd.crosstab(data["group"], data["conversion"])
    chi2_stat, p_value, dof, expected = stats.chi2_contingency(contingency_table)
    print(f"Chi-square statistic: {chi2_stat:.3f}, p-value: {p_value:.4f}")
Chi-square statistic: 8.212, p-value: 0.0042 # Same conclusion!
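The two options agree for a reason: on a 2x2 table, the Pearson chi-square statistic (without continuity correction) equals the square of the pooled two-proportion z statistic. The sketch below checks this on hypothetical counts. Note that chi2_contingency applies Yates' continuity correction to 2x2 tables by default, so correction=False is passed here to get the exact match.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = group, columns = [not converted, converted]
table = np.array([[4400, 600],
                  [4300, 700]])

# Chi-square without Yates' continuity correction
chi2_stat, chi2_p, dof, _ = stats.chi2_contingency(table, correction=False)

# Pooled two-proportion z-test computed by hand
n = table.sum(axis=1)
p_hat = table[:, 1] / n
p_pool = table[:, 1].sum() / n.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n[0] + 1 / n[1]))
z = (p_hat[1] - p_hat[0]) / se
z_p = 2 * stats.norm.sf(abs(z))  # two-sided p-value

print(f"chi2 = {chi2_stat:.4f}, z^2 = {z**2:.4f}")
print(f"chi2 p-value = {chi2_p:.6f}, z-test p-value = {z_p:.6f}")
```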
    mean_diff = treatment.mean() - control.mean()
    pooled_std = np.sqrt((treatment.std()**2 + control.std()**2) / 2)
    cohen_d = mean_diff / pooled_std
    print(f"Cohen's d: {cohen_d:.3f}")  # Cohen's conventions: 0.2 = small, 0.5 = medium, 0.8 = large
Cohen's d: 0.127 # Small effect, but statistically significant
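The pooled standard deviation can also be weighted by each group's degrees of freedom, which matters when group sizes differ. A minimal sketch follows; the helper name cohens_d and the sample numbers are illustrative, not part of the A/B dataset.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the sample-size-weighted pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    # Weight each sample variance by its degrees of freedom
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
print(f"Cohen's d: {d:.3f}")
```

With equal group sizes this reduces to the simple average of the two variances.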
Template for stakeholders:
"The new checkout UI increased conversion rates from 12.3% to 13.8% (p = 0.0042, Cohen’s d = 0.13). While the effect is small, it is statistically significant. We recommend rolling out the new UI to all users."
Bonferroni correction (10 tests): α = 0.05/10 = 0.005
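The arithmetic above can be sketched in a few lines: divide α by the number of tests and compare each p-value against the stricter threshold. The p-values below are made-up numbers for illustration.

```python
# Hypothetical p-values from 10 separate tests (illustrative numbers)
p_values = [0.001, 0.004, 0.012, 0.03, 0.049, 0.08, 0.2, 0.35, 0.6, 0.9]

alpha = 0.05
n_tests = len(p_values)
adjusted_alpha = alpha / n_tests  # Bonferroni-adjusted threshold

# Tests that pass the naive and the corrected thresholds
naive = [p for p in p_values if p < alpha]
significant = [p for p in p_values if p < adjusted_alpha]

print(f"Adjusted alpha: {adjusted_alpha:.3f}")
print(f"Significant without correction: {len(naive)}, with Bonferroni: {len(significant)}")
```

In this example five results look significant at 0.05, but only two survive the correction.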
Power analysis: statsmodels.stats.power
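For intuition about what a power calculation does, here is a dependency-free sketch based on the normal approximation for a two-sided, two-sample comparison: n per group ≈ 2(z₁₋α/₂ + z_power)² / d². The helper name approx_n_per_group is hypothetical. statsmodels' TTestIndPower works with the t distribution instead, so its answers are typically slightly larger.

```python
import math
from statistics import NormalDist

def approx_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group via the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.2, 0.5):
    print(f"Cohen's d = {d}: about {approx_n_per_group(d)} users per group")
```

Halving the effect size roughly quadruples the required sample, which is why small effects need large experiments.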
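    def run_ab_test(control, treatment, test_type="t"):
        if test_type == "t":
            return stats.ttest_ind(treatment, control, equal_var=True)
        elif test_type == "chi2":
            # Build a group x outcome table; crosstab(control, treatment) would pair unrelated users
            outcomes = pd.concat([control, treatment], ignore_index=True)
            groups = ["control"] * len(control) + ["treatment"] * len(treatment)
            contingency = pd.crosstab(pd.Series(groups, name="group"), outcomes)
            return stats.chi2_contingency(contingency)
        raise ValueError(f"Unknown test_type: {test_type}")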
Optional library: pingouin. Install with:
    pip install pingouin
Typical question patterns:
1. Interpret a p-value: "A t-test returns p = 0.03. What does this mean?"
- ❌ "The null hypothesis is false."
- ✅ "There’s a 3% chance of observing data at least this extreme if the null hypothesis were true."
❌ Chi-square (for categorical data).
Effect size vs. p-value: "A test has p = 0.01 and Cohen’s d = 0.02. What’s the takeaway?"
✅ "Statistically significant but practically negligible."
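A simulation makes this takeaway tangible: with a large enough sample, even a tiny true difference yields a very small p-value. The sketch below uses simulated data with an assumed true shift of 0.02 standard deviations and 500,000 observations per group; all of these numbers are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 500_000  # very large groups

# Two simulated groups whose true means differ by only 0.02 standard deviations
a = rng.normal(0.00, 1.0, size=n)
b = rng.normal(0.02, 1.0, size=n)

t_stat, p_value = stats.ttest_ind(b, a, equal_var=True)

pooled_std = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
cohen_d = (b.mean() - a.mean()) / pooled_std

print(f"p-value: {p_value:.2e}, Cohen's d: {cohen_d:.3f}")
```

The p-value answers "is there any difference?", while the effect size answers "is it big enough to matter?".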
Chi-square assumptions: "When can’t you use a chi-square test?"
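One commonly cited condition is that the expected count in each cell should be at least about 5 (observations should also be independent). The sketch below checks the expected counts on a small hypothetical 2x2 table and falls back to Fisher's exact test when the rule of thumb fails. The threshold of 5 is a convention, not a hard rule.

```python
import numpy as np
from scipy import stats

# Small hypothetical 2x2 table (rows = group, columns = outcome)
table = np.array([[3, 1],
                  [1, 3]])

# chi2_contingency also returns the expected counts under independence
_, _, _, expected = stats.chi2_contingency(table)

if expected.min() < 5:
    # Expected counts too small for the chi-square approximation
    _, p_value = stats.fisher_exact(table)
    method = "Fisher's exact test"
else:
    _, p_value, _, _ = stats.chi2_contingency(table)
    method = "chi-square test"

print(f"Smallest expected count: {expected.min():.1f}")
print(f"{method}: p-value = {p_value:.4f}")
```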
Key trap distinctions:
- t-test vs. chi-square:
  - t-test: Continuous data, compares means.
  - Chi-square: Categorical data, tests independence.
- Independent vs. paired t-test:
  - Independent: Two separate groups (e.g., control vs. treatment).
  - Paired: Same group before/after (e.g., pre/post-treatment).
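The independent vs. paired distinction changes the result, not just the function name. In the sketch below, the same hypothetical before/after measurements are analysed both ways: the paired test sees a consistent gain of about 1 unit per subject, while the independent test sees two widely spread groups whose means barely differ.

```python
from scipy import stats

# Hypothetical measurements for the same five subjects, before and after
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.2, 14.8, 17.1, 18.9]

# Paired test: works on the per-subject differences
t_paired, p_paired = stats.ttest_rel(after, before)

# Independent test: treats the two lists as unrelated groups (wrong here)
t_ind, p_ind = stats.ttest_ind(after, before, equal_var=True)

print(f"Paired:      t = {t_paired:.2f}, p = {p_paired:.4f}")
print(f"Independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
```

Using the independent test on paired data throws away the pairing and usually costs statistical power.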
Challenge: You’re given a dataset of customer satisfaction scores (1-5) for two product versions. Run a test to determine if Version B is statistically better than Version A.
Data:
    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        "version": ["A"] * 100 + ["B"] * 100,
        "score": np.concatenate([
            np.random.normal(3.5, 1, 100),  # Version A
            np.random.normal(3.7, 1, 100),  # Version B
        ]),
    })
Solution:
    from scipy import stats

    a_scores = data[data["version"] == "A"]["score"]
    b_scores = data[data["version"] == "B"]["score"]

    # Check normality (Shapiro-Wilk test)
    print("Shapiro-Wilk p-values:", stats.shapiro(a_scores).pvalue, stats.shapiro(b_scores).pvalue)

    # Run t-test (assuming normality)
    t_stat, p_value = stats.ttest_ind(b_scores, a_scores, equal_var=True)
    print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")

    # Effect size (Cohen's d)
    mean_diff = b_scores.mean() - a_scores.mean()
    pooled_std = np.sqrt((a_scores.std()**2 + b_scores.std()**2) / 2)
    cohen_d = mean_diff / pooled_std
    print(f"Cohen's d: {cohen_d:.3f}")
Why it works:
- Shapiro-Wilk checks normality (p > 0.05 → no evidence against normality).
- t-test compares means of two independent groups.
- Cohen’s d quantifies the effect size.
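If the scores are true 1-5 ratings rather than continuous values, or Shapiro-Wilk rejects normality, a rank-based test such as Mann-Whitney U is a common alternative. The sketch below uses simulated integer ratings; the rating probabilities and seed are assumptions for illustration. The alternative="greater" argument tests whether Version B tends to score higher than Version A.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
levels = [1, 2, 3, 4, 5]

# Simulated 1-5 ratings; the probabilities are made up for illustration
a_scores = rng.choice(levels, size=100, p=[0.10, 0.15, 0.30, 0.30, 0.15])
b_scores = rng.choice(levels, size=100, p=[0.05, 0.10, 0.25, 0.35, 0.25])

# One-sided test: does B tend to score higher than A?
u_stat, p_value = stats.mannwhitneyu(b_scores, a_scores, alternative="greater")
print(f"U statistic: {u_stat:.1f}, p-value: {p_value:.4f}")
```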
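Independent t-test: stats.ttest_ind(group1, group2, equal_var=True)
Welch's t-test (unequal variances): stats.ttest_ind(group1, group2, equal_var=False)
Paired t-test: stats.ttest_rel(before, after)
Chi-square test of independence: stats.chi2_contingency(pd.crosstab(group, outcome))
Mann-Whitney U test (non-parametric): stats.mannwhitneyu(group1, group2)
Shapiro-Wilk normality test: stats.shapiro(data)
Levene's test for equal variances: stats.levene(group1, group2)
Cohen's d: (mean1 - mean2) / pooled_std
Bonferroni correction: α = 0.05 / n_tests
Power analysis (sample size per group):
    from statsmodels.stats.power import TTestIndPower
    analysis = TTestIndPower()
    analysis.solve_power(effect_size=0.5, nobs1=None, alpha=0.05, power=0.8)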