Fatskills
Study Guide: Business Analytics 101: Ethics and Privacy in Analytics - Bias and Fairness in Algorithms: Disparate Impact, Protected Attributes, Fairness Metrics
Source: https://www.fatskills.com/business-analytics/chapter/business-analytics-busanalytics-ethics-and-privacy-in-analytics-bias-and-fairness-in-algorithms-disparate-impact-protected-attributes-fairness-metrics

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

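Algorithmic bias occurs when a model produces systematically different outcomes for groups defined by protected attributes (such as race, gender, or age) without a legitimate justification, often because it learns patterns from historical data that reflect past discrimination. Fairness is the degree to which a model avoids such disparities, as measured by fairness metrics. This matters in business analytics because biased models can treat customers, employees, or patients unequally, exposing the organization to legal risk, lost revenue, and reputational damage. For instance, a credit scoring model that approves loans for one demographic group far less often than for others with similar credit risk may drive away customers and attract regulatory scrutiny. This kind of outcome disparity is called disparate impact, and it can arise even when the model never uses the protected attribute directly.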

Key Formulas & Metrics

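  • Notation: X is the protected attribute indicator (X=1 for the protected group, X=0 for the reference group), Y is the true outcome, and Ŷ is the model's prediction, where 1 is the favorable outcome (e.g., loan approval).
  • P(Ŷ=1|X=1): selection rate of the protected group, i.e., the probability that a member of the protected group receives the favorable prediction.
  • P(Ŷ=1|X=0): selection rate of the reference group.
  • Disparate Impact (DI) = P(Ŷ=1|X=1) / P(Ŷ=1|X=0) – the ratio of the two selection rates. DI = 1 means equal rates; under the four-fifths rule used in U.S. employment guidelines, DI below 0.8 is commonly treated as evidence of adverse impact.
  • Statistical Parity Difference (SPD) = P(Ŷ=1|X=1) - P(Ŷ=1|X=0) – the same comparison expressed as a difference; 0 means equal selection rates.
  • Fairness Metric (FM) = 1 - |P(Ŷ=1|X=1) - P(Ŷ=1|X=0)| – one minus the absolute SPD; 1 means equal selection rates and lower values mean larger gaps. It is a convenience score rather than a standard named metric.
  • Demographic Parity (DP), also called Statistical Parity (SP): satisfied when P(Ŷ=1|X=1) = P(Ŷ=1|X=0), i.e., both groups receive favorable predictions at the same rate regardless of their true outcomes.
  • Equal Opportunity Difference (EOD) = P(Ŷ=1|Y=1,X=1) - P(Ŷ=1|Y=1,X=0) – the difference in true positive rates; 0 means truly qualified individuals in both groups are equally likely to receive the favorable prediction.
  • Equalized Odds (EO): satisfied when P(Ŷ=1|Y=y,X=1) = P(Ŷ=1|Y=y,X=0) for both y=0 and y=1, i.e., equal true positive rates and equal false positive rates across groups.
  • Predictive Rate Parity (PRP): satisfied when P(Y=1|Ŷ=1,X=1) = P(Y=1|Ŷ=1,X=0), i.e., equal precision (positive predictive value) across groups; |P(Y=1|Ŷ=1,X=1) - P(Y=1|Ŷ=1,X=0)| measures the gap.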
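The metrics above can be computed directly from three parallel lists: true labels y, predictions y_hat (1 = favorable), and a group indicator x (1 = protected group, 0 = reference group). The following is a minimal pure-Python sketch under those assumptions; the function name fairness_report and its dictionary keys are hypothetical, chosen for illustration. It assumes binary labels and that every group/label combination appears at least once; otherwise a rate is undefined and the sketch raises an error (or divides by zero if the reference group's selection rate is 0).

```python
from typing import Dict, Sequence


def _rate(num: int, den: int) -> float:
    # Refuse to divide by an empty subgroup instead of returning a misleading 0.
    if den == 0:
        raise ValueError("empty subgroup: rate is undefined")
    return num / den


def fairness_report(y: Sequence[int], y_hat: Sequence[int],
                    x: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: group fairness metrics for binary y, y_hat, x.

    x == 1 marks the protected group and x == 0 the reference group;
    y_hat == 1 is the favorable prediction.
    """
    stats = {}
    for g in (0, 1):
        idx = [i for i in range(len(x)) if x[i] == g]
        pos = [i for i in idx if y[i] == 1]      # truly positive members
        neg = [i for i in idx if y[i] == 0]      # truly negative members
        sel = [i for i in idx if y_hat[i] == 1]  # predicted favorable
        stats[g] = {
            "sel": _rate(len(sel), len(idx)),                    # P(Ŷ=1|X=g)
            "tpr": _rate(sum(y_hat[i] for i in pos), len(pos)),  # P(Ŷ=1|Y=1,X=g)
            "fpr": _rate(sum(y_hat[i] for i in neg), len(neg)),  # P(Ŷ=1|Y=0,X=g)
            "ppv": _rate(sum(y[i] for i in sel), len(sel)),      # P(Y=1|Ŷ=1,X=g)
        }
    prot, ref = stats[1], stats[0]
    spd = prot["sel"] - ref["sel"]
    return {
        "disparate_impact": prot["sel"] / ref["sel"],
        "statistical_parity_difference": spd,
        "fairness_metric": 1 - abs(spd),
        "equal_opportunity_difference": prot["tpr"] - ref["tpr"],
        # Equalized odds requires both gaps to be 0; report the larger one.
        "equalized_odds_gap": max(abs(prot["tpr"] - ref["tpr"]),
                                  abs(prot["fpr"] - ref["fpr"])),
        "predictive_parity_gap": abs(prot["ppv"] - ref["ppv"]),
    }


if __name__ == "__main__":
    # Hypothetical toy data: records 0-3 are the reference group, 4-7 protected.
    y = [1, 1, 0, 0, 1, 1, 0, 0]
    y_hat = [1, 1, 1, 0, 1, 0, 0, 0]
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    for name, value in fairness_report(y, y_hat, x).items():
        print(f"{name}: {value:.3f}")
```

On this toy data the protected group's selection rate is 0.25 versus 0.75 for the reference group, so disparate impact is 0.333 (well below 0.8) and the equal opportunity difference is -0.500.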

Step-by-Step Procedure

  1. Identify protected attributes: Determine the attributes that are protected by law or regulations (e.g., race, gender, age).
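  2. Collect and preprocess data: Gather data on the protected attributes, the model inputs, and the true outcome variable; true outcomes are needed for error-rate metrics such as equal opportunity and equalized odds.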
  3. Train a model: Train a machine learning model on the data.
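  4. Evaluate fairness metrics: For each protected attribute, compare each group with a reference group using disparate impact, the fairness metric, demographic (statistical) parity, equal opportunity difference, equalized odds, and predictive rate parity.
  5. Interpret results: Compare each metric with a chosen threshold (e.g., disparate impact below 0.8, or a parity gap well away from 0) to identify which groups, and which kinds of error, the model may be treating unfairly, and check whether legitimate factors explain the gaps.
  6. Mitigate bias: Apply techniques such as data preprocessing (e.g., reweighting or resampling the training data), feature engineering (e.g., removing proxies for protected attributes), or model selection, then recompute the metrics to confirm the gaps have shrunk without an unacceptable loss of accuracy.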
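Step 6 lists data preprocessing as one mitigation route. One well-known preprocessing technique is reweighing (Kamiran and Calders), which gives each training example the weight P(X=x)·P(Y=y) / P(X=x, Y=y), so that in the weighted data the outcome is statistically independent of the protected attribute. The sketch below assumes binary X and Y stored in plain Python lists; the function names reweigh and weighted_positive_rate are hypothetical. The resulting weights would typically be passed to a learner that accepts per-sample weights (many scikit-learn estimators accept a sample_weight argument in fit). Reweighing balances the training data; it does not by itself guarantee fair predictions, so the metrics should still be recomputed afterwards.

```python
from collections import Counter
from typing import List, Sequence


def reweigh(x: Sequence[int], y: Sequence[int]) -> List[float]:
    """Hypothetical helper: reweighing weights estimated from counts.

    weight(x, y) = P(X=x) * P(Y=y) / P(X=x, Y=y)
                 = count(x) * count(y) / (n * count(x, y))
    Under-represented (group, label) cells receive weights above 1.
    """
    n = len(y)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    return [
        (count_x[xi] * count_y[yi]) / (n * count_xy[(xi, yi)])
        for xi, yi in zip(x, y)
    ]


def weighted_positive_rate(x: Sequence[int], y: Sequence[int],
                           w: Sequence[float], group: int) -> float:
    # Weighted share of Y=1 among the members of one group.
    num = sum(wi for xi, yi, wi in zip(x, y, w) if xi == group and yi == 1)
    den = sum(wi for xi, wi in zip(x, w) if xi == group)
    return num / den


if __name__ == "__main__":
    # Hypothetical data: positive labels are 3/4 in group 0 but 1/4 in group 1.
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    y = [1, 1, 1, 0, 1, 0, 0, 0]
    w = reweigh(x, y)
    for g in (0, 1):
        print(g, round(weighted_positive_rate(x, y, w, g), 3))
```

After reweighing, both groups have a weighted positive rate of 0.5, matching the overall rate of 4 positives out of 8 records.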

Common Mistakes

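  • Mistake: Confusing disparate impact with the fairness metric.
  • Correction: Disparate impact is a ratio of selection rates (1 means parity; values below 0.8 are commonly flagged), while the fairness metric is 1 minus the absolute difference in selection rates (1 means parity). Because the scales differ, a value such as 0.8 means different things in each, so always state which metric you report.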
  • Mistake: Failing to account for confounding variables.
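  • Correction: A variable related to both the protected attribute and the outcome (e.g., income in a credit model) can create or hide an apparent disparity, so compare groups within levels of such variables before drawing conclusions, and watch for proxies such as ZIP code that let a model reproduce bias even when the protected attribute is excluded.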
  • Mistake: Using a single fairness metric.
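  • Correction: Use several fairness metrics, because each captures a different notion of fairness. When base rates differ between groups, criteria such as demographic parity, equalized odds, and predictive rate parity generally cannot all be satisfied at once, so choose and justify the ones that fit the decision.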
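To see why a single metric can mislead, here is a hypothetical toy example (data invented for illustration) in which both groups are selected at the same rate, so demographic parity holds, yet truly positive members of the protected group receive the favorable prediction less often, so equal opportunity is violated. The helper names selection_rate and true_positive_rate are illustrative, not from any library.

```python
from typing import Sequence


def selection_rate(y_hat: Sequence[int]) -> float:
    # P(Y_hat = 1) within one group
    return sum(y_hat) / len(y_hat)


def true_positive_rate(y: Sequence[int], y_hat: Sequence[int]) -> float:
    # P(Y_hat = 1 | Y = 1) within one group
    hits = [p for t, p in zip(y, y_hat) if t == 1]
    return sum(hits) / len(hits)


# Hypothetical toy data, one pair of lists (true labels, predictions) per group.
y_ref, yhat_ref = [1, 1, 0, 0], [1, 1, 0, 0]    # reference group (X=0)
y_prot, yhat_prot = [1, 1, 1, 0], [1, 1, 0, 0]  # protected group (X=1)

dp_gap = selection_rate(yhat_prot) - selection_rate(yhat_ref)
eod = true_positive_rate(y_prot, yhat_prot) - true_positive_rate(y_ref, yhat_ref)

print(f"demographic parity gap: {dp_gap:.3f}")     # 0.000 -> parity holds
print(f"equal opportunity difference: {eod:.3f}")  # -0.333 -> violated
```

The disagreement comes from the different base rates: half of the reference group is truly positive versus three quarters of the protected group, so equal selection rates force a lower true positive rate for the protected group.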

Software / Tool Tips

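  • Python: scikit-learn has no dedicated fairness metrics, but sklearn.metrics functions (e.g., confusion_matrix, recall_score, precision_score) can be computed separately for each group; the Fairlearn library (fairlearn.metrics, e.g., demographic_parity_difference and MetricFrame) and IBM's AIF360 provide ready-made fairness metrics.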
  • R: Use the fairness package to calculate fairness metrics.
  • Tableau: Build a dashboard that plots per-group selection rates and error rates (computed in Python or R, or with calculated fields) to visualize fairness metrics.
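scikit-learn's standard metrics can be applied group by group to obtain the ingredients of the fairness metrics. A minimal sketch, assuming scikit-learn is installed and using hypothetical toy data: sklearn.metrics.confusion_matrix with labels=[0, 1] returns [[TN, FP], [FN, TP]], from which each group's selection rate, true positive rate, and false positive rate follow. The helper name group_rates is hypothetical; dedicated libraries such as Fairlearn (for example its MetricFrame class) package the same per-group idea.

```python
from sklearn.metrics import confusion_matrix


def group_rates(y_true, y_pred, groups):
    """Hypothetical helper: per-group selection rate, TPR, and FPR."""
    out = {}
    for g in sorted(set(groups)):
        yt = [t for t, gi in zip(y_true, groups) if gi == g]
        yp = [p for p, gi in zip(y_pred, groups) if gi == g]
        # labels=[0, 1] keeps the matrix 2x2 even if a class is absent.
        tn, fp, fn, tp = confusion_matrix(yt, yp, labels=[0, 1]).ravel()
        out[g] = {
            "selection_rate": (tp + fp) / len(yt),
            "tpr": tp / (tp + fn) if tp + fn else float("nan"),
            "fpr": fp / (fp + tn) if fp + tn else float("nan"),
        }
    return out


if __name__ == "__main__":
    # Hypothetical toy data for two groups, "A" and "B".
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    for g, rates in group_rates(y_true, y_pred, groups).items():
        print(g, {k: round(float(v), 3) for k, v in rates.items()})
```

Dividing group B's selection rate by group A's (0.25 / 0.75) gives the disparate impact ratio, and subtracting their true positive rates gives the equal opportunity difference.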

Quick Practice Problem

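A company wants to predict whether a customer will buy a product based on their demographic information. For customers in one demographic group, the model predicts a purchase only 0.2 times as often as for a reference group, i.e., its disparate impact (ratio of selection rates) is 0.2. What does this mean?

Answer: The model strongly favors the reference group: customers in the other group receive a positive prediction only 20% as often. Because 0.2 is far below the commonly used 0.8 (four-fifths) threshold, this indicates substantial disparate impact that should be investigated and, if it cannot be justified by legitimate factors, mitigated.

Explanation: Disparate impact compares selection rates (how often each group receives the favorable prediction), not predictive accuracy. A value of 1 means parity, so 0.2 signals a large gap; it does not mean one group is "20% more likely" to be predicted a buyer.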
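The arithmetic behind a disparate impact of 0.2 can be checked with hypothetical counts (invented for illustration): if 50 of 500 protected-group customers and 250 of 500 reference-group customers are predicted to buy, the selection rates are 0.10 and 0.50. This sketch assumes the standard ratio definition of disparate impact and the 0.8 four-fifths threshold; the function name disparate_impact is illustrative.

```python
def disparate_impact(pos_prot: int, n_prot: int, pos_ref: int, n_ref: int) -> float:
    # Ratio of selection rates: P(Y_hat=1 | X=1) / P(Y_hat=1 | X=0)
    return (pos_prot / n_prot) / (pos_ref / n_ref)


# Hypothetical counts: 50 of 500 vs. 250 of 500 predicted buyers.
di = disparate_impact(50, 500, 250, 500)
print(f"DI = {di:.2f}")                          # DI = 0.20
print("below four-fifths threshold:", di < 0.8)  # True
```

Equal selection rates in both groups would give DI = 1.0, the parity value.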

Last-Minute Cram Sheet

  • Disparate impact (DI) is the protected group's selection rate divided by the reference group's; 1 means parity, and values below 0.8 (the four-fifths rule) are commonly flagged.
  • The fairness metric (FM) used in this guide is 1 minus the absolute difference in selection rates; 1 means parity.
  • Equal opportunity difference (EOD) is the gap in true positive rates between groups.
  • Predictive rate parity (PRP) requires equal precision (positive predictive value) across groups.
  • Demographic parity (DP) and statistical parity (SP) are two names for the same criterion: equal selection rates across groups.
  • Equalized odds (EO) requires equal true positive rates and equal false positive rates across groups.
  • Protected attributes are attributes that are protected by law or regulations.
  • Confounding variables can affect the relationship between the protected attribute and the outcome variable.
  • Multiple fairness metrics should be used to get a comprehensive understanding of the model's fairness.
  • A p-value is NOT the probability that H₀ is true – it’s the probability of observing data at least as extreme as the observed data if H₀ is true.