Fatskills
Practice. Master. Repeat.
Study Guide: TECH **SQL for Data Analysis: UNION, UNION ALL, INTERSECT, EXCEPT – Zero-Fluff Study Guide**
Source: https://www.fatskills.com/introdution-to-engineering/chapter/tech-sql-for-data-analysis-union-union-all-intersect-except-zero-fluff-study-guide

TECH **SQL for Data Analysis: UNION, UNION ALL, INTERSECT, EXCEPT – Zero-Fluff Study Guide**

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

⏱️ ~9 min read

SQL for Data Analysis: UNION, UNION ALL, INTERSECT, EXCEPT – Zero-Fluff Study Guide


1. What This Is & Why It Matters

You’re analyzing customer data from two different systems—one for online sales (ecommerce_sales) and another for in-store purchases (retail_sales). Your boss asks: "Give me a single list of all unique customers who made a purchase in the last 30 days, regardless of channel. Then, show me which customers bought online but never in-store."

This is where UNION, UNION ALL, INTERSECT, and EXCEPT come in. These set operators let you combine, compare, or filter results from multiple queries—something JOIN can’t do efficiently (or at all).

Why this matters in production:
- Avoid duplicate work: Instead of exporting data to Python/Pandas, you can merge datasets directly in SQL.
- Performance: Set operations are optimized in most databases (PostgreSQL, SQL Server, BigQuery, etc.).
- Data quality checks: Quickly find mismatches between datasets (e.g., "Which products exist in our catalog but never sold?").
- ETL pipelines: Clean and deduplicate data before loading into a data warehouse.

What breaks if you ignore this?
- You’ll write messy, slow subqueries or resort to manual Excel merges.
- Your reports will double-count records (if you use UNION ALL when you need UNION).
- You’ll miss critical data gaps (e.g., customers who churned in one system but not another).


2. Core Concepts & Components


? UNION

  • Definition: Combines results from two queries, removes duplicates, and returns distinct rows.
  • Production insight: Use when you need a deduplicated list (e.g., all unique customers across regions). Slower than UNION ALL because it sorts and deduplicates.
  • Syntax:
    sql SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2;

? UNION ALL

  • Definition: Combines results from two queries including duplicates.
  • Production insight: Faster than UNION (no deduplication). Use when duplicates don’t matter (e.g., appending logs) or when you’ll deduplicate later.
  • Syntax:
    sql SELECT column1, column2 FROM table1 UNION ALL SELECT column1, column2 FROM table2;

? INTERSECT

  • Definition: Returns only rows that exist in both queries.
  • Production insight: Useful for finding overlaps (e.g., "Which customers bought in both Q1 and Q2?"). Not supported in MySQL (use INNER JOIN or EXISTS instead).
  • Syntax:
    sql SELECT column1 FROM table1 INTERSECT SELECT column1 FROM table2;

? EXCEPT (or MINUS in Oracle)

  • Definition: Returns rows from the first query that don’t exist in the second query.
  • Production insight: Critical for data validation (e.g., "Which products are in our catalog but never sold?"). In PostgreSQL, it’s EXCEPT; in Oracle, it’s MINUS.
  • Syntax:
    sql SELECT column1 FROM table1 EXCEPT SELECT column1 FROM table2;

? Key Rules for All Set Operators

  1. Same number of columns: Queries must return the same number of columns.
  2. Compatible data types: Columns must have matching or convertible types (e.g., INT and BIGINT work; INT and VARCHAR may not).
  3. Column names: The first query’s column names are used in the result.
  4. Order of operations: UNION/INTERSECT/EXCEPT are evaluated left-to-right unless parentheses are used.
  5. Performance: UNION ALL > UNION > INTERSECT > EXCEPT (in terms of speed).

3. Step-by-Step Hands-On


Prerequisites

  • A SQL database (PostgreSQL, SQL Server, BigQuery, etc.). For this guide, we’ll use PostgreSQL (free via ElephantSQL or Docker).
  • Sample data: Two tables, online_sales and instore_sales, with columns customer_id, product_id, purchase_date.

Task: Analyze Cross-Channel Customer Behavior

You need to: 1. List all unique customers who made a purchase in the last 30 days (deduplicated).
2. Find customers who bought online but never in-store.
3. Find customers who bought in both channels.


Step 1: Create Sample Tables

-- Create tables
CREATE TABLE online_sales (
customer_id INT,
product_id INT,
purchase_date DATE ); CREATE TABLE instore_sales (
customer_id INT,
product_id INT,
purchase_date DATE ); -- Insert sample data INSERT INTO online_sales VALUES (1, 101, '2023-10-01'), (2, 102, '2023-10-02'), (3, 103, '2023-10-03'), (1, 104, '2023-10-04'); -- Customer 1 bought twice online INSERT INTO instore_sales VALUES (1, 201, '2023-10-01'), -- Customer 1 also bought in-store (2, 202, '2023-10-05'), (4, 203, '2023-10-06'); -- Customer 4 only bought in-store


Step 2: Get All Unique Customers (UNION)

SELECT customer_id FROM online_sales
WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days'
UNION
SELECT customer_id FROM instore_sales
WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY customer_id;

Output:


customer_id
-----------
1
2
3
4

Why this works: UNION removes duplicates (e.g., customer 1 appears in both tables but only once in the result).


Step 3: Get All Customers Including Duplicates (UNION ALL)

SELECT customer_id FROM online_sales
WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days'
UNION ALL
SELECT customer_id FROM instore_sales
WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY customer_id;

Output:


customer_id
-----------
1
1
2
2
3
4

Why this works: UNION ALL keeps duplicates (customer 1 appears twice because they bought in both channels).


Step 4: Find Customers Who Bought in Both Channels (INTERSECT)

SELECT customer_id FROM online_sales
INTERSECT
SELECT customer_id FROM instore_sales;

Output:


customer_id
-----------
1
2

Why this works: Only customers present in both tables are returned.


Step 5: Find Customers Who Bought Online but Not In-Store (EXCEPT)

SELECT customer_id FROM online_sales
EXCEPT
SELECT customer_id FROM instore_sales;

Output:


customer_id
-----------
3

Why this works: Returns customers from the first query (online_sales) that don’t exist in the second (instore_sales).


Step 6: Verify with a Real-World Check

Run this to confirm your results:


-- Count unique customers per channel
SELECT
COUNT(DISTINCT customer_id) AS online_customers,
COUNT(DISTINCT customer_id) AS instore_customers FROM online_sales FULL OUTER JOIN instore_sales USING (customer_id); -- Compare with UNION/INTERSECT/EXCEPT results

Expected: The counts should align with your earlier queries.


4. ? Production-Ready Best Practices


Performance

  • Use UNION ALL instead of UNION unless you need deduplication. UNION sorts and removes duplicates, which is slow.
  • Filter early: Apply WHERE clauses before set operations to reduce data volume.
    ```sql -- ❌ Slow: Filter after UNION (SELECT * FROM table1 UNION SELECT * FROM table2) WHERE date > '2023-01-01';

-- ✅ Fast: Filter before UNION (SELECT * FROM table1 WHERE date > '2023-01-01') UNION ALL (SELECT * FROM table2 WHERE date > '2023-01-01'); `` - Index columns used in set operations (e.g.,customer_id`).

Data Quality

  • Check for NULLs: Set operations treat NULL as a distinct value. Use COALESCE if needed.
    sql SELECT COALESCE(customer_id, -1) FROM table1 EXCEPT SELECT COALESCE(customer_id, -1) FROM table2;
  • Validate column counts: Ensure both queries return the same number of columns (or use NULL placeholders).
    sql -- If one query has 3 columns and the other has 2, pad with NULL SELECT col1, col2, col3 FROM table1 UNION ALL SELECT col1, col2, NULL FROM table2;

Readability

  • Use CTEs (Common Table Expressions) for complex set operations: sql WITH online_customers AS (
    SELECT customer_id FROM online_sales ), instore_customers AS (
    SELECT customer_id FROM instore_sales ) SELECT * FROM online_customers EXCEPT SELECT * FROM instore_customers;
  • Add comments explaining why you’re using a set operation (e.g., # Deduplicate customers across channels).

Database-Specific Quirks

Database EXCEPT Equivalent Notes
PostgreSQL EXCEPT Also supports EXCEPT ALL (keeps duplicates).
SQL Server EXCEPT
Oracle MINUS
MySQL ❌ Not supported Use LEFT JOIN ... WHERE NULL instead.
BigQuery EXCEPT DISTINCT


5. ⚠️ Common Mistakes & Traps

Mistake Symptom Fix/Prevention
Using UNION when you need UNION ALL Query is slow; duplicates are removed when you need them. Use UNION ALL unless deduplication is required.
Forgetting column order Error: ERROR: each UNION query must have the same number of columns. Ensure both queries return columns in the same order.
Assuming INTERSECT is supported Error: ERROR: syntax error at or near "INTERSECT". MySQL doesn’t support INTERSECT; use INNER JOIN or EXISTS.
Ignoring NULL behavior NULL values are treated as distinct, leading to unexpected results. Use COALESCE to standardize NULL values.
Not filtering early Query runs slowly because it processes all rows before filtering. Apply WHERE clauses before set operations.


6. ? Exam/Certification Focus


Typical Question Patterns

  1. UNION vs. UNION ALL:
  2. "Which operator returns duplicates?"UNION ALL.
  3. "Which operator is faster?"UNION ALL (no deduplication).

  4. INTERSECT vs. EXCEPT:

  5. "Which returns rows present in both queries?"INTERSECT.
  6. "Which returns rows only in the first query?"EXCEPT.

  7. Database-Specific Tricks:

  8. "In Oracle, what is the equivalent of EXCEPT?"MINUS.
  9. "How do you simulate INTERSECT in MySQL?" → Use INNER JOIN or EXISTS.

  10. Performance Scenarios:

  11. "You need to combine two large tables without deduplication. Which operator?"UNION ALL.
  12. "You need to find records in Table A not in Table B. Which operator?"EXCEPT.

Key Trap Distinctions

Concept Trap Correct Approach
UNION vs. UNION ALL Assuming UNION is always better. Use UNION ALL unless you need deduplication.
EXCEPT vs. NOT IN Using NOT IN with NULL values (returns no rows). Use EXCEPT or NOT EXISTS instead.
Column Order Assuming column names matter (they don’t; only order and data type do). Ensure columns are in the same order.


7. ? Hands-On Challenge


Challenge:

You have two tables: - employees (columns: employee_id, name, department) - managers (columns: employee_id, name, team)

Write a query to: 1. List all unique employees (deduplicated) who are either in employees or managers.
2. Find employees who are not managers.

Solution:


-- 1. All unique employees
SELECT employee_id, name FROM employees
UNION
SELECT employee_id, name FROM managers;

-- 2. Employees who are not managers
SELECT employee_id, name FROM employees
EXCEPT
SELECT employee_id, name FROM managers;

Why it works:
- UNION deduplicates employees who appear in both tables.
- EXCEPT filters out employees who are also managers.


8. ? Rapid-Reference Crib Sheet

Operator Purpose Example Performance Notes
UNION Combine, deduplicate SELECT a FROM t1 UNION SELECT a FROM t2 Slow Removes duplicates.
UNION ALL Combine, keep duplicates SELECT a FROM t1 UNION ALL SELECT a FROM t2 Fast No deduplication.
INTERSECT Rows in both queries SELECT a FROM t1 INTERSECT SELECT a FROM t2 Medium Not in MySQL.
EXCEPT Rows in first query only SELECT a FROM t1 EXCEPT SELECT a FROM t2 Medium MINUS in Oracle.
EXCEPT ALL Rows in first query, keep duplicates SELECT a FROM t1 EXCEPT ALL SELECT a FROM t2 Medium PostgreSQL only.

⚠️ Exam Traps:
- UNION vs. UNION ALL: Always ask, "Do I need deduplication?" - INTERSECT in MySQL: Doesn’t exist; use INNER JOIN or EXISTS.
- NULL behavior: NULLNULL in set operations.
- Column order: Must match in both queries.


9. ? Where to Go Next

  1. PostgreSQL Docs: Set Operations
  2. SQLZoo Tutorial: UNION, INTERSECT, EXCEPT
  3. LeetCode Problems:
  4. Combine Two Tables
  5. Employees Earning More Than Their Managers
  6. Book: SQL for Data Analysis by O’Reilly (Chapter 5: Combining Data).

Final Tip:
Set operations are your Swiss Army knife for comparing datasets. Master them, and you’ll write cleaner, faster SQL than 90% of analysts. ?



ADVERTISEMENT