By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
You’re analyzing customer data from two different systems—one for online sales (ecommerce_sales) and another for in-store purchases (retail_sales). Your boss asks: "Give me a single list of all unique customers who made a purchase in the last 30 days, regardless of channel. Then, show me which customers bought online but never in-store."
ecommerce_sales
retail_sales
This is where UNION, UNION ALL, INTERSECT, and EXCEPT come in. These set operators let you combine, compare, or filter results from multiple queries—something JOIN can’t do efficiently (or at all).
JOIN
Why this matters in production:- Avoid duplicate work: Instead of exporting data to Python/Pandas, you can merge datasets directly in SQL.- Performance: Set operations are optimized in most databases (PostgreSQL, SQL Server, BigQuery, etc.).- Data quality checks: Quickly find mismatches between datasets (e.g., "Which products exist in our catalog but never sold?").- ETL pipelines: Clean and deduplicate data before loading into a data warehouse.
What breaks if you ignore this?- You’ll write messy, slow subqueries or resort to manual Excel merges.- Your reports will double-count records (if you use UNION ALL when you need UNION).- You’ll miss critical data gaps (e.g., customers who churned in one system but not another).
UNION ALL
UNION
sql SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2;
sql SELECT column1, column2 FROM table1 UNION ALL SELECT column1, column2 FROM table2;
INNER JOIN
EXISTS
sql SELECT column1 FROM table1 INTERSECT SELECT column1 FROM table2;
EXCEPT
MINUS
sql SELECT column1 FROM table1 EXCEPT SELECT column1 FROM table2;
INT
BIGINT
VARCHAR
INTERSECT
online_sales
instore_sales
customer_id
product_id
purchase_date
You need to: 1. List all unique customers who made a purchase in the last 30 days (deduplicated).2. Find customers who bought online but never in-store.3. Find customers who bought in both channels.
-- Create tables CREATE TABLE online_sales ( customer_id INT, product_id INT, purchase_date DATE ); CREATE TABLE instore_sales ( customer_id INT, product_id INT, purchase_date DATE ); -- Insert sample data INSERT INTO online_sales VALUES (1, 101, '2023-10-01'), (2, 102, '2023-10-02'), (3, 103, '2023-10-03'), (1, 104, '2023-10-04'); -- Customer 1 bought twice online INSERT INTO instore_sales VALUES (1, 201, '2023-10-01'), -- Customer 1 also bought in-store (2, 202, '2023-10-05'), (4, 203, '2023-10-06'); -- Customer 4 only bought in-store
SELECT customer_id FROM online_sales WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days' UNION SELECT customer_id FROM instore_sales WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days' ORDER BY customer_id;
Output:
customer_id ----------- 1 2 3 4
Why this works: UNION removes duplicates (e.g., customer 1 appears in both tables but only once in the result).
SELECT customer_id FROM online_sales WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days' UNION ALL SELECT customer_id FROM instore_sales WHERE purchase_date >= CURRENT_DATE - INTERVAL '30 days' ORDER BY customer_id;
customer_id ----------- 1 1 2 2 3 4
Why this works: UNION ALL keeps duplicates (customer 1 appears twice because they bought in both channels).
SELECT customer_id FROM online_sales INTERSECT SELECT customer_id FROM instore_sales;
customer_id ----------- 1 2
Why this works: Only customers present in both tables are returned.
SELECT customer_id FROM online_sales EXCEPT SELECT customer_id FROM instore_sales;
customer_id ----------- 3
Why this works: Returns customers from the first query (online_sales) that don’t exist in the second (instore_sales).
Run this to confirm your results:
-- Count unique customers per channel SELECT COUNT(DISTINCT customer_id) AS online_customers, COUNT(DISTINCT customer_id) AS instore_customers FROM online_sales FULL OUTER JOIN instore_sales USING (customer_id); -- Compare with UNION/INTERSECT/EXCEPT results
Expected: The counts should align with your earlier queries.
WHERE
-- ✅ Fast: Filter before UNION (SELECT * FROM table1 WHERE date > '2023-01-01') UNION ALL (SELECT * FROM table2 WHERE date > '2023-01-01'); `` - Index columns used in set operations (e.g.,customer_id`).
`` - Index columns used in set operations (e.g.,
NULL
COALESCE
sql SELECT COALESCE(customer_id, -1) FROM table1 EXCEPT SELECT COALESCE(customer_id, -1) FROM table2;
sql -- If one query has 3 columns and the other has 2, pad with NULL SELECT col1, col2, col3 FROM table1 UNION ALL SELECT col1, col2, NULL FROM table2;
sql WITH online_customers AS ( SELECT customer_id FROM online_sales ), instore_customers AS ( SELECT customer_id FROM instore_sales ) SELECT * FROM online_customers EXCEPT SELECT * FROM instore_customers;
# Deduplicate customers across channels
EXCEPT ALL
LEFT JOIN ... WHERE NULL
EXCEPT DISTINCT
ERROR: each UNION query must have the same number of columns
ERROR: syntax error at or near "INTERSECT"
"Which operator is faster?" → UNION ALL (no deduplication).
INTERSECT vs. EXCEPT:
"Which returns rows only in the first query?" → EXCEPT.
Database-Specific Tricks:
"How do you simulate INTERSECT in MySQL?" → Use INNER JOIN or EXISTS.
Performance Scenarios:
NOT IN
NOT EXISTS
You have two tables: - employees (columns: employee_id, name, department) - managers (columns: employee_id, name, team)
employees
employee_id
name
department
managers
team
Write a query to: 1. List all unique employees (deduplicated) who are either in employees or managers.2. Find employees who are not managers.
Solution:
-- 1. All unique employees SELECT employee_id, name FROM employees UNION SELECT employee_id, name FROM managers; -- 2. Employees who are not managers SELECT employee_id, name FROM employees EXCEPT SELECT employee_id, name FROM managers;
Why it works:- UNION deduplicates employees who appear in both tables.- EXCEPT filters out employees who are also managers.
SELECT a FROM t1 UNION SELECT a FROM t2
SELECT a FROM t1 UNION ALL SELECT a FROM t2
SELECT a FROM t1 INTERSECT SELECT a FROM t2
SELECT a FROM t1 EXCEPT SELECT a FROM t2
SELECT a FROM t1 EXCEPT ALL SELECT a FROM t2
⚠️ Exam Traps:- UNION vs. UNION ALL: Always ask, "Do I need deduplication?" - INTERSECT in MySQL: Doesn’t exist; use INNER JOIN or EXISTS.- NULL behavior: NULL ≠ NULL in set operations.- Column order: Must match in both queries.
Final Tip:Set operations are your Swiss Army knife for comparing datasets. Master them, and you’ll write cleaner, faster SQL than 90% of analysts. ?
Join 4M+ learners. Unlock unlimited quizzes, wrong-answer tracking, flashcards + reminders, study guides, and 1-on-1 challenges.