Data Science
Random


Click random to get a fresh chapter.

TECH **SQL for Data Analysis: Year-over-Year (YoY) & Month-over-Month (MoM) Comparisons**




SQL for Data Analysis: Year-over-Year (YoY) & Month-over-Month (MoM) Comparisons

A Hyper-Practical, Zero-Fluff Study Guide


1. What This Is & Why It Matters

What it is:
Year-over-Year (YoY) and Month-over-Month (MoM) comparisons are SQL techniques to measure growth (or decline) by comparing metrics (revenue, users, churn, etc.) against the same period in the past. Think of it like a time machine for your data—you’re asking, "How did we do this month compared to last month?" or "How does Q1 2024 compare to Q1 2023?"

Why it matters in production:
- Executives live by these numbers. If you can’t answer "Why did revenue drop 15% YoY?" in a board meeting, you’re toast.
- Marketing teams use MoM to track campaign performance. Did that Super Bowl ad actually work? - Finance teams rely on YoY for forecasting. If you miscalculate YoY growth, budgets get slashed (or wasted).
- E-commerce sites use MoM to detect anomalies. A sudden 30% MoM drop in sales? Probably a broken checkout button.

Real-world scenario:
You’re a data analyst at an e-commerce company. The CEO asks: "Our revenue was $1.2M in March 2024. How does that compare to March 2023? And how did we grow from February 2024?" If you can’t answer this instantly with SQL, you’re not just slow—you’re invisible.


2. Core Concepts & Components


1. YoY (Year-over-Year) Comparison

  • Definition: Compares a metric (e.g., revenue) in the current period to the same period one year ago.
  • Production insight: YoY smooths out seasonality (e.g., holiday spikes). If you only look at MoM, you might panic in January when sales drop post-Christmas.

2. MoM (Month-over-Month) Comparison

  • Definition: Compares a metric in the current month to the previous month.
  • Production insight: MoM is great for short-term trends (e.g., "Did our new feature increase signups?"), but it’s noisy—watch for outliers.

3. LAG() Window Function

  • Definition: A SQL function that lets you peek at the previous row in a result set (e.g., "Show me this month’s revenue and last month’s revenue side by side").
  • Production insight: LAG() is 10x faster than self-joins for time comparisons. If your query takes 30 seconds, rewrite it with LAG().

4. Self-Join

  • Definition: Joining a table to itself to compare rows (e.g., "Join March 2024 sales to March 2023 sales").
  • Production insight: Self-joins are easy to mess up (duplicate rows, wrong join conditions). Use LAG() when possible.

5. Date Truncation (DATE_TRUNC)

  • Definition: Rounds a timestamp to a specified precision (e.g., DATE_TRUNC('month', order_date) turns 2024-03-15 into 2024-03-01).
  • Production insight: Without truncation, your YoY/MoM comparisons will fail silently (e.g., comparing 2024-03-15 to 2023-03-10 instead of 2023-03-01).

6. Growth Rate Calculation

  • Definition: (current_value - previous_value) / previous_value * 100 (e.g., "Revenue grew 20% YoY").
  • Production insight: Always round to 1 decimal place in reports. Executives don’t care about 19.8765%—they want 20%.

7. Common Table Expression (CTE)

  • Definition: A temporary result set (defined with WITH) that makes complex queries readable.
  • Production insight: CTEs are not just for readability—they can improve performance by breaking queries into logical chunks.

8. NULL Handling (COALESCE, IFNULL)

  • Definition: Functions to replace NULL values (e.g., COALESCE(revenue, 0)).
  • Production insight: If you don’t handle NULLs, your growth rates will explode (e.g., (100 - NULL) / NULL = NULL).


3. Step-by-Step Hands-On Section


Prerequisites

  • A SQL database (PostgreSQL, BigQuery, Snowflake, etc.).
  • A table with at least 2 years of data (e.g., orders with order_date and revenue).
  • Basic SQL knowledge (SELECT, GROUP BY, JOIN).

Task: Calculate YoY & MoM Revenue Growth

We’ll use a sample orders table with this schema:


CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
revenue DECIMAL(10,2) );

Step 1: Generate Sample Data (Run This First)

-- Insert 3 years of sample data (2022-2024)
INSERT INTO orders (order_id, customer_id, order_date, revenue)
VALUES
(1, 101, '2022-01-15', 100.00),
(2, 102, '2022-02-20', 150.00),
(3, 103, '2022-03-10', 200.00),
(4, 104, '2023-01-05', 120.00),
(5, 105, '2023-02-18', 180.00),
(6, 106, '2023-03-25', 250.00),
(7, 107, '2024-01-12', 130.00),
(8, 108, '2024-02-28', 200.00),
(9, 109, '2024-03-15', 300.00);

Step 2: Calculate Monthly Revenue (Base Query)

SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS monthly_revenue FROM orders GROUP BY DATE_TRUNC('month', order_date) ORDER BY month;

Expected Output:


month       | monthly_revenue
------------+-----------------
2022-01-01  | 100.00
2022-02-01  | 150.00
2022-03-01  | 200.00
2023-01-01  | 120.00
2023-02-01  | 180.00
2023-03-01  | 250.00
2024-01-01  | 130.00
2024-02-01  | 200.00
2024-03-01  | 300.00

Step 3: Add YoY & MoM Comparisons (Using LAG)

WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date) ) SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
LAG(revenue, 12) OVER (ORDER BY month) AS prev_year_revenue,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month)) /
NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0) * 100, 1) AS mom_growth_pct,
ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY month)) /
NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0) * 100, 1) AS yoy_growth_pct FROM monthly_revenue ORDER BY month;

Expected Output:


month       | revenue | prev_month_revenue | prev_year_revenue | mom_growth_pct | yoy_growth_pct
------------+---------+--------------------+-------------------+----------------+---------------
2022-01-01  | 100.00  | NULL               | NULL              | NULL           | NULL
2022-02-01  | 150.00  | 100.00             | NULL              | 50.0           | NULL
2022-03-01  | 200.00  | 150.00             | NULL              | 33.3           | NULL
2023-01-01  | 120.00  | 200.00             | 100.00            | -40.0          | 20.0
2023-02-01  | 180.00  | 120.00             | 150.00            | 50.0           | 20.0
2023-03-01  | 250.00  | 180.00             | 200.00            | 38.9           | 25.0
2024-01-01  | 130.00  | 250.00             | 120.00            | -48.0          | 8.3
2024-02-01  | 200.00  | 130.00             | 180.00            | 53.8           | 11.1
2024-03-01  | 300.00  | 200.00             | 250.00            | 50.0           | 20.0

Step 4: Filter for Only Complete Comparisons (Remove NULLs)

WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date) ), growth_calcs AS (
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
LAG(revenue, 12) OVER (ORDER BY month) AS prev_year_revenue,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month)) /
NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0) * 100, 1) AS mom_growth_pct,
ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY month)) /
NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0) * 100, 1) AS yoy_growth_pct
FROM monthly_revenue ) SELECT
month,
revenue,
mom_growth_pct,
yoy_growth_pct FROM growth_calcs WHERE prev_month_revenue IS NOT NULL AND prev_year_revenue IS NOT NULL ORDER BY month;

Expected Output (No NULLs):


month       | revenue | mom_growth_pct | yoy_growth_pct
------------+---------+----------------+---------------
2023-01-01  | 120.00  | -40.0          | 20.0
2023-02-01  | 180.00  | 50.0           | 20.0
2023-03-01  | 250.00  | 38.9           | 25.0
2024-01-01  | 130.00  | -48.0          | 8.3
2024-02-01  | 200.00  | 53.8           | 11.1
2024-03-01  | 300.00  | 50.0           | 20.0


4. ? Production-Ready Best Practices


Performance

  • Use LAG() instead of self-joins → Faster, cleaner, and less error-prone.
  • Index your date columns → If order_date isn’t indexed, your query will crawl.
  • Pre-aggregate data → If you’re running this daily, materialize the results in a table.

Readability

  • Use CTEs → Break complex logic into named chunks (e.g., monthly_revenue, growth_calcs).
  • Round percentages → Executives don’t care about 19.8765%—they want 20%.
  • Handle NULLs → Use COALESCE or NULLIF to avoid division-by-zero errors.

Maintainability

  • Document assumptions"This query assumes revenue is in USD and order_date is in UTC."
  • Add tests → Write a unit test to verify that mom_growth_pct is calculated correctly.
  • Parameterize dates → If this is in a dashboard, make the date range dynamic.

Observability

  • Log query performance → If it takes >5 seconds, optimize it.
  • Alert on anomalies → If MoM growth drops >30%, trigger an alert.


5. ⚠️ Common Mistakes & Traps

Mistake Symptom Fix/Prevention
Not truncating dates YoY compares 2024-03-15 to 2023-03-10 (wrong day) Always use DATE_TRUNC('month', order_date)
Ignoring NULLs Growth rate shows NULL or Infinity Use NULLIF(previous_value, 0) to avoid division by zero
Using self-joins instead of LAG() Query runs slowly, returns duplicate rows Rewrite with LAG() for better performance
Not filtering incomplete periods Report shows NULL for recent months Filter out rows where prev_year_revenue IS NULL
Misaligning fiscal years YoY compares Q1 2024 to Q1 2023, but your fiscal year starts in April Adjust the LAG(12) offset to match your fiscal calendar


6. ? Exam/Certification Focus


Typical Question Patterns

  1. "Which SQL function is best for comparing this month’s revenue to last month’s?"
  2. JOIN (too slow)
  3. LAG() (correct answer)

  4. "How do you calculate YoY growth for March 2024 vs. March 2023?"

  5. LAG(revenue, 1) (compares to February 2024)
  6. LAG(revenue, 12) (compares to March 2023)

  7. "Why does this query return NULL for growth rates?"
    sql
    SELECT (revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100
    FROM monthly_revenue;

  8. ❌ Missing NULLIF → Division by zero when LAG(revenue) = 0
  9. ✅ Add NULLIF(LAG(revenue), 0)

Key Trap Distinctions

  • LAG(1) vs. LAG(12)LAG(1) = MoM, LAG(12) = YoY.
  • DATE_TRUNC('month') vs. EXTRACT(MONTH FROM date)DATE_TRUNC keeps the full date (e.g., 2024-03-01), EXTRACT just gives 3 (month number).
  • COALESCE vs. NULLIFCOALESCE(revenue, 0) replaces NULL with 0, NULLIF(revenue, 0) replaces 0 with NULL.


7. ? Hands-On Challenge (With Solution)

Challenge:
Write a query that calculates YoY growth for Q1 (Jan-Mar) 2024 vs. Q1 2023, using the orders table from earlier.

Solution:


WITH quarterly_revenue AS (
SELECT
DATE_TRUNC('quarter', order_date) AS quarter,
SUM(revenue) AS revenue
FROM orders
GROUP BY DATE_TRUNC('quarter', order_date) ), growth_calcs AS (
SELECT
quarter,
revenue,
LAG(revenue, 4) OVER (ORDER BY quarter) AS prev_year_revenue,
ROUND((revenue - LAG(revenue, 4) OVER (ORDER BY quarter)) /
NULLIF(LAG(revenue, 4) OVER (ORDER BY quarter), 0) * 100, 1) AS yoy_growth_pct
FROM quarterly_revenue ) SELECT
quarter,
revenue,
yoy_growth_pct FROM growth_calcs WHERE prev_year_revenue IS NOT NULL AND quarter = '2024-01-01'; -- Q1 2024

Why it works:
- DATE_TRUNC('quarter', order_date) groups by quarter.
- LAG(revenue, 4) compares to the same quarter last year (4 quarters = 1 year).
- NULLIF prevents division by zero.


8. ? Rapid-Reference Crib Sheet

Task SQL Snippet
Truncate date to month DATE_TRUNC('month', order_date)
Truncate date to quarter DATE_TRUNC('quarter', order_date)
Get previous month’s value LAG(revenue, 1) OVER (ORDER BY month)
Get previous year’s value LAG(revenue, 12) OVER (ORDER BY month)
Calculate MoM growth (revenue - LAG(revenue, 1)) / NULLIF(LAG(revenue, 1), 0) * 100
Calculate YoY growth `(revenue - LAG(revenue, 12)) / NULLIF(LAG(revenue, 12), 0) *