Fatskills
Practice. Master. Repeat.
Study Guide: SQL for Data Analysis - Date/Time Manipulation, Zero-Fluff Study Guide
Source: https://www.fatskills.com/data-science/chapter/tech-sql-for-data-analysis-datetime-manipulation-zero-fluff-study-guide

SQL for Data Analysis - Date/Time Manipulation, Zero-Fluff Study Guide

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

⏱️ ~8 min read

SQL for Data Analysis: Date/Time Manipulation – Zero-Fluff Study Guide

(DATE_PART, DATE_TRUNC, TO_CHAR, Intervals)


1. What This Is & Why It Matters

You’re analyzing sales data, and your boss asks: “Show me revenue by hour for the last 30 days, but group it by day of the week.”

Or: “Calculate the average time between a customer’s first and second purchase.”

Or: “Flag all transactions that happened outside business hours (9 AM–5 PM).”

If you don’t know how to extract, truncate, format, and compare dates, you’ll waste hours writing messy workarounds—or worse, deliver wrong results.

Why this matters in production: - Time-based aggregations (daily/weekly/monthly reports) break if you can’t group dates correctly. - Performance tanks if you use string functions (SUBSTRING, LIKE) instead of native date functions. - Bugs creep in when comparing timestamps across time zones or daylight saving changes. - ETL pipelines fail if you can’t handle intervals (e.g., “last 7 days” vs. “last calendar week”).

Superpower you gain: You can slice, dice, and analyze time data at scale—without manual Excel work or brittle Python scripts.


2. Core Concepts & Components

1. DATE_PART() (or EXTRACT() in some SQL dialects)

  • Definition: Extracts a specific part (year, month, day, hour, etc.) from a timestamp.
  • Production insight: If you hardcode SUBSTRING('2023-10-15', 6, 2) to get the month, your query breaks when the format changes. DATE_PART is format-agnostic.

2. DATE_TRUNC()

  • Definition: Truncates a timestamp to a specified precision (e.g., day, hour, month).
  • Production insight: Critical for time-based aggregations (e.g., “daily active users”). Without it, you’ll double-count or miss data.

3. TO_CHAR()

  • Definition: Converts a timestamp to a formatted string (e.g., 'YYYY-MM-DD').
  • Production insight: Useful for reporting (e.g., “Show dates as ‘Mon 15’”) but avoid in WHERE clauses—it kills index usage.

4. Intervals (INTERVAL, +, -)

  • Definition: Represents a duration (e.g., INTERVAL '1 day') or performs date arithmetic.
  • Production insight: Essential for time comparisons (e.g., “orders in the last 30 days”). Hardcoding BETWEEN '2023-10-01' AND '2023-10-31' is fragile—use intervals instead.

5. Time Zones (AT TIME ZONE)

  • Definition: Converts timestamps between time zones.
  • Production insight: If your database stores UTC but your report needs local time, always convert at query time—never store local time.

6. AGE() (PostgreSQL) / DATEDIFF() (SQL Server, MySQL)

  • Definition: Calculates the difference between two timestamps.
  • Production insight: AGE(end_time, start_time) gives a human-readable interval (e.g., “3 days 2 hours”), while DATEDIFF(day, ...) gives a raw count.

7. GENERATE_SERIES() (PostgreSQL)

  • Definition: Generates a series of timestamps (e.g., every hour in a day).
  • Production insight: Useful for filling gaps in time-series data (e.g., “Show 0 for hours with no sales”).

8. DATEADD() (SQL Server) / INTERVAL (PostgreSQL)

  • Definition: Adds/subtracts time from a timestamp.
  • Production insight: DATEADD(day, 7, GETDATE()) is safer than GETDATE() + 7 (which may not work in all SQL dialects).

3. Step-by-Step Hands-On Section

Prerequisites

  • A SQL environment (PostgreSQL, Redshift, BigQuery, or Snowflake).
  • A table with timestamps (e.g., orders with order_time).

Task: Analyze Hourly Sales by Day of Week

Goal: Show total sales per hour, grouped by day of week (e.g., “Monday 10 AM: $500”).

Step 1: Create a Sample Table (if you don’t have one)

CREATE TABLE orders (
    order_id INT,
    order_time TIMESTAMP,
    amount DECIMAL(10, 2)
);

INSERT INTO orders VALUES
    (1, '2023-10-15 10:30:00', 100.00),
    (2, '2023-10-15 14:15:00', 200.00),
    (3, '2023-10-16 10:00:00', 150.00),
    (4, '2023-10-16 23:45:00', 300.00),
    (5, '2023-10-22 10:30:00', 50.00); -- Next Sunday

Step 2: Extract Hour and Day of Week

SELECT
    DATE_PART('hour', order_time) AS hour_of_day,
    TO_CHAR(order_time, 'Day') AS day_of_week, -- 'Monday', 'Tuesday', etc.
    SUM(amount) AS total_sales
FROM orders
GROUP BY
    DATE_PART('hour', order_time),
    TO_CHAR(order_time, 'Day')
ORDER BY
    day_of_week,
    hour_of_day;

Output:

hour_of_day | day_of_week | total_sales
------------+-------------+------------
10          | Sunday      | 50.00
10          | Monday      | 250.00
14          | Monday      | 200.00
23          | Monday      | 300.00

Step 3: Truncate to Hour for Clean Grouping

SELECT
    DATE_TRUNC('hour', order_time) AS hour_truncated,
    SUM(amount) AS total_sales
FROM orders
GROUP BY hour_truncated
ORDER BY hour_truncated;

Output:

hour_truncated       | total_sales
---------------------+------------
2023-10-15 10:00:00  | 100.00
2023-10-15 14:00:00  | 200.00
2023-10-16 10:00:00  | 150.00
2023-10-16 23:00:00  | 300.00
2023-10-22 10:00:00  | 50.00

Step 4: Filter for Business Hours (9 AM–5 PM)

SELECT *
FROM orders
WHERE
    DATE_PART('hour', order_time) BETWEEN 9 AND 17;

Output:

order_id | order_time          | amount
---------+---------------------+--------
1        | 2023-10-15 10:30:00 | 100.00
2        | 2023-10-15 14:15:00 | 200.00
3        | 2023-10-16 10:00:00 | 150.00
5        | 2023-10-22 10:30:00 | 50.00

Step 5: Calculate Time Between Orders

WITH first_orders AS (
    SELECT
        customer_id,
        MIN(order_time) AS first_order_time
    FROM orders
    GROUP BY customer_id
),
second_orders AS (
    SELECT
        o.customer_id,
        o.order_time AS second_order_time
    FROM orders o
    JOIN first_orders f ON o.customer_id = f.customer_id
    WHERE o.order_time > f.first_order_time
    ORDER BY o.order_time
    LIMIT 1
)
SELECT
    f.customer_id,
    f.first_order_time,
    s.second_order_time,
    AGE(s.second_order_time, f.first_order_time) AS time_between_orders
FROM first_orders f
JOIN second_orders s ON f.customer_id = s.customer_id;

Output:

customer_id | first_order_time    | second_order_time   | time_between_orders
------------+---------------------+---------------------+---------------------
1           | 2023-10-15 10:30:00 | 2023-10-16 10:00:00 | 23:30:00

4.-Production-Ready Best Practices

Performance

  • ? Use DATE_TRUNC for grouping – It’s faster than DATE_PART + string manipulation.
  • ? Avoid TO_CHAR in WHERE clauses – It prevents index usage. Filter on raw timestamps instead.
  • ? Use INTERVAL for date math – Hardcoding + 7 or BETWEEN '2023-01-01' AND '2023-01-31' is fragile.

Time Zones

  • ? Store timestamps in UTC – Convert to local time at query time.
  • ? Use AT TIME ZONE for conversions – Example: sql SELECT order_time AT TIME ZONE 'America/New_York' AS local_time FROM orders;

Reporting

  • ? Format dates consistently – Use TO_CHAR(order_time, 'Mon DD, YYYY') for readability.
  • ? Fill gaps in time-series data – Use GENERATE_SERIES to show 0 for missing hours/days.

ETL Pipelines

  • ? Use DATE_TRUNC for partitioning – Example: Partition a table by DATE_TRUNC('day', order_time).
  • ? Handle daylight saving time – Test queries around DST transitions (e.g., 2023-03-12 02:00:00).

5. Common Mistakes & Traps

Mistake Symptom Fix/Prevention
Using SUBSTRING to extract dates Query breaks if format changes (e.g., 'MM/DD/YYYY'-'YYYY-MM-DD'). Use DATE_PART or EXTRACT.
Filtering with TO_CHAR Slow queries (no index usage). Filter on raw timestamps, then format in SELECT.
Hardcoding date ranges Query fails after the hardcoded date. Use CURRENT_DATE - INTERVAL '30 days' or BETWEEN NOW() - INTERVAL '1 month' AND NOW().
Ignoring time zones Reports show wrong times for global users. Store in UTC, convert at query time.
Using DATEDIFF for intervals Returns raw counts (e.g., DATEDIFF(day, ...) = 1 for 23 hours). Use AGE() (PostgreSQL) or TIMESTAMPDIFF (MySQL) for human-readable intervals.

6.-Exam/Certification Focus

Typical Question Patterns

  1. Extracting parts of a date:
  2. “How do you get the month from a timestamp?”

    • ? SUBSTRING(order_time, 6, 2) (fragile)
    • ? DATE_PART('month', order_time) or EXTRACT(MONTH FROM order_time)
  3. Grouping by time periods:

  4. “Show sales by week.”

    • ? GROUP BY DATE_PART('week', order_time) (weeks may not align with calendar weeks)
    • ? GROUP BY DATE_TRUNC('week', order_time)
  5. Date arithmetic:

  6. “Find orders from the last 7 days.”

    • ? WHERE order_time > '2023-10-01' (hardcoded)
    • ? WHERE order_time > CURRENT_DATE - INTERVAL '7 days'
  7. Time zone conversions:

  8. “Convert a UTC timestamp to Pacific Time.”
    • ? order_time AT TIME ZONE 'UTC' AT TIME ZONE 'America/Los_Angeles'

Key Trap Distinctions

  • DATE_PART vs. DATE_TRUNC:
  • DATE_PART('month', '2023-10-15')-10 (extracts the month number).
  • DATE_TRUNC('month', '2023-10-15')-2023-10-01 00:00:00 (truncates to the start of the month).
  • AGE() vs. DATEDIFF:
  • AGE('2023-10-16', '2023-10-15')-1 day (human-readable).
  • DATEDIFF(day, '2023-10-15', '2023-10-16')-1 (raw count).

7.-Hands-On Challenge

Challenge: Write a query to show average sales per hour of the day, but only for weekdays (Monday–Friday). Include hours with zero sales.

Solution:

WITH hours AS (
    SELECT generate_series(0, 23) AS hour
),
weekday_sales AS (
    SELECT
        DATE_PART('hour', order_time) AS hour,
        SUM(amount) AS total_sales,
        COUNT(*) AS order_count
    FROM orders
    WHERE DATE_PART('dow', order_time) BETWEEN 1 AND 5 -- Monday (1) to Friday (5)
    GROUP BY DATE_PART('hour', order_time)
)
SELECT
    h.hour,
    COALESCE(AVG(s.total_sales), 0) AS avg_sales,
    COALESCE(AVG(s.order_count), 0) AS avg_orders
FROM hours h
LEFT JOIN weekday_sales s ON h.hour = s.hour
GROUP BY h.hour
ORDER BY h.hour;

Why it works: - generate_series(0, 23) creates all hours in a day. - DATE_PART('dow', ...) filters for weekdays (1=Monday, 5=Friday). - LEFT JOIN ensures hours with zero sales are included.


8.-Rapid-Reference Crib Sheet

Function Example Use Case
DATE_PART DATE_PART('hour', order_time) Extract hour, day, month, etc.
DATE_TRUNC DATE_TRUNC('day', order_time) Group by day/week/month.
TO_CHAR TO_CHAR(order_time, 'Mon DD, YYYY') Format dates for reports.
INTERVAL order_time + INTERVAL '1 day' Date arithmetic.
AGE AGE(end_time, start_time) Human-readable time differences.
AT TIME ZONE order_time AT TIME ZONE 'America/New_York' Convert time zones.
GENERATE_SERIES generate_series('2023-01-01', '2023-01-31', '1 day') Fill gaps in time-series data.
EXTRACT (PostgreSQL) EXTRACT(HOUR FROM order_time) Alternative to DATE_PART.
DATEDIFF (SQL Server) DATEDIFF(day, start_time, end_time) Raw count of time differences.
TO_CHAR in WHERE WHERE TO_CHAR(order_time, 'YYYY-MM-DD') = '2023-10-15' Avoid – kills index usage.
Hardcoded dates WHERE order_time BETWEEN '2023-10-01' AND '2023-10-31' Fragile – use INTERVAL instead.

9.-Where to Go Next

  1. PostgreSQL Date/Time Functions – Official docs (most comprehensive).
  2. BigQuery Date Functions – If you use BigQuery.
  3. SQLZoo Date Exercises – Interactive practice.
  4. SQL for Data Analysis (O’Reilly) – Chapter 5 covers time-series analysis in depth.