SQL Window Functions for Time-Based Event Tracking
Tracking user behavior or system events over time is a core requirement in analytics, monitoring, and debugging. Fortunately, SQL window functions are powerful tools that allow you to perform time-based calculations without writing overly complex subqueries. In this article, we’ll explore how to use SQL window functions to compute rolling averages, detect event gaps, and calculate session durations from event logs—all in a developer-friendly manner.
1. Understanding the Dataset and the Problem Space
To illustrate the power of window functions, we’ll work with a sample event log table that simulates user interactions on a website. This dataset includes multiple events per user with timestamps that allow us to analyze their behavior over time:
CREATE TABLE user_events (
user_id INT,
event_time TIMESTAMP,
event_type TEXT
);
INSERT INTO user_events (user_id, event_time, event_type) VALUES
(1, '2023-01-01 10:00:00', 'page_view'),
(1, '2023-01-01 10:05:00', 'click'),
(1, '2023-01-01 10:40:00', 'page_view'),
(2, '2023-01-01 11:00:00', 'page_view'),
(2, '2023-01-01 11:15:00', 'click');
Each event represents a touchpoint. Our tasks include calculating event frequency over time, identifying gaps in activity, and grouping sessions using time-based thresholds.
2. Computing Rolling Averages of Event Frequency
Let’s say we want to calculate the average time between a user’s last 3 events. This helps measure how often a user is interacting with the system.
SELECT
user_id,
event_time,
event_type,
EXTRACT(EPOCH FROM event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)) AS time_since_last_event,
AVG(EXTRACT(EPOCH FROM event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)))
OVER (PARTITION BY user_id ORDER BY event_time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_time_between_events
FROM user_events;
Why it works: We use LAG to calculate the time since the previous event for a user, and AVG with a sliding window of 3 events to calculate a rolling average. This allows dynamic insights into shifting interaction patterns.
3. Detecting Event Gaps with LEAD and Conditional Logic
To find if there’s a suspicious inactivity gap—for example, anything longer than 30 minutes—we can apply LEAD and a CASE statement.
SELECT
user_id,
event_time,
LEAD(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS next_event_time,
CASE
WHEN LEAD(event_time) OVER (PARTITION BY user_id ORDER BY event_time) - event_time > INTERVAL '30 minutes'
THEN 'gap_detected'
ELSE NULL
END AS event_gap_flag
FROM user_events;
Why it works: LEAD gives us the timestamp of the next event. Comparing it to the current time highlights where inactivity periods exceed our threshold.
4. Building Time-Based User Sessions
Event logs often need to be grouped into sessions—contiguous periods of activity that are separated by inactivity (e.g., more than 30 minutes). Let’s generate session IDs using a combination of LAG and SUM.
WITH event_deltas AS (
SELECT
user_id,
event_time,
CASE
WHEN event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) > INTERVAL '30 minutes'
OR LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
THEN 1 ELSE 0
END AS session_boundary
FROM user_events
),
sessionized AS (
SELECT
*,
SUM(session_boundary) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM event_deltas
)
SELECT user_id, event_time, session_id
FROM sessionized;
Why it works: Every time a 30-minute gap is detected, we flag a session boundary. By summing these boundaries, we generate incrementing session IDs per user.
5. Advanced: Duration of Sessions Using FIRST_VALUE and LAST_VALUE
Now that we have session IDs, let’s calculate the duration of each one by identifying start and end times per session.
WITH event_deltas AS (...),
sessionized AS (...),
session_bounds AS (
SELECT
user_id,
session_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
EXTRACT(EPOCH FROM MAX(event_time) - MIN(event_time)) AS duration_seconds
FROM sessionized
GROUP BY user_id, session_id
)
SELECT * FROM session_bounds;
Why it works: By grouping on session_id, we easily find the earliest and latest points of a session, helping calculate the time spent in each session accurately.
Performance and Optimization Tips
Window functions are typically efficient, but performance can degrade on massive datasets. Indexing user_id and event_time fields can have a significant impact. Limit your window frame when possible (e.g., use ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), and avoid unnecessary complex subqueries.
Conclusion
SQL window functions open a world of possibilities when analyzing time-based event data. From rolling averages to sessionization, these patterns scale well and are supported in most modern databases like PostgreSQL, SQL Server, and Snowflake. Mastering these techniques lets you turn raw logs into meaningful insights with less code and greater performance.
Useful links:


