susinlee 님의 블로그
퍼널 분석 - SQL 쿼리로 구현하기 본문


유저별로 행동 로그를 담은 테이블을 가지고 각 funnel_step에서의 유저 수를 SQL 쿼리로 구현해볼 것이다.
그 전에 각 퍼널 유형별로 집계 방식의 차이를 짚고 넘어가자.
개방형 퍼널 vs 폐쇄형 퍼널
Step A | Step B | Step C | |
User 1 | X | X | X |
User 2 | X | X | |
User 3 | X | X | |
User 4 | X |
- User 1 completed all 3 steps
- User 2 entered the funnel at step B, and also completed step C
- User 3 entered the funnel at step A, skipped step B, completed step C
- User 4 enters the funnel at step C
개방형 퍼널 (open funnel)
- 사용자가 어떤 단계에서든지 퍼널 진입 가능
Step A | Step B | Step C | |
User 1 | X (O) | X (O) | X (O) |
User 2 | X (O) | X (O) | |
User 3 | X (O) | X | |
User 4 | X (O) | ||
TOTAL | 2 | 2 | 3 |
폐쇄형 퍼널(closed funnel)
- 시작이 정해져 있음
Step A | Step B | Step C | |
User 1 | X (O) | X (O) | X (O) |
User 2 | X | X | |
User 3 | X (O) | X | |
User 4 | X | ||
TOTAL | 2 | 1 | 1 |
폐쇄형 퍼널을 구현할 것인데, 다음 조건을 고려 해야한다.
- 시작이 정해져 있고
- 시퀀스가 중요하다. 즉, 링크가 존재해야 한다( 1 → 2, 2 → 3 등등).
- 링크만 존재한다면 사이사이 다른 행동을 하는 것은 관계없다.
SQL로 폐쇄형 퍼널 구현하기
1. 유저별로 윈도우를 설정한다.
SELECT
*,
RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
- 1번 step이 발생할 때마다 새로운 윈도우를 생성해준다.
2. 윈도우의 시작과 끝나는 시점을 구해준다.
SELECT
user_id,
window_num,
event_time AS window_st_ts,
LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
FROM
(
SELECT
*,
RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
) sub

3. 만들어진 테이블(window_)을 기존 테이블(event_logs)과 병합한다.
WITH window_ AS (
SELECT
user_id,
window_num,
event_time AS window_st_ts,
LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
FROM
(
SELECT
*,
RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
) sub
)
SELECT l.user_id, window_num, event_time, funnel_step,
LEAD(funnel_step) OVER(PARTITION BY user_id, window_num ORDER BY event_time) AS next_funnel_step
FROM event_logs l
JOIN window_ w
ON l.user_id = w.user_id
AND event_time >= window_st_ts AND event_time < window_end_ts
- user_id가 같으면서, event_time이 윈도우의 시작(포함)과 끝 사이에 위치하도록 조건을 걸어준다
- 병합한 후 유저, 윈도우 별로 다음 funnel_step을 나타내는 next_funnel_step 컬럼을 생성해준다. (LEAD 함수)

위의 테이블에서 funnel_step이 HEAD 가 되고, next_funnel_step이 TAIL 이 된다.
4. head 와 tail 을 앞서 살펴보았던 조건에 맞게 구하고, union을 수행한 뒤 최종 steps를 구한다.
WITH window_ AS (
SELECT user_id,
window_num,
event_time AS window_st_ts,
LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
FROM
(
SELECT
*,
RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
) sub
), funnel_log AS (
SELECT l.user_id, window_num, event_time, funnel_step,
LEAD(funnel_step) OVER(PARTITION BY l.user_id, window_num ORDER BY event_time) AS next_funnel_step
FROM event_logs l
JOIN window_ w
ON l.user_id = w.user_id
AND event_time >= window_st_ts AND event_time < window_end_ts
)
SELECT
user_id, window_num,
REPLACE(GROUP_CONCAT(DISTINCT step), ',', '') AS steps
FROM
(
-- head
SELECT
user_id, window_num, funnel_step AS step
FROM funnel_log
WHERE
next_funnel_step IS NULL
OR (next_funnel_step - funnel_step) = 1
UNION ALL
-- tail
SELECT
user_id, window_num, next_funnel_step AS step
FROM funnel_log
WHERE
next_funnel_step IS NULL
OR (next_funnel_step - funnel_step) = 1
) sub2
GROUP BY user_id, window_num

참고로 MySQL은 GROUP_CONCAT을 수행할 때 내부적으로 정렬을 한다. (steps를 보면 오름차순으로 나열되어있다.)
5. 각 step 별로 유저를 기록한다.
WITH window_ AS (
SELECT user_id,
window_num,
event_time AS window_st_ts,
LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
FROM
(
SELECT
*,
RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
) sub
), funnel_log AS (
SELECT l.user_id, window_num, event_time, funnel_step,
LEAD(funnel_step) OVER(PARTITION BY l.user_id, window_num ORDER BY event_time) AS next_funnel_step
FROM event_logs l
JOIN window_ w
ON l.user_id = w.user_id
AND event_time >= window_st_ts AND event_time < window_end_ts
), funnel_step AS (
SELECT
user_id, window_num,
REPLACE(GROUP_CONCAT(DISTINCT step), ',', '') AS steps
FROM
(
-- head
SELECT
user_id, window_num, funnel_step AS step
FROM funnel_log
WHERE
next_funnel_step IS NULL
OR (next_funnel_step - funnel_step) = 1
UNION ALL
-- tail
SELECT
user_id, window_num, next_funnel_step AS step
FROM funnel_log
WHERE
next_funnel_step IS NULL
OR (next_funnel_step - funnel_step) = 1
) sub2
GROUP BY user_id, window_num
)
SELECT DISTINCT user_id, window_num, steps,
case when steps like '1%' then user_id else null end as step_1,
case when steps like '12%' then user_id else null end as step_12,
case when steps like '123%' then user_id else null end as step_123,
case when steps like '1234%' then user_id else null end as step_1234,
case when steps like '12345%' then user_id else null end as step_12345,
case when steps like '123456%' then user_id else null end as step_123456,
case when steps like '1234567%' then user_id else null end as step_1234567,
case when steps like '12345678%' then user_id else null end as step_12345678,
case when steps like '123456789%' then user_id else null end as step_123456789,
case when steps like '12345678910%' then user_id else null end as step_12345678910
FROM funnel_step

6. step 단계별로 유저 수를 집계해준다.
여기서 step1의 경우 1 → 3 으로 간 유저도 집계해주어야 하기 때문에, 윈도우 테이블에서 고유한 user_id의 수를 집계해준다. (step1이 있어야 윈도우가 생성되기 때문)
WITH window_ AS (
SELECT user_id,
window_num,
event_time AS window_st_ts,
LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
FROM
(
SELECT
*,
RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
) sub
), funnel_log AS (
SELECT l.user_id, window_num, event_time, funnel_step,
LEAD(funnel_step) OVER(PARTITION BY l.user_id, window_num ORDER BY event_time) AS next_funnel_step
FROM event_logs l
JOIN window_ w
ON l.user_id = w.user_id
AND event_time >= window_st_ts AND event_time < window_end_ts
), funnel_step AS (
SELECT
user_id, window_num,
REPLACE(GROUP_CONCAT(DISTINCT step), ',', '') AS steps
FROM
(
-- head
SELECT
user_id, window_num, funnel_step AS step
FROM funnel_log
WHERE
next_funnel_step IS NULL
OR (next_funnel_step - funnel_step) = 1
UNION ALL
-- tail
SELECT
user_id, window_num, next_funnel_step AS step
FROM funnel_log
WHERE
next_funnel_step IS NULL
OR (next_funnel_step - funnel_step) = 1
) sub2
GROUP BY user_id, window_num
)
SELECT
(SELECT count(distinct user_id) FROM window_) as step_1
, count(distinct step_12) as step_12
, count(distinct step_123) as step_123
, count(distinct step_1234) as step_1234
, count(distinct step_12345) as step_12345
, count(distinct step_123456) as step_123456
, count(distinct step_1234567) as step_1234567
, count(distinct step_12345678) as step_12345678
, count(distinct step_123456789) as step_123456789
, count(distinct step_12345678910) as step_12345678910
FROM
(
SELECT DISTINCT user_id, window_num, steps,
case when steps like '12%' then user_id else null end as step_12,
case when steps like '123%' then user_id else null end as step_123,
case when steps like '1234%' then user_id else null end as step_1234,
case when steps like '12345%' then user_id else null end as step_12345,
case when steps like '123456%' then user_id else null end as step_123456,
case when steps like '1234567%' then user_id else null end as step_1234567,
case when steps like '12345678%' then user_id else null end as step_12345678,
case when steps like '123456789%' then user_id else null end as step_123456789,
case when steps like '12345678910%' then user_id else null end as step_12345678910
FROM funnel_step
) sub3

'코드카타 > SQL, Pandas' 카테고리의 다른 글
쿼리 작성 가이드 및 최적화 (0) | 2025.03.26 |
---|---|
SQL로 조합 생성하기 (0) | 2025.03.18 |
코호트 분석 - SQL 쿼리로 구현하기 (0) | 2025.03.16 |
재귀 쿼리를 활용한 다양한 계층적 데이터 및 시계열 데이터 다루기 (0) | 2025.03.15 |
쿼리 엔진 별로 비교해 보는 날짜 시간 함수 (0) | 2025.03.15 |