susinlee 님의 블로그

퍼널 분석 - SQL 쿼리로 구현하기 본문

코드카타/SQL, Pandas

퍼널 분석 - SQL 쿼리로 구현하기

susinlee 2025. 3. 16. 14:03

 

유저별로 행동 로그를 담은 테이블을 가지고 각 funnel_step에서의 유저 수를 SQL 쿼리로 구현해볼 것이다.

 

그 전에 각 퍼널 유형별로 집계 방식의 차이를 짚고 넘어가자.

 

개방형 퍼널 vs 폐쇄형 퍼널

  Step A Step B Step C
User 1 X X X
User 2   X X
User 3 X   X
User 4     X
  • User 1 completed all 3 steps
  • User 2 entered the funnel at step B, and also completed step C
  • User 3 entered the funnel at step A, skipped step B, completed step C
  • User 4 enters the funnel at step C

개방형 퍼널 (open funnel)

  • 사용자가 어떤 단계에서든지 퍼널 진입 가능
  Step A Step B Step C
User 1 X (O) X (O) X (O)
User 2   X (O) X (O)
User 3 X (O)   X
User 4     X (O)
TOTAL 2 2 3

 

폐쇄형 퍼널(closed funnel)

  • 시작이 정해져 있음
  Step A Step B Step C
User 1 X (O) X (O) X (O)
User 2   X X
User 3 X (O)   X
User 4     X
TOTAL 2 1 1

 

 

폐쇄형 퍼널을 구현할 것인데, 다음 조건을 고려 해야한다.

  1. 시작이 정해져 있고
  2. 시퀀스가 중요하다. 즉, 링크가 존재해야 한다( 1 → 2, 2 → 3 등등).
  3. 링크만 존재한다면 사이사이 다른 행동을 하는 것은 관계없다.

 

SQL로 폐쇄형 퍼널 구현하기

1. 유저별로 윈도우를 설정한다.

SELECT 
    *,
    RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
FROM event_logs
WHERE funnel_step = 1
  • 1번 step이 발생할 때마다 새로운 윈도우를 생성해준다.

 

2. 윈도우의 시작과 끝나는 시점을 구해준다.

SELECT 
    user_id,
    window_num,
    event_time AS window_st_ts,
    LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
FROM 
(
    SELECT 
        *,
        RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
    FROM event_logs
    WHERE funnel_step = 1
) sub

 

3. 만들어진 테이블(window_)을 기존 테이블(event_logs)과 병합한다. 

WITH window_ AS (
    SELECT 
        user_id,
        window_num,
        event_time AS window_st_ts,
        LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
    FROM 
(
    SELECT 
        *,
        RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
    FROM event_logs
    WHERE funnel_step = 1
) sub
)
SELECT l.user_id, window_num, event_time, funnel_step,
       LEAD(funnel_step) OVER(PARTITION BY user_id, window_num ORDER BY event_time) AS next_funnel_step 
FROM event_logs l
JOIN window_ w
    ON l.user_id = w.user_id
    AND event_time >= window_st_ts AND event_time < window_end_ts
  • user_id가 같으면서, event_time이 윈도우의 시작(포함)과 끝 사이에 위치하도록 조건을 걸어준다
  • 병합한 후 유저, 윈도우 별로 다음 funnel_step을 나타내는 next_funnel_step 컬럼을 생성해준다. (LEAD 함수)

 

위의 테이블에서 funnel_step이 HEAD 가 되고, next_funnel_step이 TAIL 이 된다.

 

4. head 와 tail 을 앞서 살펴보았던 조건에 맞게 구하고, union을 수행한 뒤 최종 steps를 구한다.

WITH window_ AS (
    SELECT user_id,
           window_num,
           event_time AS window_st_ts,
           LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
    FROM 
    (
        SELECT 
            *,
            RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
        FROM event_logs
        WHERE funnel_step = 1
    ) sub
), funnel_log AS (
    SELECT l.user_id, window_num, event_time, funnel_step,
           LEAD(funnel_step) OVER(PARTITION BY l.user_id, window_num ORDER BY event_time) AS next_funnel_step 
    FROM event_logs l
    JOIN window_ w
        ON l.user_id = w.user_id
        AND event_time >= window_st_ts AND event_time < window_end_ts
)
SELECT
    user_id, window_num,
    REPLACE(GROUP_CONCAT(DISTINCT step), ',', '') AS steps
FROM 
(
    -- head
    SELECT
        user_id, window_num, funnel_step AS step
    FROM funnel_log
    WHERE
        next_funnel_step IS NULL
        OR (next_funnel_step - funnel_step) = 1
UNION ALL
    -- tail
    SELECT
        user_id, window_num, next_funnel_step AS step
    FROM funnel_log
    WHERE 
        next_funnel_step IS NULL
        OR (next_funnel_step - funnel_step) = 1
) sub2
GROUP BY user_id, window_num

 

참고로 MySQL은 GROUP_CONCAT을 수행할 때 내부적으로 정렬을 한다. (steps를 보면 오름차순으로 나열되어있다.)

 

5. 각 step 별로 유저를 기록한다.

WITH window_ AS (
    SELECT user_id,
           window_num,
           event_time AS window_st_ts,
           LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
    FROM 
    (
        SELECT 
            *,
            RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
        FROM event_logs
        WHERE funnel_step = 1
    ) sub
), funnel_log AS (
    SELECT l.user_id, window_num, event_time, funnel_step,
           LEAD(funnel_step) OVER(PARTITION BY l.user_id, window_num ORDER BY event_time) AS next_funnel_step 
    FROM event_logs l
    JOIN window_ w
        ON l.user_id = w.user_id
        AND event_time >= window_st_ts AND event_time < window_end_ts
), funnel_step AS (
    SELECT
        user_id, window_num,
        REPLACE(GROUP_CONCAT(DISTINCT step), ',', '') AS steps
    FROM 
    (
        -- head
        SELECT
            user_id, window_num, funnel_step AS step
        FROM funnel_log
        WHERE
            next_funnel_step IS NULL
            OR (next_funnel_step - funnel_step) = 1
    UNION ALL
        -- tail
        SELECT
            user_id, window_num, next_funnel_step AS step
        FROM funnel_log
        WHERE 
            next_funnel_step IS NULL
            OR (next_funnel_step - funnel_step) = 1
    ) sub2
    GROUP BY user_id, window_num
)
SELECT DISTINCT user_id, window_num, steps,
    case when steps like '1%' then user_id else null end as step_1,
    case when steps like '12%' then user_id else null end as step_12,
    case when steps like '123%' then user_id else null end as step_123,
    case when steps like '1234%' then user_id else null end as step_1234,
    case when steps like '12345%' then user_id else null end as step_12345,
    case when steps like '123456%' then user_id else null end as step_123456,
    case when steps like '1234567%' then user_id else null end as step_1234567,
    case when steps like '12345678%' then user_id else null end as step_12345678,
    case when steps like '123456789%' then user_id else null end as step_123456789,
    case when steps like '12345678910%' then user_id else null end as step_12345678910
FROM funnel_step

 

6. step 단계별로 유저 수를 집계해준다.

여기서 step1의 경우 1 → 3 으로 간 유저도 집계해주어야 하기 때문에, 윈도우 테이블에서 고유한 user_id의 수를 집계해준다. (step1이 있어야 윈도우가 생성되기 때문)

WITH window_ AS (
    SELECT user_id,
           window_num,
           event_time AS window_st_ts,
           LEAD(event_time, 1, NOW()) OVER(PARTITION BY user_id ORDER BY window_num) AS window_end_ts
    FROM 
    (
        SELECT 
            *,
            RANK() OVER(PARTITION BY user_id ORDER BY event_time) AS window_num
        FROM event_logs
        WHERE funnel_step = 1
    ) sub
), funnel_log AS (
    SELECT l.user_id, window_num, event_time, funnel_step,
           LEAD(funnel_step) OVER(PARTITION BY l.user_id, window_num ORDER BY event_time) AS next_funnel_step 
    FROM event_logs l
    JOIN window_ w
        ON l.user_id = w.user_id
        AND event_time >= window_st_ts AND event_time < window_end_ts
), funnel_step AS (
    SELECT
        user_id, window_num,
        REPLACE(GROUP_CONCAT(DISTINCT step), ',', '') AS steps
    FROM 
    (
        -- head
        SELECT
            user_id, window_num, funnel_step AS step
        FROM funnel_log
        WHERE
            next_funnel_step IS NULL
            OR (next_funnel_step - funnel_step) = 1
    UNION ALL
        -- tail
        SELECT
            user_id, window_num, next_funnel_step AS step
        FROM funnel_log
        WHERE 
            next_funnel_step IS NULL
            OR (next_funnel_step - funnel_step) = 1
    ) sub2
    GROUP BY user_id, window_num
)
SELECT 
    (SELECT count(distinct user_id) FROM window_) as step_1
    , count(distinct step_12) as step_12
    , count(distinct step_123) as step_123
    , count(distinct step_1234) as step_1234
    , count(distinct step_12345) as step_12345
    , count(distinct step_123456) as step_123456
    , count(distinct step_1234567) as step_1234567
    , count(distinct step_12345678) as step_12345678
    , count(distinct step_123456789) as step_123456789
    , count(distinct step_12345678910) as step_12345678910
FROM
(
    SELECT DISTINCT user_id, window_num, steps,
        case when steps like '12%' then user_id else null end as step_12,
        case when steps like '123%' then user_id else null end as step_123,
        case when steps like '1234%' then user_id else null end as step_1234,
        case when steps like '12345%' then user_id else null end as step_12345,
        case when steps like '123456%' then user_id else null end as step_123456,
        case when steps like '1234567%' then user_id else null end as step_1234567,
        case when steps like '12345678%' then user_id else null end as step_12345678,
        case when steps like '123456789%' then user_id else null end as step_123456789,
        case when steps like '12345678910%' then user_id else null end as step_12345678910
    FROM funnel_step
) sub3