susinlee 님의 블로그
코호트 분석 - SQL 쿼리로 구현하기 본문
customer_orders 테이블
컬럼명 | 타입 | 설명 |
ORDER_ID | INT | 주문 ID (PK) |
CUSTOMER_ID | INT | 고객 ID |
ORDER_DATE | DATE | 주문 날짜 |
ORDER_AMOUNT | DECIMAL(10, 2) | 주문 금액 |
고객의 첫 주문 월을 기준으로 코호트 그룹을 만들고,
각 코호트 그룹에서 시간이 지남에 따라 활성 사용자 수를 계산하는 쿼리를 짜보자.
각 코호트 그룹에 대해 1개월 후부터 12개월 후까지의 활성 사용자 수를 추적한다.
1. 고객별로 가장 처음 주문한 날짜를 구한다.
SELECT customer_id,
DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS first_order_month
FROM customer_orders
GROUP BY customer_id
2. 위의 테이블을 customer_orders 테이블과 join 하고, order_date의 월과 첫 주문 월의 차를 구해준다.
WITH cohort AS (
SELECT customer_id,
DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS first_order_month
FROM customer_orders
GROUP BY customer_id
)
SELECT
co.customer_id,
first_order_month,
DATE_FORMAT(order_date, '%Y-%m-01') AS order_date,
TIMESTAMPDIFF(
MONTH,
first_order_month,
DATE_FORMAT(order_date, '%Y-%m-01')
) AS month_number
FROM customer_orders co
JOIN cohort c ON co.customer_id = c.customer_id
3. 그렇게 만든 테이블을 통해 각 경우(CASE)에 대해서 고유 customer_id 수를 집계해준다.
WITH cohort AS (
SELECT customer_id,
DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS first_order_month
FROM customer_orders
GROUP BY customer_id
), actvie_order AS (
SELECT
co.customer_id,
first_order_month,
DATE_FORMAT(order_date, '%Y-%m-01') AS order_date,
TIMESTAMPDIFF(
MONTH,
first_order_month,
DATE_FORMAT(order_date, '%Y-%m-01')
) AS month_number
FROM customer_orders co
JOIN cohort c ON co.customer_id = c.customer_id
)
SELECT
first_order_month ,
COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END) AS cohort_user_count,
COUNT(DISTINCT CASE WHEN month_number = 1 THEN customer_id END) AS user_count_1_month_later,
COUNT(DISTINCT CASE WHEN month_number = 2 THEN customer_id END) AS user_count_2_month_later,
COUNT(DISTINCT CASE WHEN month_number = 3 THEN customer_id END) AS user_count_3_month_later,
COUNT(DISTINCT CASE WHEN month_number = 4 THEN customer_id END) AS user_count_4_month_later,
COUNT(DISTINCT CASE WHEN month_number = 5 THEN customer_id END) AS user_count_5_month_later,
COUNT(DISTINCT CASE WHEN month_number = 6 THEN customer_id END) AS user_count_6_month_later,
COUNT(DISTINCT CASE WHEN month_number = 7 THEN customer_id END) AS user_count_7_month_later,
COUNT(DISTINCT CASE WHEN month_number = 8 THEN customer_id END) AS user_count_8_month_later,
COUNT(DISTINCT CASE WHEN month_number = 9 THEN customer_id END) AS user_count_9_month_later,
COUNT(DISTINCT CASE WHEN month_number = 10 THEN customer_id END) AS user_count_10_month_later,
COUNT(DISTINCT CASE WHEN month_number = 11 THEN customer_id END) AS user_count_11_month_later,
COUNT(DISTINCT CASE WHEN month_number = 12 THEN customer_id END) AS user_count_12_month_later
FROM actvie_order
GROUP BY first_order_month

미리 중간 집계를 수행하여 쿼리의 효율성을 높이자
WITH cohort AS (
SELECT
CUSTOMER_ID,
DATE(DATE_FORMAT(MIN(ORDER_DATE), '%Y-%m-01')) AS first_order_month
FROM customer_orders
GROUP BY CUSTOMER_ID
), active_orders AS (
SELECT
o.CUSTOMER_ID,
c.first_order_month,
DATE(DATE_FORMAT(o.ORDER_DATE, '%Y-%m-01')) AS active_month
FROM customer_orders o
JOIN cohort c
ON o.CUSTOMER_ID = c.CUSTOMER_ID
), cohort_counts AS (
SELECT
first_order_month,
active_month,
COUNT(DISTINCT CUSTOMER_ID) AS user_count
FROM active_orders
GROUP BY first_order_month, active_month
)
SELECT
DATE_FORMAT(first_order_month, '%Y-%m') FIRST_ORDER_MONTH,
SUM(CASE WHEN active_month = first_order_month THEN user_count ELSE 0 END) AS COHORT_USER_COUNT,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 1 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_1_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 2 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_2_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 3 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_3_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 4 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_4_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 5 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_5_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 6 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_6_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 7 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_7_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 8 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_8_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 9 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_9_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 10 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_10_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 11 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_11_MONTH_LATER,
SUM(CASE WHEN active_month = DATE_ADD(first_order_month, INTERVAL 12 MONTH) THEN user_count ELSE 0 END) AS USER_COUNT_12_MONTH_LATER
FROM cohort_counts
GROUP BY first_order_month
ORDER BY first_order_month;

'코드카타 > SQL, Pandas' 카테고리의 다른 글
SQL로 조합 생성하기 (0) | 2025.03.18 |
---|---|
퍼널 분석 - SQL 쿼리로 구현하기 (0) | 2025.03.16 |
재귀 쿼리를 활용한 다양한 계층적 데이터 및 시계열 데이터 다루기 (0) | 2025.03.15 |
쿼리 엔진 별로 비교해 보는 날짜 시간 함수 (0) | 2025.03.15 |
DISTINCT ON, FIRST_VALUE() (0) | 2025.03.12 |