Posts in the 'Code Kata/Pandas' category

108. Triangle Judgement 09:41:37
106. The Number of Employees Which Report to Each Employee 2025.01.14
107. Primary Department for Each Employee 2025.01.13
105. Customers Who Bought All Products 2025.01.12
104. Biggest Single Number 2025.01.11
103. Find Followers Count 2025.01.08
102. Classes More Than 5 Students 2025.01.07
101. Product Sales Analysis III 2025.01.06
100. User Activity for the Past 30 Days I 2025.01.05
Game Play Analysis IV 2025.01.01

108. Triangle Judgement

susinlee 2025. 1. 15. 09:41

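[Problem]

https://leetcode.com/problems/triangle-judgement/description/

[Solution]

1. Compare the longest side with half of the sum of all three sides.

2. If the longest side is smaller than that half-sum, write 'Yes' to a new column; otherwise write 'No'.

Three segments form a triangle when the two shorter sides together are longer than the longest side. Finding the longest side is easy, but I was stuck on how to pick out the other two. Then I realized that the total x + y + z is just (sum of the other two sides) + (longest side), so halving the total and comparing it with the longest side is the same test: (x + y + z) / 2 > max(x, y, z) holds exactly when the other two sides sum to more than the longest side.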
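The equivalence behind this shortcut can be checked by brute force. The sketch below is only an illustration in plain Python; the helper names `is_triangle_direct` and `is_triangle_half_sum` are hypothetical and not part of the original solution.

```python
from itertools import product


def is_triangle_direct(x: int, y: int, z: int) -> bool:
    """Textbook rule: the two shorter sides together must exceed the longest."""
    a, b, c = sorted((x, y, z))
    return a + b > c


def is_triangle_half_sum(x: int, y: int, z: int) -> bool:
    """Shortcut from the post: half of the perimeter must exceed the longest side."""
    return (x + y + z) / 2 > max(x, y, z)


# Compare both rules on every combination of side lengths from 1 to 20.
mismatches = [
    sides
    for sides in product(range(1, 21), repeat=3)
    if is_triangle_direct(*sides) != is_triangle_half_sum(*sides)
]
print(len(mismatches))  # 0
```

Both rules agree on every tested combination, which matches the algebra: adding the longest side to both sides of "other two > longest" and halving gives the half-sum form.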

Pandas

import pandas as pd

def triangle_judgement(triangle: pd.DataFrame) -> pd.DataFrame:
    # A row forms a triangle when half the perimeter exceeds its longest side.
    triangle['triangle'] = triangle.apply(
        lambda x: 'Yes' if (x['x'] + x['y'] + x['z']) / 2 > max(x['x'], x['y'], x['z']) else 'No',
        axis=1,
    )

    return triangle

SQL

SELECT
    x, y, z
    , IF((x+y+z)/2 > GREATEST(x, y, z), 'Yes', 'No') AS triangle
FROM Triangle

106. The Number of Employees Which Report to Each Employee

susinlee 2025. 1. 14. 09:49

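[Problem]

https://leetcode.com/problems/the-number-of-employees-which-report-to-each-employee/description/

[Solution]

1. Group by manager ID (reports_to), then use agg with named aggregation to set the output column names and the aggregation functions in one step.

2. Apply count to reports_to and mean to age. As covered in an earlier post, round() sends an exact .5 to the nearest even number (round half to even), so a small value is added before rounding to cover that case. Name the columns reports_count and average_age.

3. Use rename to change reports_to to employee_id so the merge is straightforward, select only the output columns, and sort by employee_id with sort_values before returning.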
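The rounding behavior mentioned in step 2 can be seen with Python's built-in round(). The sketch below also shows one alternative to the small-offset trick, based on the standard-library decimal module; the helper name `round_half_up` is hypothetical and is not what the solution itself uses.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: float) -> int:
    """Round to the nearest integer, sending exact .5 ties upward (positive inputs)."""
    # Going through str() avoids inheriting binary floating-point noise.
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(round(2.5), round(3.5))                  # 2 4  (ties go to the even neighbour)
print(round(2.5 + 0.0001))                     # 3    (the small-offset trick)
print(round_half_up(2.5), round_half_up(3.5))  # 3 4
```

The offset trick is shorter, while the decimal version states the intended tie rule explicitly; which one fits better depends on how much precision the averages need.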

Pandas

import pandas as pd

def count_employees(employees: pd.DataFrame) -> pd.DataFrame:
    t1 = employees.groupby('reports_to').agg(
        reports_count=('reports_to', 'count')
        , average_age=('age', lambda x: round((x+0.0001).mean()))
    ).reset_index().rename(columns={'reports_to':'employee_id'})
    return t1.merge(employees)[['employee_id', 'name', 'reports_count', 'average_age']].sort_values('employee_id')

SQL

SELECT 
    e2.employee_id
    , e2.name
    , COUNT(e1.employee_id) as reports_count
    , ROUND(AVG(e1.age)) as average_age
FROM Employees e1
JOIN Employees e2
    ON e1.reports_to = e2.employee_id
GROUP BY e2.employee_id
ORDER BY e2.employee_id

107. Primary Department for Each Employee

susinlee 2025. 1. 13. 12:02

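[Problem]

https://leetcode.com/problems/primary-department-for-each-employee/description/

[Solution]

1. Use transform, which applies a group-wise aggregate while keeping every original row.

2. Keep only the rows where primary_flag is 'Y' or the employee belongs to a single department.

3. Return only the employee_id and department_id columns.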
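To make step 1 concrete, the sketch below contrasts a plain group-wise count with transform on a small made-up table (the rows are illustrative, not taken from the problem statement).

```python
import pandas as pd

# Made-up sample rows for illustration only.
employee = pd.DataFrame({
    "employee_id": [1, 2, 2, 3],
    "department_id": [1, 1, 2, 3],
    "primary_flag": ["N", "Y", "N", "N"],
})

grouped = employee.groupby("employee_id")["department_id"]

# Plain aggregation: one value per employee_id.
per_group = grouped.count()

# transform: the group count is broadcast back onto every original row.
per_row = grouped.transform("count")

print(len(per_group), len(per_row))  # 3 4
print(per_row.tolist())              # [1, 2, 2, 1]
```

Because the transformed result lines up with the original rows, it can be assigned straight to a new column and used in a boolean filter.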

Pandas

import pandas as pd

def find_primary_department(employee: pd.DataFrame) -> pd.DataFrame:
    employee['department_cnt'] = employee.groupby('employee_id')['department_id'].transform('count')
    return employee[(employee['primary_flag'] == 'Y') | (employee['department_cnt'] == 1)][['employee_id', 'department_id']]

SQL

SELECT
    employee_id
    , department_id
FROM Employee
WHERE primary_flag = 'Y'
OR employee_id IN (SELECT employee_id
                   FROM Employee
                   GROUP BY employee_id
                   HAVING COUNT(department_id) = 1)

105. Customers Who Bought All Products

susinlee 2025. 1. 12. 11:17

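[Problem]

https://leetcode.com/problems/customers-who-bought-all-products/description/

[Solution]

1. Group the customer table by customer_id and count product_key.

2. A customer may have bought the same product more than once, so remove duplicate rows before counting.

3. Keep only the customer_id values whose count equals the number of rows in the product table.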
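As a possible variation on these steps, the deduplication and the count can be folded into a single nunique call. This is a sketch on made-up sample data, not the version the post submits.

```python
import pandas as pd

# Made-up sample data for illustration only.
customer = pd.DataFrame({
    "customer_id": [1, 2, 3, 3, 1, 1],
    "product_key": [5, 6, 5, 6, 6, 6],
})
product = pd.DataFrame({"product_key": [5, 6]})

# nunique counts distinct product keys per customer, so repeat purchases
# do not inflate the count.
distinct_counts = customer.groupby("customer_id")["product_key"].nunique()
buyers_of_all = distinct_counts[distinct_counts == len(product)].index.tolist()
print(buyers_of_all)  # [1, 3]
```

This mirrors the COUNT(DISTINCT product_key) comparison in the SQL version.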

Pandas

import pandas as pd

def find_customers(customer: pd.DataFrame, product: pd.DataFrame) -> pd.DataFrame:
    df = customer.drop_duplicates().groupby('customer_id').count().reset_index()

    return df[df['product_key'] == len(product)][['customer_id']]

SQL

SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)

104. Biggest Single Number

susinlee 2025. 1. 11. 22:18

[Problem]

https://leetcode.com/problems/biggest-single-number/description/

[Solution]

1. Remove every row that has a duplicate, without keeping any copy. Passing keep=False to drop_duplicates() does this.

2. Take the maximum with max().

3. Calling max() on a DataFrame returns a Series, so convert it back to a DataFrame with to_frame().
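The difference between the default behavior and keep=False is easy to miss, so here is a small sketch on made-up numbers showing both, followed by the max/to_frame chain from the steps above.

```python
import pandas as pd

# Made-up sample data for illustration only.
my_numbers = pd.DataFrame({"num": [8, 8, 3, 3, 1, 4, 5, 6]})

# Default: the first copy of each duplicated value survives.
keep_first = my_numbers.drop_duplicates()["num"].tolist()

# keep=False: every value that appears more than once is dropped entirely.
keep_none = my_numbers.drop_duplicates(keep=False)["num"].tolist()

print(keep_first)  # [8, 3, 1, 4, 5, 6]
print(keep_none)   # [1, 4, 5, 6]

# max() collapses the frame to a Series; to_frame() turns it back into a DataFrame.
result = my_numbers.drop_duplicates(keep=False).max().to_frame(name="num")
print(result["num"].iloc[0])  # 6
```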

Pandas

import pandas as pd

def biggest_single_number(my_numbers: pd.DataFrame) -> pd.DataFrame:
    return my_numbers.drop_duplicates(keep=False).max().to_frame(name='num')

SQL

SELECT MAX(num) as num
FROM 
(
    SELECT 
        num
    FROM MyNumbers
    GROUP BY num
    HAVING COUNT(num) = 1
) a

103. Find Followers Count

susinlee 2025. 1. 8. 09:23

[Problem]

https://leetcode.com/problems/find-followers-count/description/

[Solution]

1. Group by user_id and count the follower_id values.

2. Sort by user_id in ascending order.

Pandas

import pandas as pd

def count_followers(followers: pd.DataFrame) -> pd.DataFrame:
    
    return followers.groupby('user_id').size().reset_index(name='followers_count').sort_values('user_id')

SQL

SELECT 
    user_id
    , COUNT(follower_id) as followers_count
FROM Followers
GROUP BY user_id
ORDER BY user_id

102. Classes More Than 5 Students

susinlee 2025. 1. 7. 09:18

[Problem]

https://leetcode.com/problems/classes-more-than-5-students/

[Solution]

1. Group by class and count the students in each group.

2. Keep only the rows where the student count is at least 5, and output only the class column.

Pandas

import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    df = courses.groupby('class')['student'].count().reset_index(name='cnt')
    return df[df['cnt'] >= 5][['class']]

SQL

SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(student) >= 5

101. Product Sales Analysis III

susinlee 2025. 1. 6. 09:56

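[Problem]

https://leetcode.com/problems/product-sales-analysis-iii/description/

[Solution]

1. For each product_id, find the earliest year.

2. Merge that result with the original table, using product_id and year as the keys.

3. From the merged table, select the columns required by the submission format and rename them.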

Pandas

import pandas as pd

def sales_analysis(sales: pd.DataFrame, product: pd.DataFrame) -> pd.DataFrame:
    t1 = sales.groupby('product_id')['year'].min().reset_index(name='year')
    merged = t1.merge(sales, on=['product_id', 'year'], how='inner').drop('sale_id', axis=1)
    return merged.rename(columns={'year' : 'first_year'})
    
    # Alternative using the rank method
    # sales['rnk'] = sales.groupby('product_id')['year'].rank(method='dense')
    # df = sales[sales['rnk'] == 1][['product_id', 'year', 'quantity', 'price']]
    # return df.rename(columns={'year' : 'first_year'})
    
    # Alternative using the transform method
    # sales['first_year'] = sales.groupby('product_id').year.transform('min')
    # return sales[sales.year == sales.first_year][['product_id', 'first_year', 'quantity', 'price']]

SQL

SELECT 
    product_id
    , year AS first_year
    , quantity
    , price
FROM Sales
WHERE (product_id, year) in (SELECT 
                                product_id
                                , MIN(year)   
                            FROM Sales
                            GROUP BY product_id)

100. User Activity for the Past 30 Days I

susinlee 2025. 1. 5. 12:52

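[Problem]

https://leetcode.com/problems/user-activity-for-the-past-30-days-i/description/

[Solution]

1. Keep only the rows that fall within the target period (using the between method).

2. Group by date and count the unique users (using the nunique method).

3. Rename the columns (using the rename method).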
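One detail worth confirming in step 1 is that between includes both endpoints by default, which is why the window runs from 2019-06-28 to 2019-07-27. A small sketch on made-up dates:

```python
import pandas as pd

# Made-up dates around the edges of the window, for illustration only.
dates = pd.Series(pd.to_datetime(
    ["2019-06-27", "2019-06-28", "2019-07-27", "2019-07-28"]
))
start, end = pd.Timestamp("2019-06-28"), pd.Timestamp("2019-07-27")

# between() is inclusive on both ends by default.
mask = dates.between(start, end)
print(mask.tolist())  # [False, True, True, False]

# Counting both endpoints, the window covers exactly 30 days.
print((end - start).days + 1)  # 30
```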

Pandas

import pandas as pd

def user_activity(activity: pd.DataFrame) -> pd.DataFrame:
    in_window = activity[activity['activity_date'].between('2019-06-28', '2019-07-27')]
    daily = in_window.groupby('activity_date')['user_id'].nunique().reset_index()
    return daily.rename(columns={'activity_date': 'day', 'user_id': 'active_users'})

SQL

SELECT 
    activity_date AS day
    , COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
GROUP BY activity_date

Game Play Analysis IV

susinlee 2025. 1. 1. 22:44

[Problem]

https://leetcode.com/problems/game-play-analysis-iv/description/

[Solution]

1. Find the first login date for each player_id.

2. Count the rows where the login date is exactly one day after that player's first login date.

3. Divide by the number of unique player_id values.

Pandas

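import pandas as pd

def gameplay_analysis(activity: pd.DataFrame) -> pd.DataFrame:
    # First login date per player, broadcast to every row of that player.
    activity['first'] = activity.groupby('player_id').event_date.transform('min')
    # Rows where the player logged in again exactly one day after the first login.
    activity_second = activity[activity['first'] + pd.DateOffset(1) == activity['event_date']]
    
    return pd.DataFrame({"fraction" : [round(len(activity_second)/activity.player_id.nunique(), 2)]})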

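- transform() applies the given function per group and fills in the results while keeping the same number of rows as the DataFrame had before grouping.

- pd.DateOffset() creates an object that shifts a date by a given interval.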
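The two notes above can be seen working together in the following sketch. The sample rows are made up for illustration, and the offset is written as `pd.DateOffset(days=1)` to make the one-day shift explicit.

```python
import pandas as pd

# Made-up sample rows for illustration only.
activity = pd.DataFrame({
    "player_id": [1, 1, 2, 3, 3],
    "event_date": pd.to_datetime(
        ["2016-03-01", "2016-03-02", "2017-06-25", "2016-03-02", "2018-07-03"]
    ),
})

# transform keeps one value per original row: each player's first login date.
first_login = activity.groupby("player_id")["event_date"].transform("min")

# DateOffset shifts every first-login date forward by one day.
returned_next_day = activity["event_date"] == first_login + pd.DateOffset(days=1)

fraction = round(returned_next_day.sum() / activity["player_id"].nunique(), 2)
print(returned_next_day.tolist())  # [False, True, False, False, False]
print(fraction)                    # 0.33
```

Only player 1 logs in again on the day after the first login, so one of the three players qualifies.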

SQL

SELECT 
    ROUND(COUNT(player_id)/ (SELECT COUNT(DISTINCT player_id) FROM Activity), 2) AS fraction
FROM 
    Activity
WHERE 
    (player_id, DATE_SUB(event_date, INTERVAL 1 DAY)) IN (SELECT player_id, MIN(event_date)
                                                        FROM Activity
                                                        GROUP BY player_id)
