hive SQL案例

akia开凯 / 2023-08-27 / 原文

  • 上个月用户连续n天登录天数

用户登录记录表user_login,包括用户id(user_id)、日期(login_date)
user_id login_date
12333256 2021-01-03
84272916 2021-01-03
94038271 2021-01-02
201934012 2021-01-03
过滤出当月数据:where login_date > ''
转换成date:to_date函数,参数默认按照"yyyy-MM-dd"的格式
去重:一天可能有多次登录记录(distinct、group by、row_number);
开窗排序,生成rn1:按user_id分区,login_date排序
login_date与rn1作差得到新日期dt2:date_sub(login_date, rn1)
统计天数:按user_id、dt2分组,再使用count(1)统计连续登录天数
开窗排序,生成rn2:按user_id分区,count(1)排序
with t1 as (
    select
        user_id, 
        to_date(login_date) as dt1,
        row_number() over(partition by user_id order by to_date(login_date)) as rn1
    from db.user_login
    where login_date > '2023-07-31'
    group by user_id, to_date(login_date)
),
t2 as (
    select
        user_id,
        date_sub(dt1, rn1) as dt2
    from t1
),
t3 as (
    select
        user_id,
        dt2,
        count(1) as continuous_day,
        row_number() over(partition by user_id order by count(1) desc) rn2
    from t2
    group by user_id, dt2
)
select
    user_id,
    dt2
from t3
where rn2 = 1;
View Code