hive SQL案例
-
上个月用户连续n天登录天数
用户登录记录表user_login,包括用户id(user_id)、日期(login_date)
| user_id | login_date |
| 12333256 | 2021-01-03 |
| 84272916 | 2021-01-03 |
| 94038271 | 2021-01-02 |
| 201934012 | 2021-01-03 |
过滤出当月数据:where login_date > ''
转换成date:to_date函数,参数默认按照"yyyy-MM-dd"的格式
转换成date:to_date函数,参数默认按照"yyyy-MM-dd"的格式
去重:一天可能有多次登录记录(distinct、group by、row_number);
开窗排序,生成rn1:按user_id分区,login_date排序
login_date与rn1作差得到新日期dt2:date_sub(login_date, rn1)
统计天数:按user_id、dt2分组,再使用count(1)统计连续登录天数
开窗排序,生成rn2:按user_id分区,count(1)排序
with t1 as ( select user_id, to_date(login_date) as dt1, row_number() over(partition by user_id order by to_date(login_date)) as rn1 from db.user_login where login_date > '2023-07-31' group by user_id, to_date(login_date) ), t2 as ( select user_id, date_sub(dt1, rn1) as dt2 from t1 ), t3 as ( select user_id, dt2, count(1) as continuous_day, row_number() over(partition by user_id order by count(1) desc) rn2 from t2 group by user_id, dt2 ) select user_id, dt2 from t3 where rn2 = 1;