select t.dept, t.day, count(*)
from (
  select regexp_substr(dept, '[^,]+', 1, level) dept, day
  from (
    select wm_concat(dept) dept, day
    from baseinfo
    group by day
  ) m
  connect by level <= regexp_count(dept, ',') + 1
         and prior day = day
         and prior sys_guid() is not null
) t
group by t.dept, t.day
Sample data in baseinfo:

dept                                    day
10001,10002                             2021-08-12
10003                                   2021-08-10
(empty)                                 2021-08-10
10001,10002                             2021-08-14
(empty)                                 2021-08-14
10024,10046,10024,10043,10011,10015,    2021-08-14
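The following is a small Python model of what the query computes on the sample rows above, assuming each day's concatenated string is split on its own. It is only a sketch. The blank dept cells are assumed to be NULL (None here), Python's re module stands in for Oracle's regex engine (which gives the same result for the simple pattern [^,]+), and the function name count_depts_per_day is hypothetical.

```python
import re
from collections import Counter

# Sample rows from baseinfo as (dept, day); None models the blank dept cells
# (assumption: they are NULL rather than empty strings).
ROWS = [
    ("10001,10002", "2021-08-12"),
    ("10003", "2021-08-10"),
    (None, "2021-08-10"),
    ("10001,10002", "2021-08-14"),
    (None, "2021-08-14"),
    ("10024,10046,10024,10043,10011,10015,", "2021-08-14"),
]


def count_depts_per_day(rows):
    """Hypothetical model of the Oracle query: concatenate per day, split, count (dept, day)."""
    # wm_concat(dept) ... group by day: one string per day, NULL inputs skipped.
    groups = {}
    for dept, day in rows:
        groups.setdefault(day, [])
        if dept is not None:
            groups[day].append(dept)

    counts = Counter()
    for day, depts in groups.items():
        joined = ",".join(depts)
        tokens = re.findall(r"[^,]+", joined)
        # One row per level 1 .. regexp_count(dept, ',') + 1; regexp_substr gives
        # NULL (None) when the level-th token does not exist.
        for level in range(1, joined.count(",") + 2):
            dept = tokens[level - 1] if level <= len(tokens) else None
            counts[(dept, day)] += 1
    return counts


result = count_depts_per_day(ROWS)
print(result[("10024", "2021-08-14")])  # 2
print(result[(None, "2021-08-14")])     # 1, produced by the trailing comma
```

Department 10024 appears twice on 2021-08-14, so its count is 2. The trailing comma in the last row produces one extra NULL group.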
wm_concat(dept), day ... group by day: this groups the rows by day and concatenates all dept values within each group into a single comma-separated string.
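The grouping step can be modelled in Python as below. This is a sketch, and the function name wm_concat_by_day is hypothetical. Two caveats apply to the real function: wm_concat is an undocumented Oracle function (LISTAGG is the documented alternative), and it does not guarantee the order of the concatenated values. This model simply keeps input order.

```python
def wm_concat_by_day(rows):
    """Hypothetical model of: select wm_concat(dept) dept, day from baseinfo group by day."""
    groups = {}
    for dept, day in rows:
        groups.setdefault(day, [])
        if dept is not None:  # like other aggregates, NULL inputs are skipped
            groups[day].append(dept)
    # A day whose dept values are all NULL gets NULL (None), not an empty string.
    return {day: ",".join(depts) if depts else None for day, depts in groups.items()}


rows = [("10003", "2021-08-10"), (None, "2021-08-10"),
        ("10001,10002", "2021-08-14"), ("10024,10015,", "2021-08-14")]
print(wm_concat_by_day(rows))
# {'2021-08-10': '10003', '2021-08-14': '10001,10002,10024,10015,'}
```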
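regexp_substr(source_string, pattern, position, occurrence, match_parameter) takes these arguments:

- position is the character offset, counted from the left, at which the search starts. The default is 1, the start of the string.
- occurrence selects which match of the pattern to return, that is, which of the pieces produced by splitting on the pattern. The default is 1.
- match_parameter changes matching behaviour. For example, 'c' means case-sensitive matching. When it is omitted, case sensitivity follows the NLS_SORT setting.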
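Here is a rough Python model of the position and occurrence arguments. It is a sketch: match_parameter is not modelled, and Python's re is used in place of Oracle's POSIX-style engine, so it only matches Oracle for simple patterns such as [^,]+.

```python
import re


def regexp_substr(source, pattern, position=1, occurrence=1):
    """Rough model of REGEXP_SUBSTR(source, pattern, position, occurrence).

    match_parameter is not modelled; Python's re stands in for Oracle's engine.
    """
    if source is None:
        return None
    # position is 1-based: start searching at that character.
    for i, match in enumerate(re.finditer(pattern, source[position - 1:]), start=1):
        if i == occurrence:
            return match.group(0)
    return None  # fewer than `occurrence` matches: Oracle returns NULL


s = "10001,10002,10003"
print(regexp_substr(s, "[^,]+", 1, 2))  # 10002
print(regexp_substr(s, "[^,]+", 1, 4))  # None
print(regexp_substr(s, "[^,]+", 3, 1))  # 001
```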
regexp_count(source_string, pattern) returns the number of times pattern occurs in source_string, or 0 if it does not occur.
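The query relies on one fact: the number of substrings in a comma-separated list is the number of commas plus 1. The sketch below shows this. It approximates REGEXP_COUNT in Python and, as an assumption, handles only non-NULL strings.

```python
import re


def regexp_count(source, pattern):
    """Rough model of REGEXP_COUNT(source, pattern): non-overlapping matches, 0 if none."""
    return sum(1 for _ in re.finditer(pattern, source))


# Number of substrings in a comma-separated list = number of commas + 1.
for dept in ["10001,10002,10003", "10003", "10024,10015,"]:
    print(dept, regexp_count(dept, ",") + 1)
# 10001,10002,10003 3
# 10003 1
# 10024,10015, 3   (the trailing comma adds a level with no substring)
```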
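select regexp_substr(dept, '[^,]+', 1, level) from table connect by level <= 5 returns one row per level, which is 5 rows here for a single source row. When the string contains fewer than 5 substrings, the extra rows are NULL. Those surplus NULLs are clearly not the result we want.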
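For a single source row, the fixed limit behaves like the Python sketch below: it always produces max_level values and pads with None where Oracle returns NULL. The function name is hypothetical.

```python
import re


def split_fixed_levels(dept, max_level):
    """Hypothetical model of, for ONE source row:
    select regexp_substr(dept, '[^,]+', 1, level) ... connect by level <= max_level
    Always returns max_level rows; levels past the last substring are NULL (None)."""
    tokens = re.findall(r"[^,]+", dept)
    return [tokens[level - 1] if level <= len(tokens) else None
            for level in range(1, max_level + 1)]


print(split_fixed_levels("10001,10002,10003", 5))
# ['10001', '10002', '10003', None, None]
```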
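select regexp_substr(dept, '[^,]+', 1, level), day from table connect by level <= regexp_count(dept, ',') + 1 instead ties the number of levels to each string. A list with n commas has n + 1 substrings, so for each day group produced by the preceding subquery, the query returns every substring of that day's dept string in turn, without the surplus NULL rows. A trailing comma, as in the last 2021-08-14 sample row, still adds one NULL row.

Because that subquery returns several rows (one per day), the CONNECT BY must also be restricted to each row with prior day = day and prior sys_guid() is not null. Without these conditions, every node at one level is linked to every qualifying row at the next, so substrings are repeated and the counts come out too high. The prior sys_guid() term keeps Oracle from reporting a CONNECT BY loop (ORA-01436) when a row is linked to itself.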
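The sketch below models why the per-row restriction matters. It compares each row expanding only into its own levels with a CONNECT BY that has no PRIOR condition. The second function rests on an understanding of Oracle's behaviour that the sketch does not verify: LEVEL in the CONNECT BY condition is the candidate child's level, and without PRIOR every source row can be a child of every node. All function names are hypothetical.

```python
import re
from collections import Counter


def tokens_by_level(dept):
    """regexp_substr(dept, '[^,]+', 1, level) for level = 1 .. regexp_count(dept, ',') + 1."""
    tokens = re.findall(r"[^,]+", dept)
    return [tokens[i] if i < len(tokens) else None for i in range(dept.count(",") + 1)]


def split_per_row(rows):
    """Each row expands only into its own levels, which is what
    `prior day = day and prior sys_guid() is not null` achieves."""
    return Counter((tok, day) for dept, day in rows for tok in tokens_by_level(dept))


def split_unrestricted(rows):
    """Model of the same CONNECT BY with no PRIOR condition: every node at level k-1
    gets, as children, every row whose limit still allows level k."""
    levels = [(tokens_by_level(dept), day) for dept, day in rows]
    counts = Counter((toks[0], day) for toks, day in levels)  # level 1: every row is a root
    parents, level = len(rows), 2
    while parents:
        children = 0
        for toks, day in levels:
            if level <= len(toks):
                counts[(toks[level - 1], day)] += parents  # one copy per parent node
                children += parents
        parents, level = children, level + 1
    return counts


rows = [("10001,10002", "2021-08-12"), ("10003", "2021-08-10")]
print(split_per_row(rows)[("10002", "2021-08-12")])       # 1
print(split_unrestricted(rows)[("10002", "2021-08-12")])  # 2
```

Even with just two days, the unrestricted version counts 10002 twice, because both level-1 roots become parents of the 2021-08-12 row at level 2.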
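In Hive, the same result can be obtained with collect_list, concat_ws, split and lateral view explode:

select a.dept, a.day, count(*)
from (
  select c.dept_item as dept, t.day
  from (
    select split(concat_ws(',', collect_list(m.dept)), ',') dept, day
    from baseinfo m
    group by day
  ) t
  lateral view explode(t.dept) c as dept_item
) a
group by a.dept, a.day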
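Here is a Python sketch of the Hive pipeline, with each step mapped to a line. It assumes that collect_list drops NULL values and that Hive's split keeps trailing empty strings (worth checking on your Hive version). Under those assumptions, the trailing comma yields an empty string '' rather than the NULL the Oracle version produces. The function name is hypothetical.

```python
from collections import Counter


def hive_dept_counts(rows):
    """Hypothetical model of the Hive query:
    collect_list -> concat_ws(',') -> split(',') -> lateral view explode -> count(*)."""
    groups = {}
    for dept, day in rows:
        groups.setdefault(day, [])
        if dept is not None:  # assumption: collect_list skips NULL values
            groups[day].append(dept)
    counts = Counter()
    for day, depts in groups.items():
        # split keeps empty strings, e.g. the one after a trailing comma
        for dept in ",".join(depts).split(","):
            counts[(dept, day)] += 1  # explode: one output row per array element
    return counts


rows = [("10001,10002", "2021-08-14"), (None, "2021-08-14"),
        ("10024,10046,10024,10043,10011,10015,", "2021-08-14")]
counts = hive_dept_counts(rows)
print(counts[("10024", "2021-08-14")])  # 2
print(counts[("", "2021-08-14")])       # 1: an empty string here, where Oracle gave NULL
```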
For more on Hive's lateral view explode, see this blogger's post:
https://blog.csdn.net/guodong2k/article/details/79459282