描述:表f是事实表,表d是匹配表,在hive中如何将匹配表中的值关联到事实表中?
表d相当于拉链过的变化维,但日期范围可能是不全的。
表f:
date_id p_id 2017 C 2018 B 2019 A 2013 C
表d:
d_start d_end p_id p_value 2016 2018 A 1 2016 2018 B 2 2008 2009 C 4 2010 2015 C 3
描述:范围匹配
输出结果如下所示:
date_id p_id p_value 2017 C null 2018 B 2 2019 A null 2013 C 3
参考答案:
此处给出两种解法,其一:
select f.date_id, f.p_id, A.p_value from f left join ( select date_id, p_id, p_value from ( select f.date_id, f.p_id, d.p_value from f left join d on f.p_id = d.p_id where f.date_id >= d.d_start and f.date_id <= d.d_end )A )A ON f.date_id = A.date_id;
其二:
select date_id, p_id, flag as p_value from ( select f.date_id, f.p_id, d.d_start, d.d_end, d.p_value, if(f.date_id between d.d_start and d.d_end,d.p_value,null) flag, max(d.d_end) over(partition by date_id) max_end from f left join d on f.p_id = d.p_id ) tmp where d_end = max_end;