
Hive SQL performance optimization with UNION ALL

Time: 2023-12-26 06:38:19

A clever use of UNION ALL: combining SQL query results horizontally

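We all know that UNION ALL stacks query results vertically while JOIN combines them horizontally. But have you ever tried using UNION ALL to combine results horizontally?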

Here is my original, rookie version of the query:

select ep.productid, productname,
       count(st.tduserid),
       count(distinct sl.tduserid),
       count(distinct sn.tduserid),
       avg(sl.interval_level)
from (select productid, productname from xxx.product where productid = '3006090') ep
join (select tduserid, productid from xxx_page_ex
      where l_date <= '-04-07' and l_date >= date_add('-04-07', -6)) st
  on ep.productid = st.productid
join (select tduserid, interval_level, productid from xxx_launch_ex
      where l_date <= '-04-07' and l_date >= date_add('-04-07', -6)) sl
  on st.productid = sl.productid
join (select tduserid, productid from xxx_newuser_ex
      where l_date <= '-04-07' and l_date >= date_add('-04-07', -6)) sn
  on sl.productid = sn.productid
group by ep.productid, productname;
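To see the effect of joining on productid alone, here is a small Python sketch that runs the same join shape on SQLite with made-up rows. SQLite stands in for Hive here, the date filters are dropped, and the table contents are hypothetical. For a single product, every page row pairs with every launch row and every new-user row. The sketch shows that count(st.tduserid) is inflated by this, while the distinct counts and the average happen to survive.

```python
import sqlite3

# Hypothetical toy data standing in for the Hive tables (date filters omitted).
SETUP = """
CREATE TABLE product (productid TEXT, productname TEXT);
CREATE TABLE xxx_page_ex (tduserid TEXT, productid TEXT);
CREATE TABLE xxx_launch_ex (tduserid TEXT, interval_level REAL, productid TEXT);
CREATE TABLE xxx_newuser_ex (tduserid TEXT, productid TEXT);
INSERT INTO product VALUES ('3006090', 'demo');
INSERT INTO xxx_page_ex VALUES
  ('u1', '3006090'), ('u2', '3006090'), ('u3', '3006090'), ('u1', '3006090');
INSERT INTO xxx_launch_ex VALUES
  ('u1', 1, '3006090'), ('u2', 3, '3006090'), ('u2', 5, '3006090');
INSERT INTO xxx_newuser_ex VALUES ('u3', '3006090'), ('u4', '3006090');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def join_version(conn):
    # Same shape as the original query: every join matches on productid only
    return conn.execute("""
        SELECT ep.productid, ep.productname,
               COUNT(st.tduserid), COUNT(DISTINCT sl.tduserid),
               COUNT(DISTINCT sn.tduserid), AVG(sl.interval_level)
        FROM (SELECT productid, productname FROM product
              WHERE productid = '3006090') ep
        JOIN xxx_page_ex st ON ep.productid = st.productid
        JOIN xxx_launch_ex sl ON st.productid = sl.productid
        JOIN xxx_newuser_ex sn ON sl.productid = sn.productid
        GROUP BY ep.productid, ep.productname
    """).fetchone()


print(join_version(make_db()))
# → ('3006090', 'demo', 24, 2, 2, 3.0)
# 4 page rows x 3 launch rows x 2 new-user rows = 24 joined rows,
# so the page-view count reports 24 instead of 4.
```

With real event tables holding millions of rows per product, the same multiplication is what keeps the job running.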

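My first HQL statement had essentially no optimization. Every join matches on productid alone, so for the one product the event tables' rows are multiplied against each other, which also inflates count(st.tduserid). On the production cluster it still had not finished after 20 minutes. The UNION ALL version ran in just 1m26s. It is a bit more verbose to write, so without further ado, here is the code: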

select '-04-07' dates,
       '3006090' productid,
       max(pro) productname,       -- picks the real name over the '0' placeholders
       sum(pv) pv,                 -- each numeric column is non-zero in one branch only
       sum(uv) uv,
       cast(sum(duration) as decimal(10,4)) duration,
       sum(new_uv) new_uv
from (
    -- each branch fills one column and puts '0' in all the others
    select productname pro, '0' pv, '0' uv, '0' duration, '0' new_uv
    from xxx.product
    where productid = '3006090'
    union all
    select '0' pro, count(tduserid) pv, '0' uv, '0' duration, '0' new_uv
    from xxx_page_ex
    where l_date <= '-04-07' and l_date >= date_add('-04-07', -6)
      and productid = '3006090'
    union all
    select '0' pro, '0' pv, count(distinct tduserid) uv, avg(interval_level) duration, '0' new_uv
    from xxx_launch_ex
    where l_date <= '-04-07' and l_date >= date_add('-04-07', -6)
      and productid = '3006090'
    union all
    select '0' pro, '0' pv, '0' uv, '0' duration, count(distinct tduserid) new_uv
    from xxx_newuser_ex
    where l_date <= '-04-07' and l_date >= date_add('-04-07', -6)
      and productid = '3006090'
) t;
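Here the UNION ALL shape is run on toy data, again as a Python/SQLite sketch rather than Hive itself, with hypothetical rows and no date filters. Each branch aggregates one table on its own, so no rows are multiplied. The outer SUM/MAX then folds the four one-row results into a single row. This sketch uses numeric 0 placeholders instead of the string '0', so it does not depend on implicit string-to-number conversion.

```python
import sqlite3

# Hypothetical toy data standing in for the Hive tables (date filters omitted).
SETUP = """
CREATE TABLE product (productid TEXT, productname TEXT);
CREATE TABLE xxx_page_ex (tduserid TEXT, productid TEXT);
CREATE TABLE xxx_launch_ex (tduserid TEXT, interval_level REAL, productid TEXT);
CREATE TABLE xxx_newuser_ex (tduserid TEXT, productid TEXT);
INSERT INTO product VALUES ('3006090', 'demo');
INSERT INTO xxx_page_ex VALUES
  ('u1', '3006090'), ('u2', '3006090'), ('u3', '3006090'), ('u1', '3006090');
INSERT INTO xxx_launch_ex VALUES
  ('u1', 1, '3006090'), ('u2', 3, '3006090'), ('u2', 5, '3006090');
INSERT INTO xxx_newuser_ex VALUES ('u3', '3006090'), ('u4', '3006090');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def union_version(conn):
    # One aggregate per branch; placeholders fill the other columns,
    # and the outer query collapses the four rows into one.
    return conn.execute("""
        SELECT '3006090' AS productid,
               MAX(pro) AS productname,
               SUM(pv) AS pv, SUM(uv) AS uv,
               SUM(duration) AS duration, SUM(new_uv) AS new_uv
        FROM (
            SELECT productname AS pro, 0 AS pv, 0 AS uv, 0 AS duration, 0 AS new_uv
            FROM product WHERE productid = '3006090'
            UNION ALL
            SELECT '0', COUNT(tduserid), 0, 0, 0
            FROM xxx_page_ex WHERE productid = '3006090'
            UNION ALL
            SELECT '0', 0, COUNT(DISTINCT tduserid), AVG(interval_level), 0
            FROM xxx_launch_ex WHERE productid = '3006090'
            UNION ALL
            SELECT '0', 0, 0, 0, COUNT(DISTINCT tduserid)
            FROM xxx_newuser_ex WHERE productid = '3006090'
        ) t
    """).fetchone()


print(union_version(make_db()))
# → ('3006090', 'demo', 4, 2, 3.0, 2)
```

In Hive the four branches do not depend on each other, so with hive.exec.parallel=true they can run as parallel stages.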

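Numeric columns can simply be added up with SUM. But what about text, such as the Chinese product name? MAX solves that. The other branches contribute only the placeholder '0', and MAX returns the real name as long as it sorts after '0'. That holds for names starting with a letter, a digit, or a Chinese character.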
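This Python/SQLite sketch checks the MAX trick. Strings compare lexicographically, so a name starting with punctuation such as '-' would lose to the placeholder '0'. Using NULL placeholders sidesteps that, because MAX ignores NULLs. pick_name and the sample names are hypothetical. The NULL variant is a suggested alternative, not what the original query does. In HiveQL it would presumably be written as cast(null as string) in the placeholder branches, which should be treated as an untested suggestion.

```python
import sqlite3


def pick_name(name, placeholder):
    """Fold one real value and two placeholders with MAX, like the outer query.

    pick_name is a hypothetical helper for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        "SELECT MAX(pro) FROM ("
        "  SELECT ? AS pro UNION ALL SELECT ? UNION ALL SELECT ?"
        ")",
        (name, placeholder, placeholder),
    ).fetchone()
    conn.close()
    return row[0]


print(pick_name("产品A", "0"))    # Chinese name sorts after '0', so MAX returns it
print(pick_name("-promo", "0"))   # '-' sorts before '0', so MAX returns '0'
print(pick_name("-promo", None))  # NULL placeholders are ignored by MAX
```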

Here is the shell script version. It takes the report date as its first argument and defaults to yesterday:

#!/bin/bash
APP=XXX    # database name
hive=/opt/module/hive/bin/hive

# Use the date passed as the first argument; otherwise default to yesterday
if [ -n "$1" ]; then
    n_date=$1
else
    n_date=$(date -d "1 day ago" +%Y-%m-%d)
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=16;

insert overwrite table ${APP}.dws.dispaly_upv partition(dt='${n_date}')
select cast('${n_date}' as date) dates,
       '3006090' productid,
       max(pro) productname,
       sum(pv) pv,
       sum(uv) uv,
       cast(sum(duration) as decimal(10,2)) duration,
       sum(new_uv) new_uv
from (
    select productname pro, '0' pv, '0' uv, '0' duration, '0' new_uv
    from ${APP}.xxx_product
    where productid = '3006090'
    union all
    select '0' pro, count(tduserid) pv, '0' uv, '0' duration, '0' new_uv
    from ${APP}.xxx_page_ex
    where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6)
      and productid = '3006090'
    union all
    select '0' pro, '0' pv, count(distinct tduserid) uv, avg(interval_level) duration, '0' new_uv
    from ${APP}.xxx_launch_ex
    where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6)
      and productid = '3006090'
    union all
    select '0' pro, '0' pv, '0' uv, '0' duration, count(distinct tduserid) new_uv
    from ${APP}.xxx_newuser_ex
    where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6)
      and productid = '3006090'
) t;
"

$hive -e "$sql"
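The script's date handling works as follows. It uses the first argument if one is given, otherwise yesterday, and then filters from date_add(n_date, -6) through n_date. That selects a seven-day window including both ends. This Python sketch mirrors that logic so the window can be checked for a given date. report_window is a hypothetical helper, not part of the original script.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def report_window(arg: Optional[str] = None,
                  today: Optional[date] = None) -> Tuple[str, str]:
    """Return (start, n_date) as yyyy-mm-dd strings.

    Mirrors the script: n_date is the argument if one is given,
    otherwise yesterday; start corresponds to date_add(n_date, -6).
    """
    if arg:
        n_date = date.fromisoformat(arg)
    else:
        n_date = (today or date.today()) - timedelta(days=1)
    start = n_date - timedelta(days=6)
    return start.isoformat(), n_date.isoformat()


print(report_window("2024-03-01"))            # ('2024-02-24', '2024-03-01')
print(report_window(today=date(2024, 3, 1)))  # ('2024-02-23', '2024-02-29')
```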