微信搜索lxw1234bigdata | 邀请体验:数阅–数据管理、OLAP分析与可视化平台 | 赞助作者:赞助作者

hive中合理使用union all与multi insert

Hive lxw1234@qq.com 17278℃ 2评论

对同一张表的union all 要比多重insert快的多,
原因是hive本身对这种union all做过优化,即只扫描一次源表;

而多重insert也只扫描一次,但应为要insert到多个分区,所以做了很多其他的事情,导致消耗的时间非常长;
希望大家在开发的时候多测,多试!

lxw_test3 12亿左右记录数

Union all : 耗时7分钟左右

 

 

 

 

		create table lxw_test5 as 
		select type,popt_id,login_date 
		from (
		        select 'm3_login' as type,popt_id,login_date  
		        from lxw_test3 
		        where login_date>='2012-02-01' and login_date<'2012-05-01' 
		        union all 
		        select 'mn_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-05-01' and login_date<='2012-05-09' 
		        union all 
		        select 'm3_g_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='1' 
		        union all 
		        select 'm3_l_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='2' 
		        union all 
		        select 'm3_s_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='3' 
		        union all 
		        select 'm3_o_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='4' 
		        union all 
		        select 'mn_g_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='1' 
		        union all 
		        select 'mn_l_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='2' 
		        union all 
		        select 'mn_s_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='3' 
		        union all 
		        select 'mn_o_login' as type,popt_id,login_date 
		        from lxw_test3 
		        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='4' 
		) x

多重insert耗时25分钟左右:

		FROM lxw_test3 
		insert overwrite table lxw_test6 partition (flag = '1') 
		select 'm3_login' as type,popt_id,login_date  
		where login_date>='2012-02-01' and login_date<'2012-05-01' 
		insert overwrite table lxw_test6 partition (flag = '2') 
		select 'mn_login' as type,popt_id,login_date 
		where login_date>='2012-05-01' and login_date<='2012-05-09' 
		insert overwrite table lxw_test6 partition (flag = '3') 
		select 'm3_g_login' as type,popt_id,login_date 
		where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='1' 
		insert overwrite table lxw_test6 partition (flag = '4') 
		select 'm3_l_login' as type,popt_id,login_date 
		where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='2' 
		insert overwrite table lxw_test6 partition (flag = '5') 
		select 'm3_s_login' as type,popt_id,login_date 
		where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='3' 
		insert overwrite table lxw_test6 partition (flag = '6') 
		select 'm3_o_login' as type,popt_id,login_date 
		where login_date>='2012-02-01' and login_date<'2012-05-01' and apptypeid='4' 
		insert overwrite table lxw_test6 partition (flag = '7') 
		select 'mn_g_login' as type,popt_id,login_date 
		where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='1' 
		insert overwrite table lxw_test6 partition (flag = '8') 
		select 'mn_l_login' as type,popt_id,login_date 
		where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='2' 
		insert overwrite table lxw_test6 partition (flag = '9') 
		select 'mn_s_login' as type,popt_id,login_date 
		where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='3' 
		insert overwrite table lxw_test6 partition (flag = '10') 
		select 'mn_o_login' as type,popt_id,login_date 
		where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='4'

如果觉得本博客对您有帮助,请 赞助作者

转载请注明:lxw的大数据田地 » hive中合理使用union all与multi insert

喜欢 (6)
分享 (0)
发表我的评论
取消评论
表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
(2)个小伙伴在吐槽
  1. 你好,有没有数据集lxw_test3的下载链接?谢谢! 12亿左右记录数
    晓君2015-11-27 22:29 回复
    • 这个没有,之前都是用线上数据生成的。
      lxw1234@qq.com2015-11-30 08:46 回复