Hive Data Deduplication and Small-File Merging (Hands-On)

AllInOne
2024-06-03

Single-partition deduplication

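-- Pin the reducer count: each reducer writes at most one file, which caps the number of output files.
set mapred.reduce.tasks = 10;
-- Keep only the newest row (by timestamp) for each (aa, bb, cc) key and overwrite the same partition.
WITH temptable AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY aa, bb, cc
                       ORDER BY `timestamp` DESC) AS row_num
  FROM db.tablename WHERE dt = '2024-05-30'
)
INSERT OVERWRITE TABLE db.tablename PARTITION (dt = '2024-05-30')
SELECT
  `aa`,
  `bb`,
  `cc`
FROM temptable
WHERE row_num = 1;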
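
To confirm that the overwrite worked, a quick duplicate check can be run against the same partition. This is a minimal sketch that reuses the table, partition value, and key columns from the statement above; the alias `cnt` and the `LIMIT` are illustrative choices, not part of the original post. An empty result means no key appears more than once.

```sql
-- List any (aa, bb, cc) key that still occurs more than once in the partition.
SELECT aa, bb, cc, COUNT(*) AS cnt
FROM db.tablename
WHERE dt = '2024-05-30'
GROUP BY aa, bb, cc
HAVING COUNT(*) > 1
LIMIT 10;
```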

Dynamic-partition deduplication

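-- Enable dynamic partitioning and allow every partition column to be resolved at run time.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Fewer reducers means fewer output files per partition.
set mapred.reduce.tasks = 5;
-- dt is part of the dedup key, so rows are deduplicated within each partition.
-- The dynamic partition column (dt) must be the last column in the SELECT list.
WITH temptable AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY dt, aa, bb, cc, dd
                       ORDER BY `timestamp` DESC) AS row_num
  FROM dbname.tablename WHERE dt > '2024-05-30'
)
INSERT OVERWRITE TABLE dbname.tablename PARTITION (dt)
SELECT
  `aa`,
  `bb`,
  `cc`,
  `dd`,
  `dt`
FROM temptable
WHERE row_num = 1;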
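
The statements above control the file count only through the number of reducers. As a possible complement (not something the original post uses), Hive also has built-in merge settings that add a follow-up merge step when the average output file is small. The fragment below is a sketch; the two size values are illustrative assumptions and should be tuned to the cluster's block size.

```sql
-- Merge small files produced by map-only jobs.
set hive.merge.mapfiles=true;
-- Merge small files produced by map-reduce jobs.
set hive.merge.mapredfiles=true;
-- Same idea when the execution engine is Tez.
set hive.merge.tezfiles=true;
-- Trigger the merge when the average output file is below this size in bytes (illustrative value).
set hive.merge.smallfiles.avgsize=16000000;
-- Target size in bytes of each merged file (illustrative value).
set hive.merge.size.per.task=256000000;
```

These settings would go before the INSERT OVERWRITE statement in the same session, alongside the existing `set` lines.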