统计视频观看数Top10
- 思路:使用order by按照 views 字段做一个全局排序,设置只显示前10条即可
实现SQL:
select videoid,uploader,age,category,length,views,rate,ratings,comments
from guliyingyin_video_orc
order by views desc
limit 10;

- 结果:
统计视频类别热度Top10
思路:炸开数组”视频类别“字段,然后按照类别分组,最后按照热度(视频个数)排序。
因为当前表结构为:一个视频对应一个或多个类别。所以如果要 group by 类别,需要先将类别进行列转行(展开),然后再进行count即可
- 分析过程:
- 炸开类别,记为临时表t1
select videoId,catName
from guliyingyin_video_orc
lateral view explode(category) tb_category as catName; — t1 - 按照类别 group by 聚合,然后count组内的videoId个数,记为临时表t2
select catName,count(*) hot
from t1
group by catName; —t2 - 最后按照热度排序,显示前10条。
select catName, hot
from t2
order by hot desc
limit 10;
- 完整的SQL:
SELECT catName, hot FROM ( SELECT catName, count( * ) hot FROM ( SELECT videoId, catName FROM guliyingyin_video_orc lateral VIEW explode ( category ) tb_category AS catName ) t1 GROUP BY catName ) t2 ORDER BY hot DESC LIMIT 10;
将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令:
结果:
统计视频观看数Top20所属类别
- 思路:
- 先找到观看数最高的20个视频,记为临时表t1
select videoid,views,category
from guliyingyin_video_orc
order by views desc
limit 20; ----t1 - 把这20条信息中的category分裂出来(列转行),记为临时表t2
select videoid,catName
from t1
lateral view explode(category) tb_category as catName; — t2 - 去重
select distinct catName
from t2;
- 完整SQL
SELECT DISTINCT
catName
FROM
(
SELECT
videoid,
catName
FROM
( SELECT videoid, views, category FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 20 ) t1 lateral VIEW explode ( category ) tb2_category AS catName
) t2;
将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令:
结果:
统计视频观看数Top50所关联视频的类别的Rank
思路分析:
思路:
- 查询出观看数最多的前50个视频的所有信息(包含每个视频对应的关联视频),记为临时表t1
select videoId, views, category, relatedId
from guliyingyin_video_orc
order by views desc
limit 50; ----t1 - 炸裂关联视频id:将找到的50条视频信息的相关视频的relatedId列转行,记为临时表t2
select distinct videoId_name
from t1
lateral view explode(relatedId) tb_relatedId as videoId_name; ----t2 - 将关联视频的id和guliyingyin_video_orc表进行inner join操作,得到每个关联视频id的详细数据,记为临时表t4
select *
from t2
inner join guliyingyin_video_orc t3
on t2.videoId_name=t3.videoId; ---- t4 - 炸裂关联视频的类别
select *
from t4
lateral view explode(category) tb_category as catName; ----t5 - 统计类别个数
select catName, count(*) hot
from t5
group by catName; ----t6 - 统计类别的热度排名
select *
from t6
order by hot desc;
- 完整SQL:
SELECT
*
FROM
(
SELECT
catName,
count( * ) hot
FROM
(
SELECT
*
FROM
(
SELECT
*
FROM
(
SELECT DISTINCT
videoId_name
FROM
( SELECT videoId, views, category, relatedId FROM guliyingyin_video_orc ORDER BY views DESC LIMIT 50 ) t1 lateral VIEW explode ( relatedId ) tb_relatedId AS videoId_name
) t2
INNER JOIN guliyingyin_video_orc t3 ON t2.videoId_name = t3.videoId
) t4 lateral VIEW explode ( category ) tb_category AS catName
) t5
GROUP BY
catName
) t6
ORDER BY
hot DESC;
将完整的SQL语句保存到guliyingying.sql文件中,并上传到Linux。然后执行命令:
结果:
统计每个类别中的视频热度Top10,以Music为例
思路:要想统计Music类别中的视频热度Top10,需要先找到Music类别,那么就需要将category炸开。
-
创建一个临时表用于存放categoryId炸开的数据
create table guliyingyin_category_orc( videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int, relatedId array ) row format delimited fields terminated by "\t" collection items terminated by "&" stored as orc; -
向category展开的表中插入数据。
insert overwrite table guliyingyin_category_orc select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from guliyingyin_video_orc lateral view explode(category) tb_category as categoryId; -
统计对应类别(Music)中的视频热度。
select videoId,views from guliyingyin_category_orc where categoryId = "Music" order by views desc limit 10;
结果:
统计每个类别中视频流量Top10,以Music为例
select videoId,ratings
from guliyingyin_category_orc
where categoryId = "Music"
order by ratings desc
limit 10;
结果:
统计上传视频最多的用户Top10以及他们上传的观看次数在前20的视频
- 分析
-
上传视频最多的用户Top10,记为表t1
select uploader from guliyingyin_user_orc order by videos desc limit 10; -----t1 -
观看次数在前20的视频
select t1.uploader,t2.videoId,views from t1 inner join guliyingyin_video_orc t2 on t1.uploader= t2.uploader order by views desc limit 20;
- 完整SQL语句
SELECT
t1.uploader,
t2.videoId,
views
FROM
( SELECT uploader FROM guliyingyin_user_orc ORDER BY videos DESC LIMIT 10 ) t1
INNER JOIN guliyingyin_video_orc t2 ON t1.uploader = t2.uploader
ORDER BY
views DESC
LIMIT 20;
- 结果
统计每个类别视频观看数Top10
- 思路:
-
先得到categoryId展开的中间表
-
子查询按照categoryId进行分区,然后分区内排序,并生成递增数字,该递增数字这一列起名为rank列
select videoid,categoryid,views, row_number() over(partition by categoryid order by views desc) rank234 from guliyingyin_category_orc; -
通过子查询产生的临时表,查询rank值小于等于10的数据行即可
select * from t1 where rank关注打赏
