Bucketing & Window Functions with Hive

In this Skillsoft Aspire course, learners explore how Apache Hive query execution can be optimized, including techniques such as bucketing data sets. Using windowing functions to extract meaningful insights from data is also covered. This 10-video course assumes previous work with partitions in Hive, as well as a conceptual understanding of how buckets can improve query performance. Learners begin by focusing on how to use the bucketing technique to process big data efficiently. They then take a look at HDFS (Hadoop Distributed File System) by navigating to the shell of the Hadoop master node and using the hadoop fs -ls command to examine the contents of the directory, observing three subdirectories that correspond to three partitions based on the value of the category column. Next, learners explore how to combine the partitioning and bucketing techniques to further improve query performance. Finally, learners explore the concept of windowing, which helps users analyze a subset of ordered data, and see how this technique can be implemented in Hive.
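
As a rough illustration of the ideas named above, the following Python sketch emulates (outside Hive) how rows might be laid out when a table is partitioned by a category column and bucketed by an integer id, and what a windowed running total over ordered rows computes. The sample rows, the bucket count of 4, and the function names are all hypothetical. The modulo rule shown is the classic hash-and-modulo scheme for integer keys; the hash function a given Hive version actually uses may differ.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration

# Hypothetical rows: (id, category, price)
ROWS = [
    (1, "books", 12.0),
    (2, "toys", 8.0),
    (3, "books", 20.0),
    (4, "games", 30.0),
    (5, "toys", 5.0),
    (6, "books", 7.0),
]


def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assign an integer key to a bucket by hash-modulo.

    Assumption: the hash of an integer key is the key itself.
    """
    return (key & 0x7FFFFFFF) % num_buckets


def layout(rows):
    """Group row ids by (partition directory name, bucket index)."""
    files = defaultdict(list)
    for row_id, category, _price in rows:
        files[(f"category={category}", bucket_for(row_id))].append(row_id)
    return dict(files)


def running_total(rows):
    """Emulate SUM(price) OVER (PARTITION BY category ORDER BY id)."""
    out = []
    ordered = sorted(rows, key=itemgetter(1, 0))  # by category, then id
    for category, group in groupby(ordered, key=itemgetter(1)):
        total = 0.0
        for row_id, _category, price in group:
            total += price
            out.append((row_id, category, total))
    return out


if __name__ == "__main__":
    for (partition, bucket), ids in sorted(layout(ROWS).items()):
        print(partition, "bucket", bucket, "ids", ids)
    for row in running_total(ROWS):
        print(row)
```

With these sample rows, the three distinct category values yield three partition directories, mirroring the three subdirectories the course describes, and each partition's rows are spread across buckets by id.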