Hive for Data Engineers

This course is designed to help you optimize Hive joins, tuning, and dynamic partitioning when working with Qubole.

ABOUT THIS COURSE

LEARNING FORMAT:

Self-paced

DESCRIPTION:

The objective of this course is to familiarize you with Optimizing Hive Joins, Hive Tuning and Dynamic Partitioning.

Estimated time to complete this course: 30 minutes.

Recommended Prerequisites:

Optimizing Hive Joins

Configuring key Hive Join options will help optimize your workflow. In this lesson you'll learn about:

  • Map Joins
  • Outer Joins
  • Bucket Joins
  • Skew Joins

Hive Tuning

There are many ways to tune Hive to maximize the efficiency of your queries. In this lesson you'll learn about:

  • Hive Aggregation
  • Reducer optimization
  • Hive User Defined Functions
  • Hive Sessions
  • Hive Storage Handlers

Dynamic Partitioning

There are several ways you can use Dynamic Partitioning to improve query performance. This section provides recommended best practices for the following items:

  • Command Execution
  • Many Small Files
  • Hive File Output Behavior
  • Utilizing Cluster Resources
  • Entire System Scan
  • Configuring Dynamic Partitioning
  • Tez Split Calculation & Application Master
  • Final Output Format
  • Transitioning From a Database to Hive
  • File Compression

Recommended Follow Up:

Curriculum

  • Course Introduction
  • Course Terminology
  • Hive Joins
  • Hive Resources Best Practices
  • Hive Environment Best Practices
  • Hive Dynamic Partitioning
  • Course Conclusion