Spark for Data Ops

This course is designed to help you understand how Spark integrates with Qubole to provide your team with fast, inexpensive and scalable data processing.

ABOUT THIS COURSE

LEARNING FORMAT:

Self-paced

DESCRIPTION:

This course introduces key best practices for running Spark in Qubole, covering the cluster and notebook configuration options designed to optimize your data throughput and outcomes.

Estimated time to complete this course: 30 minutes.

Recommended Prerequisites:

Spark Cluster Administration

Understanding cluster administration in Spark is your first step toward optimization. In this lesson you'll learn about:

  • Spark Job Submission Workflow
  • Spark & YARN Interaction
  • Spark Application States
  • YARN Behavior & Management
  • Memory Settings
  • Spark & YARN Resources
  • Spark History Server
  • Spark Driver & Executors
  • Spark DirectFileOutputCommitter (DFOC)
  • Common Job Failure Scenarios

Spark Notebook Configuration

In this lesson you'll learn key concepts for configuring Spark Notebooks to optimize performance.

  • Spark Notebook Submission Workflow
  • Notebook & YARN Resources and Troubleshooting
  • Master Node Responsibilities
  • Notebook Logs & Ports

Curriculum

  • Course Introduction
  • Course Terminology
  • Analyze Job Submission Workflow
  • Spark Driver and Executor Relationship
  • Notebook Job Submission Workflow
  • Course Conclusion
