Spark for Data Ops
This course is designed to help you understand how Spark integrates with Qubole to provide your team with fast, inexpensive and scalable data processing.
It introduces key best practices for running Spark in Qubole, covering the cluster and notebook configuration options designed to optimize your data throughput and outcomes.
Estimated time to complete this course: 30 minutes.
Spark Cluster Administration
Understanding cluster administration in Spark is your first step toward optimization. In this lesson you'll learn about:
- Spark Job Submission Workflow
- Spark & YARN Interaction
- Spark Application States
- YARN Behavior & Management
- Memory Settings
- Spark & YARN Resources
- Spark History Server
- Spark Driver & Executors
- Spark DirectFileOutputCommitter (DFOC)
- Common Job Failure Scenarios
Spark Notebook Configuration
In this lesson you'll learn key concepts for configuring Spark Notebooks to optimize performance, including:
- Spark Notebook Submission Workflow
- Notebook & YARN Resources and Troubleshooting
- Master Node Responsibilities
- Notebook Logs & Ports
Course Version & Product Release
This course is based on Release 54. For the latest updates to Qubole, please refer to the release notes in our documentation.