Hadoop + Spark

Understand and implement integrated Hadoop + Spark architectures for scalable data processing.

Overview

This course combines Hadoop’s storage capabilities with Spark’s processing power. Students learn how to integrate Spark with HDFS, Hive, and YARN to build end‑to‑end Big Data solutions.

Objective

Understand and implement integrated Hadoop + Spark architectures for scalable data processing.

What You Will Learn

  • Spark on YARN
  • Using Hive tables in Spark SQL (see the sketch after this list)
  • Optimising data formats (Parquet, ORC)
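
A minimal PySpark sketch of the Hive integration topics above, assuming a hypothetical Hive database "sales" with an "orders" table already registered in the metastore; all names and columns are illustrative, not part of the course material.

    from pyspark.sql import SparkSession

    # Hive support lets Spark SQL resolve tables through the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("hive-integration-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Query an existing Hive table (database and table names are hypothetical).
    orders = spark.sql("SELECT order_id, amount, order_date FROM sales.orders")

    # Hive built-in functions such as date_format are available in Spark SQL.
    monthly = spark.sql("""
        SELECT date_format(order_date, 'yyyy-MM') AS month,
               SUM(amount) AS revenue
        FROM sales.orders
        GROUP BY date_format(order_date, 'yyyy-MM')
    """)

    # Write the result back as a managed Hive table stored as Parquet.
    monthly.write.mode("overwrite").format("parquet").saveAsTable("sales.monthly_revenue")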

Course Details

Audience: Data Engineers

Duration: 3 days

Format: Lecture + labs

Prerequisites: Basic Hadoop and Spark knowledge

Setup: Cluster with Hadoop 3 and Spark 3 provided

Detailed Outline

  • Cluster vs. client mode
  • Resource allocation and tuning (see the sketch after this outline)
  • Reading and writing Hive tables
  • ACID table support
  • Using Hive functions in Spark
  • Partitioning strategies
  • Columnar storage formats
  • Compression techniques
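
A hedged sketch of resource allocation and data format handling on the provided YARN cluster, assuming a hypothetical CSV dataset on HDFS; executor sizes, paths, and column names are placeholders rather than recommendations. The deploy mode itself (cluster vs. client) is chosen at submission time, for example with spark-submit --master yarn --deploy-mode cluster.

    from pyspark.sql import SparkSession

    # Per-application resource allocation on YARN; the values below are placeholders
    # that the tuning labs would adjust to the cluster at hand.
    spark = (
        SparkSession.builder
        .appName("yarn-tuning-sketch")
        .master("yarn")
        .config("spark.executor.instances", "4")
        .config("spark.executor.cores", "2")
        .config("spark.executor.memory", "4g")
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

    # Hypothetical raw input on HDFS (path and schema are illustrative).
    events = spark.read.option("header", "true").csv("hdfs:///data/raw/events")

    # Partitioned, Snappy-compressed Parquet output; ORC follows the same pattern
    # with .orc(...) instead of .parquet(...).
    (
        events.write
        .mode("overwrite")
        .partitionBy("event_date")
        .option("compression", "snappy")
        .parquet("hdfs:///data/curated/events")
    )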

Ready to Get Started?

Contact us to learn more about this course and schedule your training.