Hadoop + Spark

Understand and implement integrated Hadoop + Spark architectures for scalable data processing.

Overview

This course combines Hadoop’s storage capabilities with Spark’s processing power. Students learn how to integrate Spark with HDFS, Hive, and YARN to build end‑to‑end Big Data solutions.

Objective

Understand and implement integrated Hadoop + Spark architectures for scalable data processing.

What You Will Learn

  • Spark on YARN
  • Using Hive tables in Spark SQL (see the sketch after this list)
  • Optimising data formats (Parquet, ORC)
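
A minimal PySpark sketch of the Hive integration topics above, assuming a hypothetical Hive database "sales" with an "orders" table already registered in the metastore; all names and columns are illustrative, not part of the course material.

    from pyspark.sql import SparkSession

    # Hive support lets Spark SQL resolve tables through the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("hive-integration-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Query an existing Hive table (database and table names are hypothetical).
    orders = spark.sql("SELECT order_id, amount, order_date FROM sales.orders")

    # Hive built-in functions such as date_format are available in Spark SQL.
    monthly = spark.sql("""
        SELECT date_format(order_date, 'yyyy-MM') AS month,
               SUM(amount) AS revenue
        FROM sales.orders
        GROUP BY date_format(order_date, 'yyyy-MM')
    """)

    # Write the result back as a managed Hive table stored as Parquet.
    monthly.write.mode("overwrite").format("parquet").saveAsTable("sales.monthly_revenue")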

Course Details

Audience: Data Engineers

Duration: 3 days

Format: Lecture + labs

Prerequisites: Basic Hadoop and Spark knowledge

Setup: Cluster with Hadoop 3 and Spark 3 provided

Detailed Outline

  • Cluster vs. client mode
  • Resource allocation and tuning (see the sketch after this outline)
  • Reading and writing Hive tables
  • ACID table support
  • Using Hive functions in Spark
  • Partitioning strategies
  • Columnar storage formats
  • Compression techniques
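
A hedged sketch of resource allocation and data format handling on the provided YARN cluster, assuming a hypothetical CSV dataset on HDFS; executor sizes, paths, and column names are placeholders rather than recommendations. The deploy mode itself (cluster vs. client) is chosen at submission time, for example with spark-submit --master yarn --deploy-mode cluster.

    from pyspark.sql import SparkSession

    # Per-application resource allocation on YARN; the values below are placeholders
    # that the tuning labs would adjust to the cluster at hand.
    spark = (
        SparkSession.builder
        .appName("yarn-tuning-sketch")
        .master("yarn")
        .config("spark.executor.instances", "4")
        .config("spark.executor.cores", "2")
        .config("spark.executor.memory", "4g")
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

    # Hypothetical raw input on HDFS (path and schema are illustrative).
    events = spark.read.option("header", "true").csv("hdfs:///data/raw/events")

    # Partitioned, Snappy-compressed Parquet output; ORC follows the same pattern
    # with .orc(...) instead of .parquet(...).
    (
        events.write
        .mode("overwrite")
        .partitionBy("event_date")
        .option("compression", "snappy")
        .parquet("hdfs:///data/curated/events")
    )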

Ready to Get Started?

Contact us to learn more about this course and schedule your training.