Hadoop + Spark
Overview
This course combines Hadoop's storage capabilities with Spark's processing power. Students learn how to integrate Spark with HDFS, Hive, and YARN to build end-to-end Big Data solutions.
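As a taste of what the labs build towards, the sketch below shows a small Spark job that reads a Hive table and writes the result as Parquet on HDFS. The table name (default.sales) and output path are placeholders for illustration, not part of the provided course environment.

```scala
import org.apache.spark.sql.SparkSession

object SalesReport {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark resolve tables through the Hive metastore.
    val spark = SparkSession.builder()
      .appName("sales-report")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder Hive table; any metastore-registered table works the same way.
    val sales = spark.table("default.sales")
    val daily = sales.groupBy("sale_date").sum("amount")

    // Write the aggregated result as Parquet files on HDFS (placeholder path).
    daily.write
      .mode("overwrite")
      .parquet("hdfs:///data/reports/daily_sales")

    spark.stop()
  }
}
```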
Objective
Understand and implement integrated Hadoop + Spark architectures for scalable data processing.
What You Will Learn
- Spark on YARN
- Using Hive tables in Spark SQL
- Optimising data formats (Parquet, ORC)
Course Details
Audience: Data Engineers
Duration: 3 days
Format: Lecture + labs
Prerequisites: Basic Hadoop and Spark knowledge
Setup: A cluster with Hadoop 3 and Spark 3 is provided
Detailed Outline
Spark on YARN
- Cluster vs. client mode
- Resource allocation and tuning
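To give a flavour of the hands-on labs, here is a rough sketch of how executor resources might be configured for a Spark application on YARN. The executor counts and sizes below are illustrative assumptions, not tuned recommendations, and in cluster deploy mode they would normally be passed to spark-submit rather than set in code.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative settings only; real values depend on the cluster and workload.
// Requesting cluster deploy mode must be done via spark-submit (--deploy-mode cluster);
// the programmatic form below corresponds to client mode.
val spark = SparkSession.builder()
  .appName("yarn-tuning-demo")
  .master("yarn")                                     // run on the YARN resource manager
  .config("spark.executor.instances", "4")            // fixed executor count ...
  .config("spark.executor.memory", "4g")
  .config("spark.executor.cores", "2")
  .config("spark.dynamicAllocation.enabled", "false") // ... or enable dynamic allocation instead
  .getOrCreate()
```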
Hive Integration with Spark SQL
- Reading and writing Hive tables
- ACID table support
- Using Hive functions in Spark
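A minimal sketch of this Hive integration, assuming a metastore with a sales_db database; the table and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-integration-demo")
  .enableHiveSupport()                  // connect to the Hive metastore
  .getOrCreate()

// Read an existing Hive table via the DataFrame API ...
val orders = spark.table("sales_db.orders")
orders.printSchema()

// ... or query it with SQL, including Hive-compatible functions such as date_format.
val monthly = spark.sql(
  """SELECT customer_id,
    |       date_format(order_ts, 'yyyy-MM') AS month,
    |       sum(total)                       AS revenue
    |FROM sales_db.orders
    |GROUP BY customer_id, date_format(order_ts, 'yyyy-MM')""".stripMargin)

// Write the result back as a managed Hive table.
monthly.write.mode("overwrite").saveAsTable("sales_db.monthly_revenue")
```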
Data Formats and Optimisation
- Partitioning strategies
- Columnar storage formats (Parquet, ORC)
- Compression techniques
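The three topics above come together when curated data is written out. The sketch below assumes placeholder HDFS paths and an event_date column to partition on:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("formats-demo").getOrCreate()

// Placeholder input; in the labs this would be data already on the cluster.
val events = spark.read.json("hdfs:///raw/events")

// Partition by a low-cardinality column so readers can prune directories,
// and choose a compression codec per output format.
events.write
  .mode("overwrite")
  .partitionBy("event_date")
  .option("compression", "snappy")     // Parquet also supports gzip, zstd, ...
  .parquet("hdfs:///curated/events_parquet")

// The same pipeline can target ORC instead of Parquet.
events.write
  .mode("overwrite")
  .partitionBy("event_date")
  .option("compression", "zlib")       // ORC codecs include zlib, snappy, zstd
  .orc("hdfs:///curated/events_orc")
```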
Ready to Get Started?
Contact us to learn more about this course and schedule your training.