Data Processing With Apache Spark - Study Notes From Data Engineering With Python Ch 14
You have streaming data. You have batch data. You have a lot of it. Now you need to actually process it. Fast. On more than one machine.
In Part 1 we set up a Dataproc cluster, got familiar with HDFS, and touched on what a data lake actually is. Now it is time to get into the real work: writing PySpark code, understanding RDDs, moving data between HDFS, GCS, and BigQuery, and submitting Spark jobs to Dataproc.