The Internals of PySpark (Apache Spark 3.5.2)

Welcome to The Internals of PySpark online book! 🤙

I'm Jacek Laskowski, a Freelance Data(bricks) Engineer specializing in Apache Spark (incl. Spark SQL and Spark Structured Streaming), Delta Lake, Databricks, and Apache Kafka (incl. Kafka Streams), with brief forays into the wider data engineering space (e.g., Trino, Dask, and dbt, mostly during Warsaw Data Engineering meetups).

I'm very excited to have you here and hope you will enjoy exploring the internals of PySpark as much as I have.

Flannery O'Connor: "I write to discover what I know."

"The Internals Of" series

I'm also writing other online books in the "The Internals Of" series. Please visit "The Internals Of" Online Books home page.

Expect text and code snippets from a variety of public sources. Attribution follows.

Now, let's take a deep dive into PySpark 🔥


Last update: 2024-09-15