japila-books.github.io

The Internals Online Books

Welcome to “The Internals Of” Online Books project! 🤙

I’m Jacek Laskowski, an IT freelancer specializing in Apache Spark, Delta Lake and Apache Kafka (with brief forays into a wider data engineering space, e.g. Trino and ksqlDB, mostly during Warsaw Data Engineering meetups).

I’m very excited to have you here and hope you will enjoy exploring the internals of the open source projects together (in no particular order):

  1. Apache Spark
  2. Spark SQL
  3. Spark Structured Streaming
  4. Delta Lake
  5. Spark on Kubernetes
  6. PySpark
  7. Apache Kafka
  8. Kafka Streams
  9. Apache Beam

Please note that some books have less current content than others, but that’s expected with a one-person project where so many things are interesting and thus time-consuming. Life’s too short to taste everything :/

The aim of this project is to host all the current and future internals books under a single organization on GitHub and publish to a single domain via GitHub Pages (until I find a better way to publish the books).

Custom Docker Image

The book projects use a custom Docker image.

The official Docker image does not include all the plugins the books need and is no longer available.

Review the Dockerfile and requirements.txt files to learn more.

Build Books Docker Image

Clone or pull the latest tag of the Material for MkDocs Insiders repository.

Execute the build-image.sh shell script to build the Docker image.

Build Book

Use the docker run command with the build argument to build a book.

docker run \
  --rm \
  -it \
  -p 8000:8000 \
  -v "${PWD}:/docs" \
  jaceklaskowski/mkdocs-material-insiders \
  build --clean

TIP: Consult the Material for MkDocs documentation to get started.

Live Editing

Use the docker run command with the serve argument (and --dirtyreload for faster reloads) in the project root (the folder with mkdocs.yml).

docker run \
  --rm \
  -it \
  -p 8000:8000 \
  -v "${PWD}:/docs" \
  jaceklaskowski/mkdocs-material-insiders \
  serve --dirtyreload --verbose --dev-addr 0.0.0.0:8000
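
The build and serve invocations differ only in their trailing arguments, so a small shell wrapper can save some typing. The following is a minimal sketch and not part of the books projects: the `book` function name and the `DRY_RUN` variable are hypothetical, while the image name and flags are copied from the commands above.

```shell
# Hypothetical helper (not part of the books projects): wraps the two
# docker run invocations so that only the MkDocs arguments differ.
IMAGE="jaceklaskowski/mkdocs-material-insiders"

book() {
  # Map the subcommand to the MkDocs arguments used in this README.
  case "$1" in
    build) set -- build --clean ;;
    serve) set -- serve --dirtyreload --verbose --dev-addr 0.0.0.0:8000 ;;
    *) echo "usage: book build|serve" >&2; return 2 ;;
  esac

  # With DRY_RUN=1 (an assumption of this sketch) only print the command.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo docker run --rm -it -p 8000:8000 -v "${PWD}:/docs" "$IMAGE" "$@"
  else
    docker run --rm -it -p 8000:8000 -v "${PWD}:/docs" "$IMAGE" "$@"
  fi
}
```

Run `book serve` in the project root for live editing, or `DRY_RUN=1 book build` to inspect the command before executing it.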

List Outdated Packages

Run an interactive shell in a container.

docker run \
  --rm \
  -it \
  -p 8000:8000 \
  -v "${PWD}:/docs" \
  --entrypoint sh \
  jaceklaskowski/mkdocs-material-insiders

While inside the container, execute the following command to list outdated packages together with the latest version available for each.

python -m pip list --outdated
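
When an interactive session is not needed, the same check can probably be run in one step by overriding the entrypoint with python directly. This is a sketch under the assumption that python is on the image's PATH (which the command above implies); the `list_outdated` function name and the `DRY_RUN` variable are hypothetical.

```shell
# Hypothetical one-shot variant of the interactive steps above.
IMAGE="jaceklaskowski/mkdocs-material-insiders"

list_outdated() {
  # --entrypoint replaces the image's default entrypoint with python;
  # everything after the image name is passed to it as arguments.
  set -- docker run --rm --entrypoint python "$IMAGE" -m pip list --outdated

  # With DRY_RUN=1 (an assumption of this sketch) only print the command.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

No port mapping or volume mount is used here, since listing installed packages needs neither the book sources nor the preview server.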