Skip to content

Demo: Create Virtual Environment for Python Client

This demo shows how to work with a development (unreleased) version of Spark Declarative Pipelines.

Note

For released versions of Spark Declarative Pipelines, use uvx instead:

uvx --with "pyspark[pipelines]" spark-pipelines

It is assumed that SPARK_HOME environment variable points at the sources of Apache Spark.

export SPARK_HOME=/Users/jacek/oss/spark

Create SDP Project using uv

uv init hello-spark-pipelines && cd hello-spark-pipelines
uv add --editable $SPARK_HOME/python/packaging/client
uv tree --depth 2
hello-spark-pipelines v0.1.0
└── pyspark-client v4.2.0.dev0
    ├── googleapis-common-protos v1.72.0
    ├── grpcio v1.76.0
    ├── grpcio-status v1.76.0
    ├── numpy v2.4.2
    ├── pandas v3.0.0
    ├── pyarrow v23.0.0
    ├── pyyaml v6.0.3
    └── zstandard v0.25.0
uv pip list
Package                  Version     Editable project location
------------------------ ----------- ----------------------------------------------
googleapis-common-protos 1.72.0
grpcio                   1.76.0
grpcio-status            1.76.0
numpy                    2.4.2
pandas                   3.0.0
protobuf                 6.33.5
pyarrow                  23.0.0
pyspark-client           4.2.0.dev0  /Users/jacek/oss/spark/python/packaging/client
python-dateutil          2.9.0.post0
pyyaml                   6.0.3
six                      1.17.0
typing-extensions        4.15.0
zstandard                0.25.0

Activate Virtual Environment

Activate (source) the virtual environment (that uv helped us create).

source .venv/bin/activate

This activation brings all the necessary Spark Declarative Pipelines Python dependencies (that are only available in the source format only) for non-uv tools and CLI, incl. Spark Pipelines CLI itself.

Use Spark Pipelines CLI

$SPARK_HOME/bin/spark-pipelines --help
usage: cli.py [-h] {run,dry-run,init} ...

Pipelines CLI

positional arguments:
  {run,dry-run,init}
    run               Run a pipeline. If no refresh options specified, a
                      default incremental update is performed.
    dry-run           Launch a run that just validates the graph and checks
                      for errors.
    init              Generate a sample pipeline project, with a spec file and
                      example transformations.

options:
  -h, --help          show this help message and exit

macOS and PYSPARK_PYTHON

On macOS, you may want to define PYSPARK_PYTHON environment variable to point at Python >= 3.10.

export PYSPARK_PYTHON=python3.14