Demo: Create Virtual Environment for Python Client¶
This demo shows how to work with a development (unreleased) version of Spark Declarative Pipelines.
Note
For released versions of Spark Declarative Pipelines, use uvx instead:
It is assumed that the SPARK_HOME environment variable points at the sources of Apache Spark.
Create SDP Project using uv¶
Package Version Editable project location
------------------------ ----------- ----------------------------------------------
googleapis-common-protos 1.72.0
grpcio 1.76.0
grpcio-status 1.76.0
numpy 2.4.2
pandas 3.0.0
protobuf 6.33.5
pyarrow 23.0.0
pyspark-client 4.2.0.dev0 /Users/jacek/oss/spark/python/packaging/client
python-dateutil 2.9.0.post0
pyyaml 6.0.3
six 1.17.0
typing-extensions 4.15.0
zstandard 0.25.0
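A package listing like the one above can be produced with a sketch along these lines (an assumption, not the exact commands used here: it presumes uv is installed, SPARK_HOME points at a Spark source checkout, and the editable install path matches the pyspark-client entry in the listing):

```shell
# Create a virtual environment in .venv (uv's default location)
uv venv

# Install the in-tree Python client in editable mode
# (the path corresponds to the "Editable project location" column above)
uv pip install --editable "$SPARK_HOME/python/packaging/client"

# List installed packages, incl. pyspark-client as an editable install
uv pip list
```

An editable install is what makes the `Editable project location` column point back into the Spark sources, so local changes to the client are picked up without reinstalling.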
Activate Virtual Environment¶
Activate (source) the virtual environment that uv created.
Activation makes all the necessary Spark Declarative Pipelines Python dependencies (that are available in source format only) visible to non-uv tools and CLIs, including the Spark Pipelines CLI itself.
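With uv's defaults, the environment lives in `.venv` in the project directory (an assumption; adjust the path if you created it elsewhere):

```shell
# Activate the uv-managed virtual environment (default location: .venv)
source .venv/bin/activate

# The interpreter now resolves inside the virtual environment
which python
```

Run `deactivate` to leave the environment when you are done.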
Use Spark Pipelines CLI¶
usage: cli.py [-h] {run,dry-run,init} ...
Pipelines CLI
positional arguments:
{run,dry-run,init}
run Run a pipeline. If no refresh options specified, a
default incremental update is performed.
dry-run Launch a run that just validates the graph and checks
for errors.
init Generate a sample pipeline project, with a spec file and
example transformations.
options:
-h, --help show this help message and exit
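With the environment active, the CLI can be exercised from the Spark sources. A sketch, assuming the `spark-pipelines` launcher script ships under `$SPARK_HOME/bin` (verify the script name in your checkout):

```shell
# Print the usage text shown above
"$SPARK_HOME/bin/spark-pipelines" --help

# Validate the pipeline graph and check for errors without running it
"$SPARK_HOME/bin/spark-pipelines" dry-run
```

`init` generates a sample project (spec file plus example transformations), which is a convenient starting point before `dry-run` and `run`.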