# SparkPipelines
SparkPipelines is a standalone application that the `spark-pipelines` shell script uses to run the `pyspark/pipelines/cli.py` Python script.

This somewhat convoluted way of executing the `pyspark/pipelines/cli.py` Python script lets Spark Declarative Pipelines use the full execution power of `spark-submit` (Apache Spark), including the built-in support for Spark Connect, with extra pipelines-specific command-line arguments and options.
SparkPipelines behaves similarly to executing `spark-submit` explicitly as follows:

```shell
spark-submit \
  [sparkSubmitArgs] \
  /absolute/path/to/pyspark/pipelines/cli.py \
  [pipelinesArgs]
```
With `uvx`, that would be the following:

```shell
uvx --from "pyspark[pipelines]==4.1.1" \
  spark-submit \
  [sparkSubmitArgs] \
  /absolute/path/to/pyspark/pipelines/cli.py \
  [pipelinesArgs]
```
## Launch SparkPipelines
```scala
main(
  args: Array[String]): Unit
```
main expects the first command-line argument to be the absolute path of the pyspark/pipelines/cli.py Python script.
main runs SparkSubmit (Apache Spark) with the arguments properly ordered.
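The ordering can be illustrated with a minimal, hypothetical sketch (the real main splits the arguments first and then hands the reordered sequence over to SparkSubmit):

```scala
object SparkPipelinesMainSketch {
  // Hypothetical stand-in for the real main: the first argument is the
  // pyspark/pipelines/cli.py path; the remaining arguments end up after it.
  def submitArgs(args: Array[String]): Seq[String] = {
    require(
      args.nonEmpty,
      "the first argument must be the absolute path of pyspark/pipelines/cli.py")
    val pipelinesCliFile = args.head
    // Stand-ins for a real split of the remaining arguments (assumed for illustration)
    val sparkArgs = Seq("--remote", "local")
    val pipelinesArgs = args.tail.toSeq
    // spark-submit-specific args first, then the script, then pipelines args
    sparkArgs ++ Seq(pipelinesCliFile) ++ pipelinesArgs
  }
}
```

For example, `submitArgs(Array("/cli.py", "run"))` yields `--remote local /cli.py run`: spark-submit arguments first, then the Python script, then the pipelines arguments.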
## constructSparkSubmitArgs
```scala
constructSparkSubmitArgs(
  pipelinesCliFile: String,
  args: Array[String]): Seq[String]
```
constructSparkSubmitArgs splits the given args into spark-submit-specific and pipelines-specific arguments.

constructSparkSubmitArgs returns a sequence of the spark-submit-specific arguments, followed by the given pipelinesCliFile, followed by the pipelines-specific arguments.
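The ordering can be sketched as follows (a hypothetical, simplified signature that takes the already-split argument lists, unlike the real method, which splits the given args itself):

```scala
object SparkSubmitArgsSketch {
  // Sketch: spark-submit-specific args first, then the Python CLI script,
  // then the pipelines-specific args.
  def constructSparkSubmitArgs(
      sparkArgs: Seq[String],
      pipelinesCliFile: String,
      pipelinesArgs: Seq[String]): Seq[String] =
    sparkArgs ++ Seq(pipelinesCliFile) ++ pipelinesArgs
}
```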
## splitArgs
```scala
splitArgs(
  args: Array[String]): (Seq[String], Seq[String])
```
splitArgs parses the given args (using a custom SparkSubmitArgumentsParser (Apache Spark)) and returns a pair of the spark-submit-specific and pipelines-specific arguments.

splitArgs forces the spark.api.mode configuration property to be connect.
splitArgs reports a SparkUserAppException when the spark.api.mode configuration property is explicitly specified on the command line with a value other than connect, as Declarative Pipelines currently only supports Spark Connect.
splitArgs uses local as the default value of the --remote command-line option.
splitArgs creates a custom SparkSubmitArgumentsParser to parse the given args.
All known arguments are considered spark-submit-specific except the following:

* `--name`
* `-h`, `--help`
Unknown and extra arguments are pipelines-specific.
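The splitting rules above can be sketched as a simplified, hypothetical splitter. The real splitArgs relies on SparkSubmitArgumentsParser and knows the full spark-submit option set; here only a handful of options are recognized, for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

object SplitArgsSketch {
  // Hypothetical subset of spark-submit options that take a value
  private val sparkSubmitOptsWithValue = Set("--remote", "--conf", "--master")

  def splitArgs(args: Seq[String]): (Seq[String], Seq[String]) = {
    val sparkArgs = ArrayBuffer[String]()
    val pipelinesArgs = ArrayBuffer[String]()
    var remote = "local" // default value of --remote
    var i = 0
    while (i < args.length) {
      args(i) match {
        case "--remote" =>
          remote = args(i + 1)
          i += 2
        case "--conf" if args(i + 1).startsWith("spark.api.mode=") =>
          // Reject any explicitly-specified API mode other than connect
          val mode = args(i + 1).stripPrefix("spark.api.mode=")
          require(mode == "connect",
            s"Declarative Pipelines only supports spark.api.mode=connect, got: $mode")
          i += 2
        case opt if sparkSubmitOptsWithValue(opt) =>
          sparkArgs += opt += args(i + 1)
          i += 2
        case other =>
          // Unknown and extra arguments are pipelines-specific
          pipelinesArgs += other
          i += 1
      }
    }
    // Force Spark Connect and apply the --remote default
    sparkArgs ++= Seq("--conf", "spark.api.mode=connect", "--remote", remote)
    (sparkArgs.toSeq, pipelinesArgs.toSeq)
  }
}
```

For example, splitting `--master yarn run` puts `--master yarn` (plus the forced `spark.api.mode=connect` and the default `--remote local`) in the spark-submit-specific arguments and `run` in the pipelines-specific ones.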