cli Python Module¶
The pyspark/pipelines/cli.py Python module is at the heart of the Spark Pipelines CLI.
Launch Standalone Application¶
main() -> None
main...FIXME
main is used when:
SparkPipelines is launched as a standalone application (with the first argument being the path to this pyspark/pipelines/cli.py module)
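A standalone entry point like this typically parses its command line before dispatching. The following is a minimal sketch only; the subcommand and option names below are assumptions and are not confirmed against the actual cli.py module:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argument layout; the real cli.py may differ.
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("run", "dry-run"):
        sub = subparsers.add_parser(name)
        sub.add_argument("--spec", type=Path, default=None)
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    print(f"{args.command} (spec={args.spec})")


if __name__ == "__main__":
    main()
```

With such a layout, `python cli.py dry-run --spec pipeline.yml` would select the dry-run code path.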
Run Pipeline¶
run(
spec_path: Path,
full_refresh: Sequence[str],
full_refresh_all: bool,
refresh: Sequence[str],
dry: bool,
) -> None
run...FIXME
run is used when:
cli.py is launched as a standalone application (with either the run or dry-run option)
load_pipeline_spec¶
load_pipeline_spec(
spec_path: Path,
) -> PipelineSpec
load_pipeline_spec builds a PipelineSpec from the YAML file at the given spec_path.
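The loading step itself is plain YAML deserialization. A minimal sketch, assuming PyYAML as the parser (an assumption; the parser actually used by cli.py is not confirmed here):

```python
from pathlib import Path
from typing import Any, Mapping

import yaml  # assumed YAML parser


def load_spec_data(spec_path: Path) -> Mapping[str, Any]:
    """Read a YAML pipeline spec into a plain mapping.

    The real load_pipeline_spec goes one step further and hands the
    mapping to unpack_pipeline_spec to build a PipelineSpec.
    """
    with spec_path.open() as f:
        return yaml.safe_load(f)
```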
load_pipeline_spec is used when:
cli.py is requested to run the pipeline
unpack_pipeline_spec¶
unpack_pipeline_spec(
spec_data: Mapping[str, Any],
) -> PipelineSpec
unpack_pipeline_spec creates a PipelineSpec from the given spec_data mapping.
unpack_pipeline_spec makes sure that only allowed fields are used (with two required):
name (required)
storage (required)
catalog
database
schema
configuration
libraries
database and schema are synonyms, with database taking precedence when both are given.
PySparkException
unpack_pipeline_spec raises a PySparkException when there are unexpected fields or any required field is missing.
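Put together, the validation could be sketched as follows. The field rules mirror the description above, but the class layout and error messages are assumptions, and a plain ValueError stands in for PySparkException:

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional

ALLOWED_FIELDS = {
    "name", "storage", "catalog", "database", "schema",
    "configuration", "libraries",
}
REQUIRED_FIELDS = {"name", "storage"}


@dataclass
class PipelineSpec:
    # Stand-in for the real PipelineSpec.
    name: str
    storage: str
    catalog: Optional[str] = None
    database: Optional[str] = None
    configuration: Mapping[str, str] = field(default_factory=dict)
    libraries: List[Any] = field(default_factory=list)


def unpack_pipeline_spec(spec_data: Mapping[str, Any]) -> PipelineSpec:
    unexpected = set(spec_data) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"Unexpected fields: {sorted(unexpected)}")
    missing = REQUIRED_FIELDS - set(spec_data)
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")
    # database and schema are synonyms; database wins when both are given
    database = spec_data.get("database", spec_data.get("schema"))
    return PipelineSpec(
        name=spec_data["name"],
        storage=spec_data["storage"],
        catalog=spec_data.get("catalog"),
        database=database,
        configuration=spec_data.get("configuration", {}),
        libraries=spec_data.get("libraries", []),
    )
```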
unpack_pipeline_spec is used when:
cli.py is requested to load_pipeline_spec