
cli Python Module

The pyspark/pipelines/cli.py Python module is at the heart of the Spark Pipelines CLI.

Launch Standalone Application

main() -> None

main...FIXME


main is used when:

Run Pipeline

run(
    spec_path: Path,
    full_refresh: Sequence[str],
    full_refresh_all: bool,
    refresh: Sequence[str],
    dry: bool,
) -> None

run...FIXME


run is used when:

load_pipeline_spec

load_pipeline_spec(
    spec_path: Path,
) -> PipelineSpec

load_pipeline_spec builds a PipelineSpec from the YAML file at the given spec_path.


load_pipeline_spec is used when:

unpack_pipeline_spec

unpack_pipeline_spec(
    spec_data: Mapping[str, Any],
) -> PipelineSpec

unpack_pipeline_spec creates a PipelineSpec from the given spec_data mapping.

unpack_pipeline_spec accepts only the following fields, two of which are required:

  • name (required)
  • storage (required)
  • catalog
  • database
  • schema
  • configuration
  • libraries

database and schema are synonyms. If both are set, database takes precedence over schema.

PySparkException

unpack_pipeline_spec raises a PySparkException when the given spec_data contains unexpected fields or a required field is missing.
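
The field validation and the database/schema fallback described in this section can be sketched as follows. This is a minimal illustration only, not the code in cli.py: SketchSpec, SpecError and unpack_spec_sketch are hypothetical names, SpecError stands in for PySparkException, and the error messages and the order of the checks are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional

# Field names taken from the list above.
ALLOWED_FIELDS = {
    "name", "storage", "catalog", "database",
    "schema", "configuration", "libraries",
}
REQUIRED_FIELDS = ("name", "storage")


class SpecError(ValueError):
    """Hypothetical stand-in for PySparkException."""


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical, simplified stand-in for PipelineSpec."""
    name: str
    storage: str
    catalog: Optional[str] = None
    database: Optional[str] = None
    configuration: Mapping[str, str] = field(default_factory=dict)
    libraries: tuple = ()


def unpack_spec_sketch(spec_data: Mapping[str, Any]) -> SketchSpec:
    # Reject any key that is not in the allowed set.
    unexpected = sorted(set(spec_data) - ALLOWED_FIELDS)
    if unexpected:
        raise SpecError(f"Unexpected field(s): {unexpected}")

    # Make sure every required key is present.
    missing = [key for key in REQUIRED_FIELDS if key not in spec_data]
    if missing:
        raise SpecError(f"Missing required field(s): {missing}")

    return SketchSpec(
        name=spec_data["name"],
        storage=spec_data["storage"],
        catalog=spec_data.get("catalog"),
        # database wins; schema is consulted only when database is absent.
        database=spec_data.get("database", spec_data.get("schema")),
        configuration=dict(spec_data.get("configuration", {})),
        libraries=tuple(spec_data.get("libraries", ())),
    )
```

With this sketch, a mapping that sets both database and schema yields the database value, a mapping with only schema falls back to it, and a mapping with an unknown key or without name or storage raises SpecError.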


unpack_pipeline_spec is used when: