
SDP Python API

The SDP Python API is available through the pyspark.pipelines Python module.

By convention, Spark Declarative Pipelines' Python module is imported under the dp alias.

from pyspark import pipelines as dp

pyspark.pipelines Python Module

The pyspark.pipelines module (in __init__.py) imports the pyspark.pipelines.api module to expose the following Python functions (incl. decorators) to wildcard imports:

Python Decorators

Declarative Pipelines uses Python decorators to define tables and views.

| Decorator | Purpose |
|-----------|---------|
| @dp.append_flow | Append-only flows |
| @dp.materialized_view | Materialized views (with supporting flows) |
| @dp.table | Streaming and batch tables (with supporting flows) |
| @dp.temporary_view | Temporary views (with supporting flows) |

@dp.append_flow

append_flow(
    *,
    target: str,
    name: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None] # (1)!
  1. QueryFunction = Callable[[], DataFrame] is a Python function that takes no arguments and returns a PySpark DataFrame.

Registers an append Flow (in the active GraphElementRegistry).

target is the name of the dataset (destination) this flow writes to.

dp.create_sink

create_sink(
    name: str,
    format: str,
    options: Optional[Dict[str, str]] = None,
) -> None

Registers a Sink output in the active GraphElementRegistry.

Not a Python Decorator

Unlike the decorators above, create_sink is not a Python decorator (Callable).
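A usage sketch (all names, the format, and the options are illustrative; a pipeline source file is assumed, with spark provided by the runtime):

```python
from pyspark import pipelines as dp

# Register a Kafka sink (format and options are illustrative values).
dp.create_sink(
    name="alerts_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "localhost:9092",
        "topic": "alerts",
    },
)

# An append flow can then write to the sink by name.
@dp.append_flow(target="alerts_sink")
def alerts():
    return spark.readStream.table("events").where("severity = 'HIGH'")
```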

dp.create_streaming_table

create_streaming_table(
    name: str,
    *,
    comment: Optional[str] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> None
Not a Python Decorator

Unlike the decorators above, create_streaming_table is not a Python decorator (Callable).

Registers a StreamingTable dataset (in the active GraphElementRegistry) for Append Flows.
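A usage sketch of the pattern above: create the streaming table once, then attach one or more append flows to it (table and source names are illustrative; spark is assumed to be provided by the pipeline runtime):

```python
from pyspark import pipelines as dp

# Register the target streaming table first.
dp.create_streaming_table(
    name="all_events",
    comment="Union of regional event streams",
)

# Multiple append flows can target the same streaming table.
@dp.append_flow(target="all_events")
def events_eu():
    return spark.readStream.table("raw_events_eu")

@dp.append_flow(target="all_events")
def events_us():
    return spark.readStream.table("raw_events_us")
```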

@dp.materialized_view

materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a MaterializedView dataset with an accompanying Flow in the active GraphElementRegistry.
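A usage sketch showing both calling styles (dataset names, comments, and queries are illustrative; spark is assumed to be provided by the pipeline runtime):

```python
from pyspark import pipelines as dp

# Bare decorator: the function name becomes the dataset name.
@dp.materialized_view
def daily_revenue():
    return spark.read.table("orders").groupBy("order_date").sum("amount")

# Parameterized: the name argument overrides the function name.
@dp.materialized_view(name="top_customers", comment="Top 10 customers by spend")
def top_customers_query():
    return (spark.read.table("orders")
            .groupBy("customer_id").sum("amount")
            .orderBy("sum(amount)", ascending=False)
            .limit(10))
```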

@dp.table

table(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a StreamingTable dataset with an accompanying Flow in the active GraphElementRegistry.
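A usage sketch with a streaming read feeding the table (names and the partitioning column are illustrative; spark is assumed to be provided by the pipeline runtime):

```python
from pyspark import pipelines as dp

# A streaming table backed by a streaming read of an upstream source.
@dp.table(
    name="clean_clicks",
    partition_cols=["event_date"],
)
def clicks():
    return (spark.readStream.table("raw_clicks")
            .where("user_id IS NOT NULL"))
```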

@dp.temporary_view

temporary_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a TemporaryView dataset with an accompanying Flow in the active GraphElementRegistry.
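A usage sketch of a temporary view as an intermediate dataset (names are illustrative; spark is assumed to be provided by the pipeline runtime, and the downstream flow is assumed to resolve the view by name):

```python
from pyspark import pipelines as dp

# A temporary view: an intermediate dataset not persisted to the catalog.
@dp.temporary_view
def recent_orders():
    return spark.read.table("orders").where("order_date >= '2024-01-01'")

# A downstream dataset can read the view by name.
@dp.materialized_view
def recent_order_counts():
    return spark.read.table("recent_orders").groupBy("customer_id").count()
```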