Skip to content

SDP Python API

SDP Python API is available through pyspark.pipelines Python module.

The convention to alias the import of Spark Declarative Pipelines' Python module is dp.

from pyspark import pipelines as dp

pyspark.pipelines Python Module

pyspark.pipelines module (in __init__.py) imports pyspark.pipelines.api module to expose the following Python functions (incl. decorators) to wildcard imports:

Python Decorators

Declarative Pipelines uses Python decorators to define tables and views.

Decorator Purpose
@dp.append_flow Append-only flows
@dp.materialized_view Materialized views (with supporting flows)
@dp.table Streaming and batch tables (with supporting flows)
@dp.temporary_view Temporary views (with supporting flows)

@dp.append_flow

append_flow(
    *,
    target: str,
    name: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None] # (1)!
  1. QueryFunction = Callable[[], DataFrame] is a Python function that takes no arguments and returns a PySpark DataFrame.

Registers an append Flow with the active GraphElementRegistry

target is the name of the dataset (destination) this flow writes to.

dp.create_auto_cdc_flow

create_auto_cdc_flow(
    target: str,
    source: str,
    keys: Union[List[str], List[Column]],
    sequence_by: Union[str, Column],
    apply_as_deletes: Optional[Union[str, Column]] = None,
    column_list: Optional[Union[List[str], List[Column]]] = None,
    except_column_list: Optional[Union[List[str], List[Column]]] = None,
    stored_as_scd_type: Optional[Literal[1, "1"]] = None,
    name: Optional[str] = None,
) -> None

Registers an AutoCdcFlow with the active GraphElementRegistry.

Not Python Decorator

Unlike the others, create_auto_cdc_flow is not a Python decorator (Callable).

dp.create_sink

create_sink(
    name: str,
    format: str,
    options: Optional[Dict[str, str]] = None,
) -> None

Registers a Sink output in the active GraphElementRegistry.

Not Python Decorator

Unlike the others, create_sink is not a Python decorator (Callable).

dp.create_streaming_table

create_streaming_table(
    name: str,
    *,
    comment: Optional[str] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> None

Registers a StreamingTable dataset (with the active GraphElementRegistry) for Append Flows.

Not Python Decorator

Unlike the others, create_streaming_table is not a Python decorator (Callable).

@dp.materialized_view

materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a MaterializedView dataset with an accompanying Flow with the active GraphElementRegistry.

@dp.table

table(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a StreamingTable dataset with an accompanying Flow in the active GraphElementRegistry.

@dp.temporary_view

temporary_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a TemporaryView dataset with an accompanying Flow in the active GraphElementRegistry.