
SDP Python API

The SDP Python API is available through the pyspark.pipelines Python module.

By convention, Spark Declarative Pipelines' Python module is imported under the dp alias.

from pyspark import pipelines as dp

pyspark.pipelines Python Module

The pyspark.pipelines module (in __init__.py) imports the pyspark.pipelines.api module to expose the following Python functions (incl. decorators) to wildcard imports:

Python Decorators

Declarative Pipelines uses Python decorators to define tables and views.

| Decorator | Purpose |
|-----------|---------|
| @dp.append_flow | Append-only flows |
| @dp.materialized_view | Materialized views (with supporting flows) |
| @dp.table | Streaming and batch tables (with supporting flows) |
| @dp.temporary_view | Temporary views (with supporting flows) |

@dp.append_flow

append_flow(
    *,
    target: str,
    name: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None] # (1)!
  1. QueryFunction = Callable[[], DataFrame] is a Python function that takes no arguments and returns a PySpark DataFrame.

Registers an append Flow (in the active GraphElementRegistry).

target is the name of the dataset (destination) this flow writes to.

dp.create_sink

create_sink(
    name: str,
    format: str,
    options: Optional[Dict[str, str]] = None,
) -> None

Registers a Sink output in the active GraphElementRegistry.

Not a Python Decorator

Unlike the decorators above, create_sink is not a Python decorator (Callable).
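A usage sketch (all names, the format, and the options are illustrative; a pipeline source file is assumed, with spark provided by the runtime):

```python
from pyspark import pipelines as dp

# Register a Kafka sink (format and options are illustrative values).
dp.create_sink(
    name="alerts_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "localhost:9092",
        "topic": "alerts",
    },
)

# An append flow can then write to the sink by name.
@dp.append_flow(target="alerts_sink")
def alerts():
    return spark.readStream.table("events").where("severity = 'HIGH'")
```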

dp.create_streaming_table

create_streaming_table(
    name: str,
    *,
    comment: Optional[str] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> None
Not a Python Decorator

Unlike the decorators above, create_streaming_table is not a Python decorator (Callable).

Registers a StreamingTable dataset (in the active GraphElementRegistry) for Append Flows.
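A usage sketch of the pattern above: create the streaming table once, then attach one or more append flows to it (table and source names are illustrative; spark is assumed to be provided by the pipeline runtime):

```python
from pyspark import pipelines as dp

# Register the target streaming table first.
dp.create_streaming_table(
    name="all_events",
    comment="Union of regional event streams",
)

# Multiple append flows can target the same streaming table.
@dp.append_flow(target="all_events")
def events_eu():
    return spark.readStream.table("raw_events_eu")

@dp.append_flow(target="all_events")
def events_us():
    return spark.readStream.table("raw_events_us")
```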

@dp.materialized_view

materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a MaterializedView dataset with an accompanying Flow in the active GraphElementRegistry.
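A usage sketch showing both calling styles (dataset names, comments, and queries are illustrative; spark is assumed to be provided by the pipeline runtime):

```python
from pyspark import pipelines as dp

# Bare decorator: the function name becomes the dataset name.
@dp.materialized_view
def daily_revenue():
    return spark.read.table("orders").groupBy("order_date").sum("amount")

# Parameterized: the name argument overrides the function name.
@dp.materialized_view(name="top_customers", comment="Top 10 customers by spend")
def top_customers_query():
    return (spark.read.table("orders")
            .groupBy("customer_id").sum("amount")
            .orderBy("sum(amount)", ascending=False)
            .limit(10))
```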

@dp.table

table(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a StreamingTable dataset with an accompanying Flow in the active GraphElementRegistry.
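A usage sketch with a streaming read feeding the table (names and the partitioning column are illustrative; spark is assumed to be provided by the pipeline runtime):

```python
from pyspark import pipelines as dp

# A streaming table backed by a streaming read of an upstream source.
@dp.table(
    name="clean_clicks",
    partition_cols=["event_date"],
)
def clicks():
    return (spark.readStream.table("raw_clicks")
            .where("user_id IS NOT NULL"))
```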

@dp.temporary_view

temporary_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a TemporaryView dataset with an accompanying Flow in the active GraphElementRegistry.
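A usage sketch of a temporary view as an intermediate dataset (names are illustrative; spark is assumed to be provided by the pipeline runtime, and the downstream flow is assumed to resolve the view by name):

```python
from pyspark import pipelines as dp

# A temporary view: an intermediate dataset not persisted to the catalog.
@dp.temporary_view
def recent_orders():
    return spark.read.table("orders").where("order_date >= '2024-01-01'")

# A downstream dataset can read the view by name.
@dp.materialized_view
def recent_order_counts():
    return spark.read.table("recent_orders").groupBy("customer_id").count()
```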