Python

Python Import Alias Convention

As of commit 6ab0df9, the convention is to alias the Declarative Pipelines import in Python as dp.

from pyspark import pipelines as dp

pyspark.pipelines Python Module

The pyspark.pipelines module (in __init__.py) imports the pyspark.pipelines.api module to expose the following Python functions (incl. decorators) to wildcard imports.

Use the following import in your Python code:

from pyspark import pipelines as dp

Python Decorators

Declarative Pipelines uses Python decorators to define tables and views.

Decorator              Purpose
@dp.append_flow        Append-only flows
@dp.materialized_view  Materialized views (with supporting flows)
@dp.table              Streaming and batch tables (with supporting flows)
@dp.temporary_view     Temporary views (with supporting flows)

@dp.append_flow

append_flow(
    *,
    target: str,
    name: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None]

QueryFunction = Callable[[], DataFrame] is a Python function that takes no arguments and returns a PySpark DataFrame.

Registers an append Flow (in the active GraphElementRegistry).

target is the name of the dataset (destination) this flow writes to.
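A minimal usage sketch (the table and source names are illustrative, and a SparkSession named spark is assumed to be in scope, as in pipeline source files):

```python
from pyspark import pipelines as dp

# Hypothetical: appends rows from a raw source into the "events"
# streaming table, which must be registered elsewhere in the pipeline
# (e.g. with dp.create_streaming_table).
@dp.append_flow(target="events", name="events_from_raw")
def events_from_raw():
    # QueryFunction: no arguments, returns a DataFrame
    return spark.readStream.table("raw_events")
```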

dp.create_sink

create_sink(
    name: str,
    format: str,
    options: Optional[Dict[str, str]] = None,
) -> None

Registers a Sink output in the active GraphElementRegistry.

Not a Python Decorator

Unlike the others, create_sink is not a Python decorator; it is a regular function that returns None.
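A hedged sketch of registering a sink (the sink name, format, and options are illustrative, not prescribed by the API):

```python
from pyspark import pipelines as dp

# Hypothetical Kafka sink; flows can then write to it by name,
# just as they write to a table.
dp.create_sink(
    name="alerts",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "localhost:9092",
        "topic": "alerts",
    },
)
```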

dp.create_streaming_table

create_streaming_table(
    name: str,
    *,
    comment: Optional[str] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> None
Not a Python Decorator

Unlike the others, create_streaming_table is not a Python decorator; it is a regular function that returns None.

Registers a StreamingTable dataset (in the active GraphElementRegistry) for Append Flows.
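A sketch of the typical pairing with @dp.append_flow (the names and schema are illustrative; a SparkSession named spark is assumed to be in scope):

```python
from pyspark import pipelines as dp

# Register the target streaming table first...
dp.create_streaming_table(
    name="events",
    comment="All events, appended from multiple sources",
    schema="id LONG, payload STRING",
)

# ...then point one or more append flows at it.
@dp.append_flow(target="events")
def events_from_raw():
    return spark.readStream.table("raw_events")
```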

@dp.materialized_view

materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a MaterializedView dataset with an accompanying Flow in the active GraphElementRegistry.
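The Union[Callable[[QueryFunction], None], None] return type is what lets materialized_view (and table and temporary_view below) be used both bare (@dp.materialized_view) and parameterized (@dp.materialized_view(name=...)). A pyspark-free sketch of that decorator pattern (all names here are illustrative, not part of the pyspark API):

```python
from typing import Callable, Optional

registry: list[str] = []  # stands in for the active GraphElementRegistry

def table(query_function: Optional[Callable] = None, *, name: Optional[str] = None):
    def register(fn: Callable) -> None:
        # Record the dataset under the explicit name or the function's name.
        registry.append(name or fn.__name__)

    if query_function is not None:
        # Bare usage (@table): the query function arrives directly.
        register(query_function)
        return None
    # Parameterized usage (@table(name=...)): return the real decorator.
    return register

@table
def events():
    return "df"

@table(name="daily_counts")
def counts():
    return "df"

print(registry)  # ['events', 'daily_counts']
```

Note that the inner decorator returns None, so the decorated names are rebound to None; the registration side effect is the point.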

@dp.table

table(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    cluster_by: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a StreamingTable dataset with an accompanying Flow in the active GraphElementRegistry.
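A hedged sketch of the bare form (the Kafka source options are illustrative; a SparkSession named spark is assumed to be in scope):

```python
from pyspark import pipelines as dp

# Bare usage: the function name becomes the table name.
@dp.table
def events_bronze():
    # A streaming query backs a StreamingTable.
    return (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load())
```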

@dp.temporary_view

temporary_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
) -> Union[Callable[[QueryFunction], None], None]

Registers a TemporaryView dataset with an accompanying Flow in the active GraphElementRegistry.
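A usage sketch (the view and table names are illustrative; a SparkSession named spark is assumed to be in scope):

```python
from pyspark import pipelines as dp

# Hypothetical intermediate view, usable by other datasets in the
# pipeline but not published to the catalog.
@dp.temporary_view(name="cleaned_events", comment="Events with nulls dropped")
def cleaned_events():
    return spark.read.table("events").na.drop()
```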