SDP Python API¶
The SDP Python API is available through the pyspark.pipelines Python module.
By convention, the module is imported under the alias dp (from pyspark import pipelines as dp).
pyspark.pipelines Python Module¶
The pyspark.pipelines module (in __init__.py) imports the pyspark.pipelines.api module to expose the following Python functions (including decorators) to wildcard imports:
- append_flow
- create_auto_cdc_flow
- create_sink
- create_streaming_table
- materialized_view
- table
- temporary_view
Python Decorators¶
Spark Declarative Pipelines uses the following Python decorators to define tables, views, and flows.
| Decorator | Purpose |
|---|---|
| @dp.append_flow | Append-only flows |
| @dp.materialized_view | Materialized views (with supporting flows) |
| @dp.table | Streaming and batch tables (with supporting flows) |
| @dp.temporary_view | Temporary views (with supporting flows) |
@dp.append_flow¶
append_flow(
*,
target: str,
name: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None]
QueryFunction = Callable[[], DataFrame] is a Python function that takes no arguments and returns a PySpark DataFrame.
Registers an append Flow with the active GraphElementRegistry.
target is the name of the dataset (destination) this flow writes to.
dp.create_auto_cdc_flow¶
create_auto_cdc_flow(
target: str,
source: str,
keys: Union[List[str], List[Column]],
sequence_by: Union[str, Column],
apply_as_deletes: Optional[Union[str, Column]] = None,
column_list: Optional[Union[List[str], List[Column]]] = None,
except_column_list: Optional[Union[List[str], List[Column]]] = None,
stored_as_scd_type: Optional[Literal[1, "1"]] = None,
name: Optional[str] = None,
) -> None
Registers an AutoCdcFlow with the active GraphElementRegistry.
Not Python Decorator
Unlike the @dp decorators, create_auto_cdc_flow is a regular function, not a Python decorator: it returns None rather than a Callable.
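Assuming create_auto_cdc_flow follows the usual change-data-capture semantics for SCD type 1 (keys identify a row, sequence_by orders the change events, and apply_as_deletes marks events that remove a row), the roles of those parameters can be modeled in plain Python. This is a conceptual sketch only and says nothing about how SDP executes the flow. The function `apply_changes_scd1` and the sample `op` and `seq` columns are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def apply_changes_scd1(
    changes: List[Row],
    keys: List[str],
    sequence_by: str,
    is_delete: Callable[[Row], bool],
) -> List[Row]:
    """Keep the latest change per key, then drop keys whose latest change is a delete."""
    latest: Dict[Tuple, Row] = {}
    for change in changes:
        key = tuple(change[k] for k in keys)
        current = latest.get(key)
        # A change wins only if it is at least as recent as the one already held,
        # so late-arriving (out-of-order) events do not overwrite newer state.
        if current is None or change[sequence_by] >= current[sequence_by]:
            latest[key] = change
    return [row for row in latest.values() if not is_delete(row)]


changes = [
    {"id": 1, "name": "a", "op": "UPSERT", "seq": 1},
    {"id": 2, "name": "b", "op": "UPSERT", "seq": 1},
    {"id": 1, "name": "a2", "op": "UPSERT", "seq": 3},
    {"id": 2, "name": "b", "op": "DELETE", "seq": 2},
    {"id": 1, "name": "stale", "op": "UPSERT", "seq": 2},  # arrives late
]

result = apply_changes_scd1(
    changes,
    keys=["id"],
    sequence_by="seq",
    is_delete=lambda row: row["op"] == "DELETE",
)
print(result)
```

Here `keys=["id"]` plays the part of keys, `"seq"` the part of sequence_by, and the `is_delete` predicate the part of apply_as_deletes.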
dp.create_sink¶
Registers a Sink output in the active GraphElementRegistry.
Not Python Decorator
Unlike the @dp decorators, create_sink is a regular function, not a Python decorator.
dp.create_streaming_table¶
create_streaming_table(
name: str,
*,
comment: Optional[str] = None,
table_properties: Optional[Dict[str, str]] = None,
partition_cols: Optional[List[str]] = None,
cluster_by: Optional[List[str]] = None,
schema: Optional[Union[StructType, str]] = None,
format: Optional[str] = None,
) -> None
Registers a StreamingTable dataset in the active GraphElementRegistry to serve as the target of Append Flows.
Not Python Decorator
Unlike the @dp decorators, create_streaming_table is a regular function, not a Python decorator: it returns None rather than a Callable.
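The split between a plain function that declares a table and decorators that feed it can be sketched with a toy model. The code below is stdlib-only and is not the pyspark.pipelines implementation: the `tables` and `flows` containers are hypothetical stand-ins for the registry, a list of dicts stands in for a DataFrame, and both functions take only a subset of the real parameters.

```python
from typing import Callable, Dict, List

Rows = List[dict]  # stand-in for a DataFrame
QueryFunction = Callable[[], Rows]

# Hypothetical stand-ins for the registry: table names and per-table queries.
tables: List[str] = []
flows: Dict[str, List[QueryFunction]] = {}


def create_streaming_table(name: str) -> None:
    """Plain function: registers the table by name, no query function involved."""
    tables.append(name)
    flows.setdefault(name, [])


def append_flow(*, target: str) -> Callable[[QueryFunction], None]:
    """Decorator factory: attaches the decorated query to an existing table."""
    def register(func: QueryFunction) -> None:
        flows[target].append(func)

    return register


# Called directly, not applied with @, since there is no query function to wrap.
create_streaming_table("orders")


@append_flow(target="orders")
def orders_eu() -> Rows:
    return [{"region": "eu"}]


@append_flow(target="orders")
def orders_us() -> Rows:
    return [{"region": "us"}]


# Both flows feed the single "orders" table.
combined = [row for query in flows["orders"] for row in query()]
print(combined)
```

The point of the sketch is the fan-in: one table declared without a query, and any number of append flows naming it as their target.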
@dp.materialized_view¶
materialized_view(
query_function: Optional[QueryFunction] = None,
*,
name: Optional[str] = None,
comment: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
table_properties: Optional[Dict[str, str]] = None,
partition_cols: Optional[List[str]] = None,
cluster_by: Optional[List[str]] = None,
schema: Optional[Union[StructType, str]] = None,
format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]
Registers a MaterializedView dataset with an accompanying Flow in the active GraphElementRegistry.
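The optional positional query_function together with the `Union[Callable[[QueryFunction], None], None]` return type suggests that the decorator can be applied both bare and with keyword arguments. A minimal stdlib-only sketch of that dual-mode pattern follows. It is a toy model rather than the pyspark.pipelines code: the `registered` dict is a hypothetical registry stand-in, only the name parameter is modeled, and the default of using the function name is an assumption.

```python
from typing import Callable, Dict, List, Optional, Union

Rows = List[dict]  # stand-in for a DataFrame
QueryFunction = Callable[[], Rows]

registered: Dict[str, QueryFunction] = {}  # hypothetical registry stand-in


def materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]:
    def register(func: QueryFunction) -> None:
        # Assumption: the dataset name defaults to the function name.
        registered[name or func.__name__] = func

    if query_function is not None:
        # Bare form: @materialized_view passes the function directly.
        register(query_function)
        return None
    # Parameterized form: @materialized_view(name=...) returns the real decorator.
    return register


@materialized_view
def daily_totals() -> Rows:
    return [{"total": 10}]


@materialized_view(name="weekly")
def weekly_totals() -> Rows:
    return [{"total": 70}]


print(sorted(registered))
```

The same shape applies to the signatures of table and temporary_view, which share the optional query_function parameter and return type.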
@dp.table¶
table(
query_function: Optional[QueryFunction] = None,
*,
name: Optional[str] = None,
comment: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
table_properties: Optional[Dict[str, str]] = None,
partition_cols: Optional[List[str]] = None,
cluster_by: Optional[List[str]] = None,
schema: Optional[Union[StructType, str]] = None,
format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]
Registers a StreamingTable dataset with an accompanying Flow in the active GraphElementRegistry.
@dp.temporary_view¶
temporary_view(
query_function: Optional[QueryFunction] = None,
*,
name: Optional[str] = None,
comment: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
) -> Union[Callable[[QueryFunction], None], None]
Registers a TemporaryView dataset with an accompanying Flow in the active GraphElementRegistry.