Python¶
Python Import Alias Convention¶
As of commit 6ab0df9, the convention is to alias the import of Declarative Pipelines in Python as dp.
from pyspark import pipelines as dp
pyspark.pipelines Python Module¶
The pyspark.pipelines module (in __init__.py) imports the pyspark.pipelines.api module to expose the Python functions (incl. decorators) described below to wildcard imports.
Use the following import in your Python code:
from pyspark import pipelines as dp
Python Decorators¶
Declarative Pipelines uses Python decorators to define tables and views.
| Decorator | Purpose |
|---|---|
| @dp.append_flow | Append-only flows |
| @dp.materialized_view | Materialized views (with supporting flows) |
| @dp.table | Streaming and batch tables (with supporting flows) |
| @dp.temporary_view | Temporary views (with supporting flows) |
@dp.append_flow¶
append_flow(
*,
target: str,
name: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
) -> Callable[[QueryFunction], None] # (1)!
QueryFunction = Callable[[], DataFrame] is a Python function that takes no arguments and returns a PySpark DataFrame.
Registers an append Flow (in the active GraphElementRegistry).
target is the name of the dataset (destination) this flow writes to.
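As an illustration, a minimal sketch of an append flow (the target table, flow name, and rate source are illustrative; the code assumes it runs in a pipeline definition file, so an active SparkSession and GraphElementRegistry exist):

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

# Append rows from a rate source into an already-registered `events` table.
@dp.append_flow(target="events", name="rate_events")
def rate_events():
    # QueryFunction: no arguments, returns a DataFrame
    return spark.readStream.format("rate").load()
```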
dp.create_sink¶
create_sink(
name: str,
format: str,
options: Optional[Dict[str, str]] = None,
) -> None
Registers a Sink output in the active GraphElementRegistry.
Not a Python Decorator
Unlike most of the other functions, create_sink is not a Python decorator (it returns None, not a Callable).
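A hedged sketch of pairing create_sink with an append flow that writes to it (the Kafka format, options, and dataset names are assumptions for illustration):

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

# Register a Kafka sink as a flow destination (illustrative options).
dp.create_sink(
    name="events_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "localhost:9092",
        "topic": "events",
    },
)

# Feed the sink with an append flow from an upstream dataset.
@dp.append_flow(target="events_sink")
def events_to_kafka():
    # Kafka expects a string/binary `value` column.
    return spark.readStream.table("events").selectExpr("CAST(value AS STRING) AS value")
```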
dp.create_streaming_table¶
create_streaming_table(
name: str,
*,
comment: Optional[str] = None,
table_properties: Optional[Dict[str, str]] = None,
partition_cols: Optional[List[str]] = None,
cluster_by: Optional[List[str]] = None,
schema: Optional[Union[StructType, str]] = None,
format: Optional[str] = None,
) -> None
Not a Python Decorator
Unlike most of the other functions, create_streaming_table is not a Python decorator (it returns None, not a Callable).
Registers a StreamingTable dataset (in the active GraphElementRegistry) for Append Flows.
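A minimal sketch of registering a streaming table and attaching an append flow to it (the names and schema are illustrative):

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

# Register the streaming table first; append flows target it by name.
dp.create_streaming_table(
    name="events",
    comment="Raw events appended by one or more flows",
    schema="timestamp TIMESTAMP, value LONG",
)

# A rate source happens to produce exactly (timestamp, value) columns.
@dp.append_flow(target="events")
def rate_source():
    return spark.readStream.format("rate").load()
```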
@dp.materialized_view¶
materialized_view(
query_function: Optional[QueryFunction] = None,
*,
name: Optional[str] = None,
comment: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
table_properties: Optional[Dict[str, str]] = None,
partition_cols: Optional[List[str]] = None,
cluster_by: Optional[List[str]] = None,
schema: Optional[Union[StructType, str]] = None,
format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]
Registers a MaterializedView dataset with an accompanying Flow in the active GraphElementRegistry.
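Since query_function is an optional positional parameter, the decorator can be used with or without arguments. A minimal sketch (the upstream events table is an assumption):

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

# Presumably the function name becomes the dataset name when `name` is omitted.
@dp.materialized_view(comment="Event counts per value")
def event_counts():
    # A batch query over an upstream dataset
    return spark.read.table("events").groupBy("value").count()
```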
@dp.table¶
table(
query_function: Optional[QueryFunction] = None,
*,
name: Optional[str] = None,
comment: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
table_properties: Optional[Dict[str, str]] = None,
partition_cols: Optional[List[str]] = None,
cluster_by: Optional[List[str]] = None,
schema: Optional[Union[StructType, str]] = None,
format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]
Registers a StreamingTable dataset with an accompanying Flow in the active GraphElementRegistry.
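A minimal sketch of a streaming table with its accompanying flow (the name and rate source are illustrative):

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

@dp.table(
    name="raw_events",
    comment="Rate source ingested as a streaming table",
)
def raw_events():
    # A streaming query; per the table above, batch queries are supported too.
    return spark.readStream.format("rate").load()
```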
@dp.temporary_view¶
temporary_view(
query_function: Optional[QueryFunction] = None,
*,
name: Optional[str] = None,
comment: Optional[str] = None,
spark_conf: Optional[Dict[str, str]] = None,
) -> Union[Callable[[QueryFunction], None], None]
Registers a TemporaryView dataset with an accompanying Flow in the active GraphElementRegistry.
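Given the Union return type, the decorator appears usable bare (without arguments) as well. A minimal sketch (the upstream events table is an assumption):

```python
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

# A temporary view over an upstream dataset, for use by downstream datasets.
@dp.temporary_view
def high_value_events():
    return spark.read.table("events").where("value > 100")
```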