MaterializedView
MaterializedView is a Table that represents a materialized view in a pipeline dataflow graph.
MaterializedView is created using the @materialized_view decorator.
MaterializedView is a Python class.
materialized_view
materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]
materialized_view derives defaults from query_function for any parameters that are not specified explicitly.
materialized_view uses the name of the decorated function as the name of the materialized view unless specified explicitly.
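The signature above (an optional positional query_function followed by keyword-only options) is the usual shape of a decorator that works both bare and with arguments. The following is a minimal, self-contained sketch of that pattern and of the name fallback. It is not the PySpark implementation: sketch_materialized_view, the VIEWS dictionary, and the example functions are hypothetical names, and unlike the real decorator this sketch returns the decorated function.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a dataset registry: view name -> query function.
VIEWS: Dict[str, Callable] = {}


def sketch_materialized_view(
    query_function: Optional[Callable] = None,
    *,
    name: Optional[str] = None,
):
    """Simplified model of a decorator usable with or without arguments."""

    def register(func: Callable) -> Callable:
        # Fall back to the decorated function's name when no name is given.
        VIEWS[name or func.__name__] = func
        return func

    if query_function is not None:
        # Bare form: @sketch_materialized_view
        return register(query_function)
    # Parameterized form: @sketch_materialized_view(name="...")
    return register


@sketch_materialized_view
def daily_sales():
    return "SELECT 1"


@sketch_materialized_view(name="weekly_sales")
def build_weekly():
    return "SELECT 2"
```

In the bare form the function itself arrives as query_function. In the parameterized form query_function stays None and the call returns the inner decorator, which is why the options are keyword-only.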
materialized_view makes sure that a GraphElementRegistry has been set (using the graph_element_registration_context context manager).
Demo
from pyspark.pipelines.graph_element_registry import (
    graph_element_registration_context,
    get_active_graph_element_registry,
)
from pyspark.pipelines.spark_connect_graph_element_registry import (
    SparkConnectGraphElementRegistry,
)

# `spark` is assumed to be an active Spark Connect session.
dataflow_graph_id = "demo_dataflow_graph_id"
registry = SparkConnectGraphElementRegistry(spark, dataflow_graph_id)

# Inside the context, the active registry is the one passed in.
with graph_element_registration_context(registry):
    graph_registry = get_active_graph_element_registry()
    assert graph_registry == registry
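To make the "active registry" idea concrete without a Spark session, here is a minimal sketch of how such a registration context could be modeled with contextvars. It is an illustration under assumptions, not the PySpark source: registration_context, get_active_registry, and the RuntimeError raised outside a context are hypothetical choices.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator

# Holds the registry that is active in the current context (None when unset).
_active_registry = ContextVar("active_registry", default=None)


@contextmanager
def registration_context(registry: object) -> Iterator[None]:
    """Make `registry` the active one for the duration of the with-block."""
    token = _active_registry.set(registry)
    try:
        yield
    finally:
        # Restore whatever was active before, even if the block raised.
        _active_registry.reset(token)


def get_active_registry() -> object:
    """Return the active registry, or fail if none has been set."""
    registry = _active_registry.get()
    if registry is None:
        raise RuntimeError("no registry is active in this context")
    return registry
```

Resetting with the token (rather than setting None) means nested contexts restore the outer registry on exit.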
materialized_view creates a new MaterializedView and requests the GraphElementRegistry to register it (register_dataset).
materialized_view creates a new Flow and requests the GraphElementRegistry to register it (register_flow).
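The two registrations described above can be sketched with a toy registry that only records what it receives. Only the method names register_dataset and register_flow come from the text. ViewSketch, FlowSketch, RecordingRegistry, define_view, and their fields are hypothetical, and giving the flow the same name and target as the view is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ViewSketch:
    # Hypothetical, reduced stand-in for a materialized view definition.
    name: str
    comment: Optional[str] = None


@dataclass
class FlowSketch:
    # Hypothetical, reduced stand-in for a flow that feeds a target dataset.
    name: str
    target: str
    func: Callable[[], object]


class RecordingRegistry:
    """Toy registry that stores registrations in memory."""

    def __init__(self) -> None:
        self.datasets: List[ViewSketch] = []
        self.flows: List[FlowSketch] = []

    def register_dataset(self, dataset: ViewSketch) -> None:
        self.datasets.append(dataset)

    def register_flow(self, flow: FlowSketch) -> None:
        self.flows.append(flow)


def define_view(
    registry: RecordingRegistry,
    func: Callable[[], object],
    name: Optional[str] = None,
) -> None:
    """Register one dataset and one flow for a query function."""
    view_name = name or func.__name__
    registry.register_dataset(ViewSketch(name=view_name))
    # Assumption: the flow shares the view's name and writes to it.
    registry.register_flow(FlowSketch(name=view_name, target=view_name, func=func))


def orders_summary():
    return "query result placeholder"


demo_registry = RecordingRegistry()
define_view(demo_registry, orders_summary)
```

After the call, demo_registry holds one dataset and one flow, both named after the query function.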