Skip to content

MaterializedView

MaterializedView is a Table that represents a materialized view in a pipeline dataflow graph.

MaterializedView is created using @materialized_view decorator.

MaterializedView is a Python class.

materialized_view

materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]

materialized_view uses query_function for the parameters unless they are specified explicitly.

materialized_view uses the name of the decorated function as the name of the materialized view unless specified explicitly.

materialized_view makes sure that GraphElementRegistry has been set (using graph_element_registration_context context manager).

Demo
from pyspark.pipelines.graph_element_registry import (
    graph_element_registration_context,
    get_active_graph_element_registry,
)
from pyspark.pipelines.spark_connect_graph_element_registry import (
    SparkConnectGraphElementRegistry,
)

dataflow_graph_id = "demo_dataflow_graph_id"
registry = SparkConnectGraphElementRegistry(spark, dataflow_graph_id)
with graph_element_registration_context(registry):
    graph_registry = get_active_graph_element_registry()
    assert graph_registry == registry

materialized_view creates a new MaterializedView and requests the GraphElementRegistry to register_dataset it.

materialized_view creates a new Flow and requests the GraphElementRegistry to register_flow it.