Demo: SDP Python API¶
Activate Virtual Environment
Follow Demo: Create Virtual Environment for Python Client before getting started with this demo.
In a terminal, start a Spark Connect Server.
It will listen on port 15002.
Start PySpark Shell¶
Start a Spark Connect-enabled PySpark shell.
from pyspark.pipelines.spark_connect_pipeline import create_dataflow_graph
dataflow_graph_id = create_dataflow_graph(
spark,
default_catalog=None,
default_database=None,
sql_conf=None,
)
# >>> print(dataflow_graph_id)
# 3cb66d5a-0621-4f15-9920-e99020e30e48
from pyspark.pipelines.spark_connect_graph_element_registry import SparkConnectGraphElementRegistry
registry = SparkConnectGraphElementRegistry(spark, dataflow_graph_id)
from pyspark.pipelines.graph_element_registry import graph_element_registration_context
with graph_element_registration_context(registry):
dp.create_streaming_table("demo_streaming_table")
You should see the following INFO message in the logs of the Spark Connect Server: