Auto CDC Flows¶
dp.create_auto_cdc_flow is used in PySpark to create an Auto CDC flow into the target table from the Change Data Capture (CDC) source.
Target table must have already been created using the dp.create_streaming_table function.
Auto CDC flows are represented as AutoCdcFlows.
AutoCdcFlow is resolved to AutoCdcMergeFlow (when FlowResolver is requested to resolve a flow).
AutoCdcMergeFlow applies a CDC event stream to a target table via MERGE.
AutoCdcMergeFlow can be one of the two SCD Types (based on ChangeArgs):
- SCD Type 1
- SCD Type 2
AutoCdcMergeFlow SCD Type 1 (with a Table output) is executed as a Scd1MergeStreamingWrite. All the other variants fail at execution either with an UnsupportedOperationException or an AnalysisException.
Scd1MergeStreamingWrite is executed as a streaming query with foreachBatch (Spark Structured Streaming) streaming operator and Trigger.AvailableNow.