DeltaCDFRelation¶
DeltaCDFRelation is a BaseRelation (Spark SQL) and a PrunedFilteredScan (Spark SQL).
Creating Instance¶
DeltaCDFRelation takes the following to be created:
- SnapshotWithSchemaMode
- SQLContext (Spark SQL)
- Starting version
- Ending version
DeltaCDFRelation is created when:
- CDCReaderImpl is requested for a CDF-aware BaseRelation and emptyCDFRelation
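From the user's side, a change data feed batch read is the usual way to end up with a CDF-aware relation. The following is a minimal sketch, assuming a Delta table with change data feed enabled and the Delta Lake library on the classpath. `ReadChangeFeedSketch`, `cdfOptions`, `readChanges` and `tablePath` are hypothetical names introduced for illustration. Treating the ending version as optional is an assumption of this sketch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadChangeFeedSketch {

  // Builds the reader options for a change data feed batch read.
  // The ending version is optional here (an assumption of this sketch).
  def cdfOptions(startingVersion: Long, endingVersion: Option[Long]): Map[String, String] =
    Map(
      "readChangeFeed" -> "true",
      "startingVersion" -> startingVersion.toString) ++
      endingVersion.map(v => "endingVersion" -> v.toString)

  // tablePath is a hypothetical location of a Delta table
  // with change data feed enabled.
  def readChanges(
      spark: SparkSession,
      tablePath: String,
      startingVersion: Long,
      endingVersion: Option[Long]): DataFrame =
    spark.read
      .format("delta")
      .options(cdfOptions(startingVersion, endingVersion))
      .load(tablePath)
}
```

The starting and ending versions passed as options correspond to the starting and ending versions listed above.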
Building Distributed Scan¶
PrunedFilteredScan
buildScan(
  requiredColumns: Array[String],
  filters: Array[Filter]): RDD[Row]
buildScan is part of the PrunedFilteredScan (Spark SQL) abstraction.
buildScan creates a batch DataFrame of changes.
buildScan prunes the DataFrame to the given requiredColumns (using Dataset.select operator).
In the end, buildScan converts the DataFrame to RDD[Row] (using DataFrame.rdd operator).
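The pruning and conversion steps above can be sketched with plain Spark SQL. This is an illustration, not the Delta Lake implementation: `BuildScanSketch` and `pruneToRdd` are hypothetical names, the in-memory DataFrame merely stands in for the batch DataFrame of changes, and quoting column names with backticks is an assumption made so that names containing dots are not read as nested fields.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col

object BuildScanSketch {

  // Hypothetical helper: keep only the required columns (Dataset.select)
  // and expose the result as an RDD[Row] (DataFrame.rdd).
  def pruneToRdd(changes: DataFrame, requiredColumns: Array[String]): RDD[Row] =
    changes.select(requiredColumns.map(c => col(s"`$c`")): _*).rdd

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("buildScan-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the batch DataFrame of changes
    val changes = Seq(
      (1L, "a", "insert", 0L),
      (1L, "b", "update_postimage", 1L)
    ).toDF("id", "value", "_change_type", "_commit_version")

    // Only the two requested columns survive the pruning
    pruneToRdd(changes, Array("id", "_change_type")).collect().foreach(println)

    spark.stop()
  }
}
```

The filters argument plays no part in this sketch, which matches the steps described above: only requiredColumns is used.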
Schema¶
schema is the cdcReadSchema for the schema of the delta table (based on the Metadata of the snapshotForBatchSchema).
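As a rough illustration of what a CDF read schema looks like, the sketch below appends the change data feed metadata columns to a table schema. `CdfSchemaSketch` and `withCdfColumns` are hypothetical stand-ins and not the Delta Lake code. The three column names are the ones a change data feed read exposes, and the example table schema is made up.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType, TimestampType}

object CdfSchemaSketch {

  // Hypothetical stand-in for cdcReadSchema: the table schema
  // followed by the change data feed metadata columns.
  def withCdfColumns(tableSchema: StructType): StructType =
    tableSchema
      .add("_change_type", StringType)
      .add("_commit_version", LongType)
      .add("_commit_timestamp", TimestampType)

  def main(args: Array[String]): Unit = {
    // Made-up table schema for the example
    val tableSchema = StructType(Seq(
      StructField("id", LongType),
      StructField("value", StringType)))

    // prints: id, value, _change_type, _commit_version, _commit_timestamp
    println(withCdfColumns(tableSchema).fieldNames.mkString(", "))
  }
}
```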