DeltaCDFRelation¶
DeltaCDFRelation is a BaseRelation (Spark SQL) and a PrunedFilteredScan (Spark SQL).
Creating Instance¶
DeltaCDFRelation takes the following to be created:
- SnapshotWithSchemaMode
- SQLContext (Spark SQL)
- Starting version
- Ending version
DeltaCDFRelation is created when:
- CDCReaderImpl is requested for a CDF-aware BaseRelation and emptyCDFRelation
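
The starting and ending versions above presumably correspond to the bounds a user supplies when reading a table's change data feed in batch mode. The following is a minimal sketch of such a read, not Delta Lake's own code: the `ChangeFeedRead` object and the `tableName` parameter are hypothetical, the option names are the ones documented for Delta Lake's change data feed reader, and the link from this read path to DeltaCDFRelation is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of Delta Lake) for a bounded change data feed read.
object ChangeFeedRead {

  // Reader options for a batch read between two table versions.
  def options(startingVersion: Long, endingVersion: Long): Map[String, String] = Map(
    "readChangeFeed" -> "true",
    "startingVersion" -> startingVersion.toString,
    "endingVersion" -> endingVersion.toString)

  // Assumes `spark` is configured with Delta Lake and that `tableName`
  // names a Delta table with change data feed enabled.
  def read(
      spark: SparkSession,
      tableName: String,
      startingVersion: Long,
      endingVersion: Long): DataFrame =
    spark.read
      .format("delta")
      .options(options(startingVersion, endingVersion))
      .table(tableName)
}
```

Keeping the options in a plain `Map` makes the version bounds easy to inspect or log before the read is triggered.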
Building Distributed Scan¶
PrunedFilteredScan
buildScan(
  requiredColumns: Array[String],
  filters: Array[Filter]): RDD[Row]
buildScan is part of the PrunedFilteredScan (Spark SQL) abstraction.
buildScan creates a batch DataFrame of changes.
buildScan prunes the DataFrame down to the given requiredColumns (using the Dataset.select operator).
In the end, buildScan converts the DataFrame to RDD[Row] (using the DataFrame.rdd operator).
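
The prune-then-convert steps can be illustrated with a toy relation. This is a sketch only: `PrunedDataFrameRelation` is a hypothetical class, not Delta Lake's implementation, and it assumes the batch DataFrame of changes has already been built and is passed in as `df`. The pushed-down filters are deliberately ignored here.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation (not Delta Lake code) that serves an already-built
// batch DataFrame through the PrunedFilteredScan contract.
class PrunedDataFrameRelation(
    override val sqlContext: SQLContext,
    df: DataFrame) extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = df.schema

  override def buildScan(
      requiredColumns: Array[String],
      filters: Array[Filter]): RDD[Row] = {
    // Column pruning: keep only the columns the query asked for, in that order.
    val pruned = df.select(requiredColumns.map(col): _*)
    // Filters are ignored in this sketch. Because unhandledFilters is not
    // overridden, Spark re-evaluates all of them on top of the scan.
    pruned.rdd
  }
}
```

Such a relation can be turned back into a DataFrame with `spark.baseRelationToDataFrame(relation)`, at which point Spark supplies `requiredColumns` and `filters` based on the query.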
Schema¶
schema is the cdcReadSchema for the schema of the delta table (based on the Metadata of the snapshotForBatchSchema).
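
As a rough illustration of what a change data feed read schema looks like, the sketch below appends the three metadata columns that Delta Lake documents for change data feed results to a table schema. `CdfSchemaSketch` and `withChangeColumns` are hypothetical names, and the assumption that cdcReadSchema produces exactly this shape is not confirmed by the text above.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructType, TimestampType}

// Hypothetical helper (not Delta Lake code) that mimics a CDF read schema.
object CdfSchemaSketch {

  // Assumption: column names and types follow Delta Lake's documented
  // change data feed output.
  def withChangeColumns(tableSchema: StructType): StructType =
    tableSchema
      .add("_change_type", StringType)
      .add("_commit_version", LongType)
      .add("_commit_timestamp", TimestampType)
}
```

For a table schema with the fields `id` and `name`, `CdfSchemaSketch.withChangeColumns` returns a schema whose field names are `id`, `name`, `_change_type`, `_commit_version` and `_commit_timestamp`, in that order.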