DeltaCDFRelation

DeltaCDFRelation is a BaseRelation (Spark SQL) and a PrunedFilteredScan (Spark SQL).

Creating Instance

DeltaCDFRelation takes the following to be created:

  • SnapshotWithSchemaMode
  • SQLContext (Spark SQL)
  • Starting version
  • Ending version
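For orientation, the following is a minimal sketch of a batch change data feed read from the user's side. It assumes (the page does not state this) that the starting and ending versions above correspond to the startingVersion and endingVersion options of the Delta Lake DataFrame reader. The readChanges helper and its parameter names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of Delta Lake): reads the change data feed
// of a delta table between two table versions (both inclusive).
// Assumes Delta Lake is on the classpath and the table has
// change data feed enabled.
def readChanges(
    spark: SparkSession,
    tableName: String,
    startingVersion: Long,
    endingVersion: Long): DataFrame =
  spark.read
    .format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", startingVersion)
    .option("endingVersion", endingVersion)
    .table(tableName)
```
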

DeltaCDFRelation is created when:

Building Distributed Scan

PrunedFilteredScan
buildScan(
  requiredColumns: Array[String],
  filters: Array[Filter]): RDD[Row]

buildScan is part of the PrunedFilteredScan (Spark SQL) abstraction.

buildScan creates a batch DataFrame of changes.

buildScan prunes columns by selecting only the requiredColumns (using Dataset.select operator).

In the end, buildScan converts the DataFrame to RDD[Row] (using DataFrame.rdd operator).
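The steps above can be sketched with a simplified relation. This is not the Delta Lake source code: ChangesRelationSketch and its changes parameter are hypothetical, and the DataFrame of changes is passed in rather than computed from a snapshot and a version range.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation for illustration only.
// `changes` stands in for the batch DataFrame of changes.
class ChangesRelationSketch(
    override val sqlContext: SQLContext,
    changes: DataFrame)
  extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = changes.schema

  override def buildScan(
      requiredColumns: Array[String],
      filters: Array[Filter]): RDD[Row] = {
    // Column pruning: keep only the columns Spark asked for.
    // Assumption: column names contain no dots or other special
    // characters, so `col` resolves them as top-level columns.
    val pruned = changes.select(requiredColumns.map(col): _*)
    // `filters` are ignored in this sketch. Spark re-applies them after
    // the scan, since BaseRelation.unhandledFilters returns all filters
    // by default.
    pruned.rdd
  }
}
```

Ignoring the filters keeps the sketch short and stays correct, at the cost of reading rows that Spark later discards.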

Schema

BaseRelation
schema: StructType

schema is part of the BaseRelation (Spark SQL) abstraction.

schema is the cdcReadSchema for the schema of the delta table (based on the Metadata of the snapshotForBatchSchema).
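As a rough illustration of what such a schema looks like, the sketch below appends the three change data feed metadata columns that Delta Lake documents for change reads (_change_type, _commit_version, _commit_timestamp) to a table schema. The cdcReadSchemaSketch function is hypothetical and only assumed to mirror what cdcReadSchema produces.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructType, TimestampType}

// Hypothetical stand-in for cdcReadSchema: the table schema followed by
// the change data feed metadata columns.
def cdcReadSchemaSketch(tableSchema: StructType): StructType =
  tableSchema
    .add("_change_type", StringType)
    .add("_commit_version", LongType)
    .add("_commit_timestamp", TimestampType)
```
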