DeltaTableV2

DeltaTableV2 is a logical representation of a writable Delta table.

DeltaTableV2 by Spark SQL 3.0.0

In terms of the abstractions introduced in Spark SQL 3.0.0, DeltaTableV2 is a Table that SupportsWrite.

Creating Instance

DeltaTableV2 takes the following to be created:

  • SparkSession
  • Hadoop Path
  • Optional Catalog Metadata (Option[CatalogTable])
  • Optional Table ID (Option[String])
  • Optional DeltaTimeTravelSpec

DeltaTableV2 is created when:

DeltaTimeTravelSpec

DeltaTableV2 may be given a DeltaTimeTravelSpec when created.

Unless given, DeltaTimeTravelSpec is undefined (None).

DeltaTableV2 is given a DeltaTimeTravelSpec only when DeltaDataSource is requested to create a BaseRelation.

DeltaTimeTravelSpec is used for timeTravelSpec.

Snapshot

snapshot: Snapshot

DeltaTableV2 has a Snapshot. In other words, DeltaTableV2 represents a Delta table at a specific version.

Scala lazy value

snapshot is a Scala lazy value and is initialized once when first accessed. Once computed it stays unchanged.
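The once-only initialization of a lazy value can be seen in a minimal, self-contained sketch. The names below (LazySnapshotDemo, loads, snapshotVersion) are hypothetical and are not part of Delta Lake; the counter exists only to make the single initialization observable.

```scala
object LazySnapshotDemo {
  // Counts how many times the initializer body below runs
  var loads: Int = 0

  // Hypothetical stand-in for an expensive snapshot load.
  // The body is not evaluated until the first access.
  lazy val snapshotVersion: Long = {
    loads += 1
    42L
  }
}
```

Reading LazySnapshotDemo.snapshotVersion any number of times leaves loads at 1, which is why a DeltaTableV2 instance keeps describing the same table version for its whole lifetime.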

DeltaTableV2 uses the DeltaLog to load the Snapshot at a given version (when the optional timeTravelSpec is defined) or, otherwise, to update to the latest version.

snapshot is used when DeltaTableV2 is requested for the schema, partitioning and properties.
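The choice between a pinned version and the latest version could be sketched as follows. This is a simplified model under stated assumptions: TimeTravel, Snap, Log, snapshotAt, update and resolve are hypothetical stand-ins written for illustration, not the Delta Lake classes or their signatures.

```scala
object SnapshotSelectionSketch {
  // Hypothetical, simplified stand-ins; not the Delta Lake classes
  final case class TimeTravel(version: Long)
  final case class Snap(version: Long)

  // A toy transaction log that only knows its latest version
  final class Log(latest: Long) {
    // Loads the table state as of an existing version
    def snapshotAt(version: Long): Snap = {
      require(version >= 0 && version <= latest, s"No such version: $version")
      Snap(version)
    }

    // Refreshes to the latest version
    def update(): Snap = Snap(latest)
  }

  // With a time travel spec, pin that version; without one, use the latest
  def resolve(log: Log, timeTravel: Option[TimeTravel]): Snap =
    timeTravel.map(tt => log.snapshotAt(tt.version)).getOrElse(log.update())
}
```

Modeling the spec as an Option keeps the "no time travel" case explicit and makes the fallback to the latest version a single getOrElse.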

DeltaTimeTravelSpec

timeTravelSpec: Option[DeltaTimeTravelSpec]

timeTravelSpec is either the DeltaTimeTravelSpec given when DeltaTableV2 was created (timeTravelOpt) or timeTravelByPath.

timeTravelSpec throws an AnalysisException when timeTravelOpt and timeTravelByPath are both defined:

Cannot specify time travel in multiple formats.

timeTravelSpec is used when DeltaTableV2 is requested for a Snapshot and BaseRelation.
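The rule that at most one of the two sources may be defined can be illustrated with a small sketch. Spec is a hypothetical placeholder type, and IllegalArgumentException is used here only to keep the example free of Spark dependencies (the text above states that an AnalysisException is thrown).

```scala
object TimeTravelSpecSketch {
  // Hypothetical placeholder for a time travel specification
  final case class Spec(version: Long)

  def timeTravelSpec(
      timeTravelOpt: Option[Spec],
      timeTravelByPath: Option[Spec]): Option[Spec] = {
    if (timeTravelOpt.isDefined && timeTravelByPath.isDefined) {
      // Stand-in for the AnalysisException described in the text
      throw new IllegalArgumentException(
        "Cannot specify time travel in multiple formats.")
    }
    // At most one side is defined at this point
    timeTravelOpt.orElse(timeTravelByPath)
  }
}
```

With neither source defined the result is None, which corresponds to reading the latest version of the table.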

DeltaTimeTravelSpec by Path

timeTravelByPath: Option[DeltaTimeTravelSpec]

Scala lazy value

timeTravelByPath is a Scala lazy value and is initialized once when first accessed. Once computed it stays unchanged.

timeTravelByPath is undefined when CatalogTable is defined.

With no CatalogTable defined, DeltaTableV2 parses the given Path for the timeTravelByPath (using resolvePath under the covers).
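As a rough illustration of path-based time travel, the sketch below extracts a version from a path suffix. It assumes a trailing "@v<digits>" convention for versions; the actual parsing rules (including timestamp suffixes and checks against the file system) are not described in this section, and VersionSpec, VersionSuffix and parse are hypothetical names.

```scala
object TimeTravelByPathSketch {
  // Hypothetical placeholder for a version-based time travel specification
  final case class VersionSpec(version: Long)

  // Assumed convention: a trailing "@v<digits>" selects a table version.
  // The digit count is capped so that toLong cannot overflow.
  private val VersionSuffix = """(.*)@[vV](\d{1,18})""".r

  // Returns the path without the suffix and the parsed spec, if any
  def parse(path: String, hasCatalogTable: Boolean): (String, Option[VersionSpec]) =
    if (hasCatalogTable) {
      // A catalog table never uses path-based time travel
      (path, None)
    } else {
      path match {
        case VersionSuffix(base, v) => (base, Some(VersionSpec(v.toLong)))
        case _                      => (path, None)
      }
    }
}
```

Under these assumptions, parse("/tmp/delta/events@v5", hasCatalogTable = false) yields the base path together with a spec for version 5, while the same path with hasCatalogTable = true is returned untouched.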

Converting to Insertable HadoopFsRelation

toBaseRelation: BaseRelation

toBaseRelation first verifies and creates partition filters (verifyAndCreatePartitionFilters) for the Path, the current Snapshot and partitionFilters.

In the end, toBaseRelation requests the DeltaLog for an insertable HadoopFsRelation.

toBaseRelation is used when:

  • DeltaDataSource is requested to createRelation
  • DeltaRelation utility is used to fromV2Relation

Last update: 2020-10-13