DeltaTableV2¶
DeltaTableV2 is a logical representation of a writable delta table.
Creating Instance¶
DeltaTableV2 takes the following to be created:
- SparkSession (Spark SQL)
- Path (Hadoop HDFS)
- CatalogTable Metadata
- Optional Table ID
- Optional DeltaTimeTravelSpec
- Options
- CDC Options
DeltaTableV2 is created when:
- DeltaTable utility is used to forPath and forName
- DeltaCatalog is requested to load a table
- DeltaDataSource is requested to load a table or create a table relation
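The public entry points above can be sketched with the Delta Lake Scala API (a minimal sketch; the table path and table name are made-up examples, and an active SparkSession with Delta Lake on the classpath is assumed):

```scala
import io.delta.tables.DeltaTable

// Path-based access: creates a DeltaTableV2 for the given path
// (no CatalogTable metadata).
val byPath = DeltaTable.forPath(spark, "/tmp/delta/events")

// Name-based access: creates a DeltaTableV2 with the CatalogTable
// of the cataloged delta table.
val byName = DeltaTable.forName(spark, "events")
```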
Table Metadata (CatalogTable)¶
catalogTable: Option[CatalogTable] = None
DeltaTableV2 can be given CatalogTable (Spark SQL) when created. It is undefined by default.
catalogTable is specified when:
- DeltaTable.forName is used (for a cataloged delta table)
- DeltaCatalog is requested to load a table (that is a V1Table and a cataloged delta table)
catalogTable is used when:
- DeltaTableV2 is requested for the rootPath (to avoid parsing the path), the name, the properties and the CatalogTable itself
- DeltaAnalysis logical resolution rule is requested to resolve a RestoreTableStatement (for a TableIdentifier)
- DeltaRelation utility is used to fromV2Relation
- AlterTableSetLocationDeltaCommand is executed
CDC Options¶
cdcOptions: CaseInsensitiveStringMap
DeltaTableV2 can be given cdcOptions when created. It is empty by default (and most of the time).
cdcOptions is specified when:
- DeltaDataSource is requested to create a relation (for CDC read)
- DeltaTableV2 is requested to withOptions
cdcOptions is used when:
- DeltaTableV2 is requested for a BaseRelation
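A CDC read that ends up with non-empty cdcOptions can be sketched as follows (a hedged sketch with a made-up path and starting version; assumes an active SparkSession and Change Data Feed enabled on the table):

```scala
// DeltaDataSource passes these reader options on as cdcOptions
// when creating the relation for a CDC read.
val changes = spark.read.format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 0)
  .load("/tmp/delta/events")
```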
CDF-Aware Relation¶
cdcRelation: Option[BaseRelation]
Lazy Value
cdcRelation is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
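The lazy-initialization semantics can be demonstrated with plain Scala (a toy example, unrelated to Delta Lake itself):

```scala
var initializations = 0

class Table {
  // Initialized once only, at first access;
  // the computed value never changes afterwards.
  lazy val snapshot: Long = {
    initializations += 1
    System.nanoTime()
  }
}

val t = new Table
assert(initializations == 0) // not computed yet
val first = t.snapshot       // first access runs the initializer
val second = t.snapshot      // cached thereafter
assert(first == second)
assert(initializations == 1)
```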
For a CDF-aware read (based on the cdcOptions), cdcRelation returns a CDF-aware relation. Otherwise, cdcRelation returns None (an undefined value).
cdcRelation is used when:
- DeltaTableV2 is requested for the table schema and the relation
Options¶
DeltaTableV2 can be given options (as a Map[String, String]). Options are empty by default.
The options are defined when DeltaDataSource is requested for a relation with spark.databricks.delta.loadFileSystemConfigsFromDataFrameOptions configuration property enabled.
The options are used for the following:
- Looking up path or paths options
- Creating the DeltaLog
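As an illustration, the reader options below (with a made-up path) are what DeltaTableV2 is given as its options (a sketch; assumes an active SparkSession):

```scala
// The path option is looked up among the options
// (equivalent to passing the path to load directly).
val events = spark.read.format("delta")
  .option("path", "/tmp/delta/events")
  .load()

// With spark.databricks.delta.loadFileSystemConfigsFromDataFrameOptions
// enabled, Hadoop file-system configurations given as reader options
// also reach the DeltaLog this way.
```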
DeltaLog¶
DeltaTableV2 creates a DeltaLog for the rootPath and the given options.
Table¶
DeltaTableV2 is a Table (Spark SQL).
SupportsWrite¶
DeltaTableV2 is a SupportsWrite (Spark SQL).
V2TableWithV1Fallback¶
DeltaTableV2 is a V2TableWithV1Fallback (Spark SQL).
v1Table¶
V2TableWithV1Fallback
v1Table: CatalogTable
v1Table is part of the V2TableWithV1Fallback (Spark SQL) abstraction.
v1Table returns the CatalogTable (with CatalogStatistics removed if DeltaTimeTravelSpec has also been specified).
v1Table expects that the (optional) CatalogTable metadata is specified or throws a DeltaIllegalStateException:
v1Table call is not expected with path based DeltaTableV2
DeltaTimeTravelSpec¶
DeltaTableV2 may be given a DeltaTimeTravelSpec when created.
DeltaTimeTravelSpec is assumed not to be defined by default (None).
DeltaTableV2 is given a DeltaTimeTravelSpec when:
- DeltaDataSource is requested for a BaseRelation
DeltaTimeTravelSpec is used for timeTravelSpec.
Properties¶
properties requests the Snapshot for the table properties and adds the following:
| Name | Value |
|---|---|
| provider | delta |
| location | path |
| comment | description (of the Metadata), if available |
| Type | table type of the CatalogTable, if available |
Table Capabilities¶
Table
capabilities(): Set[TableCapability]
capabilities is part of the Table (Spark SQL) abstraction.
capabilities is the following:
- ACCEPT_ANY_SCHEMA (Spark SQL)
- BATCH_READ (Spark SQL)
- V1_BATCH_WRITE (Spark SQL)
- OVERWRITE_BY_FILTER (Spark SQL)
- TRUNCATE (Spark SQL)
Creating WriteBuilder¶
SupportsWrite
newWriteBuilder(
info: LogicalWriteInfo): WriteBuilder
newWriteBuilder is part of the SupportsWrite (Spark SQL) abstraction.
newWriteBuilder creates a WriteIntoDeltaBuilder (for the DeltaLog and the options from the LogicalWriteInfo).
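A batch write that goes through newWriteBuilder (and hence WriteIntoDeltaBuilder) can be sketched as follows (a sketch with a made-up path; assumes an active SparkSession):

```scala
// Thanks to the V1_BATCH_WRITE capability, the DataSource V2 write
// falls back to the V1 WriteIntoDelta command under the covers.
spark.range(5).toDF("id")
  .write.format("delta")
  .mode("append")
  .save("/tmp/delta/events")
```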
Snapshot¶
snapshot: Snapshot
DeltaTableV2 has a Snapshot. In other words, DeltaTableV2 represents a Delta table at a specific version.
Lazy Value
snapshot is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
DeltaTableV2 uses the DeltaLog to load the snapshot at the version of the optional timeTravelSpec or, with no time travel specified, update to the latest version.
snapshot is used when:
- DeltaTableV2 is requested for the schema, partitioning and properties
DeltaTimeTravelSpec¶
timeTravelSpec: Option[DeltaTimeTravelSpec]
DeltaTableV2 may have a DeltaTimeTravelSpec specified that is either given or extracted from the path (for timeTravelByPath).
timeTravelSpec throws an AnalysisException when timeTravelOpt and timeTravelByPath are both defined:
Cannot specify time travel in multiple formats.
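The two time-travel formats (and their conflict) can be illustrated as follows (a sketch with a made-up path and version; assumes an active SparkSession):

```scala
// Option-based time travel (timeTravelOpt)
val v1 = spark.read.format("delta")
  .option("versionAsOf", 1)
  .load("/tmp/delta/events")

// Path-based time travel (timeTravelByPath)
val v1ByPath = spark.read.format("delta")
  .load("/tmp/delta/events@v1")

// Specifying both at once throws an AnalysisException:
// Cannot specify time travel in multiple formats.
```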
timeTravelSpec is used when:
- DeltaTableV2 is requested for a Snapshot and BaseRelation
DeltaTimeTravelSpec by Path¶
timeTravelByPath: Option[DeltaTimeTravelSpec]
Scala lazy value
timeTravelByPath is a Scala lazy value and is initialized once when first accessed. Once computed it stays unchanged.
timeTravelByPath is undefined when CatalogTable is defined.
With no CatalogTable defined, DeltaTableV2 parses the given Path for the timeTravelByPath (with resolvePath under the covers).
Converting to Insertable HadoopFsRelation¶
toBaseRelation: BaseRelation
toBaseRelation verifyAndCreatePartitionFilters for the Path, the current Snapshot and partitionFilters.
In the end, toBaseRelation requests the DeltaLog for an insertable HadoopFsRelation.
toBaseRelation is used when:
- DeltaDataSource is requested to create a relation (for a table scan)
- DeltaRelation is requested to fromV2Relation