DeltaTableV2¶
DeltaTableV2 is a logical representation of a writable delta table.
Creating Instance¶
DeltaTableV2 takes the following to be created:

- SparkSession (Spark SQL)
- Path (Hadoop HDFS)
- CatalogTable Metadata
- Optional Table ID
- Optional DeltaTimeTravelSpec
- Options
- CDC Options

DeltaTableV2 is created when:

- DeltaTable utility is used to forPath and forName
- DeltaCatalog is requested to load a table
- DeltaDataSource is requested to load a table or create a table relation
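The constructor parameters above can be sketched as a plain Scala case class. This is a simplified, hypothetical mirror of the constructor shape for illustration only; the stub types stand in for the real SparkSession, Path, CatalogTable and DeltaTimeTravelSpec classes.

```scala
// Hypothetical stand-in types (not the real Spark/Hadoop classes).
case class SparkSessionStub(name: String)
case class PathStub(value: String)
case class CatalogTableStub(identifier: String)
case class TimeTravelSpecStub(version: Long)

// Simplified sketch of DeltaTableV2's constructor parameters and their defaults.
case class DeltaTableV2Sketch(
  spark: SparkSessionStub,
  path: PathStub,
  catalogTable: Option[CatalogTableStub] = None,   // table metadata, undefined by default
  tableIdentifier: Option[String] = None,           // optional table ID
  timeTravelOpt: Option[TimeTravelSpecStub] = None, // optional DeltaTimeTravelSpec
  options: Map[String, String] = Map.empty,         // options, empty by default
  cdcOptions: Map[String, String] = Map.empty)      // CDC options, empty by default

// A path-based table: everything but the session and the path is defaulted.
val table = DeltaTableV2Sketch(SparkSessionStub("local"), PathStub("/tmp/delta/t"))
```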
Table Metadata (CatalogTable)¶
catalogTable: Option[CatalogTable] = None

DeltaTableV2 can be given a CatalogTable (Spark SQL) when created. It is undefined by default.

catalogTable is specified when:

- DeltaTable.forName is used (for a cataloged delta table)
- DeltaCatalog is requested to load a table (that is a V1Table and a cataloged delta table)

catalogTable is used when:

- DeltaTableV2 is requested for the rootPath (to avoid parsing the path), the name, the properties and the CatalogTable itself
- DeltaAnalysis logical resolution rule is requested to resolve a RestoreTableStatement (for a TableIdentifier)
- DeltaRelation utility is used to fromV2Relation
- AlterTableSetLocationDeltaCommand is executed
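The rootPath use case above (preferring the catalog's location over parsing the path) can be sketched in a few lines of plain Scala. This is an illustrative sketch, not Delta's actual code; `CatalogTableStub` and the normalization logic are assumptions.

```scala
// Stand-in for Spark SQL's CatalogTable (only the location matters here).
case class CatalogTableStub(locationUri: String)

// Sketch: a cataloged table takes the catalog's location as-is,
// a path-based table has to derive ("parse") the root from the raw path.
def rootPath(catalogTable: Option[CatalogTableStub], rawPath: String): String =
  catalogTable
    .map(_.locationUri)                  // cataloged: avoid parsing the path
    .getOrElse(rawPath.stripSuffix("/")) // path-based: normalize the raw path
```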
CDC Options¶
cdcOptions: CaseInsensitiveStringMap

DeltaTableV2 can be given cdcOptions when created. It is empty by default (and most of the time).

cdcOptions is specified when:

- DeltaDataSource is requested to create a relation (for CDC read)
- DeltaTableV2 is requested to withOptions

cdcOptions is used when:

- DeltaTableV2 is requested for a BaseRelation
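Since cdcOptions is a CaseInsensitiveStringMap, option keys are matched regardless of case. The following is a minimal pure-Scala approximation of that lookup behavior (not Spark's actual CaseInsensitiveStringMap class).

```scala
// Minimal sketch of a case-insensitive option map, approximating the behavior
// of Spark's CaseInsensitiveStringMap used for cdcOptions.
final class CaseInsensitiveOptions(entries: Map[String, String]) {
  private val normalized = entries.map { case (k, v) => k.toLowerCase -> v }
  def get(key: String): Option[String] = normalized.get(key.toLowerCase)
  def isEmpty: Boolean = normalized.isEmpty
}

val noCdc = new CaseInsensitiveOptions(Map.empty) // the default: empty
val cdc = new CaseInsensitiveOptions(Map("readChangeFeed" -> "true"))
```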
CDF-Aware Relation¶
cdcRelation: Option[BaseRelation]

Lazy Value

cdcRelation is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
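The initialize-once semantics can be demonstrated with a counter (the body and value below are illustrative, not Delta's actual cdcRelation):

```scala
// Demonstrates lazy val semantics: the initializer runs only on first access,
// and the computed value never changes afterwards.
var initializations = 0

lazy val cdcRelation: Option[String] = {
  initializations += 1 // executed exactly once, on first access
  Some("cdf-aware-relation")
}

val beforeAccess = initializations // still 0: declaration does not initialize
cdcRelation                        // first access runs the initializer
cdcRelation                        // subsequent accesses reuse the cached value
```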
For a CDF-aware read, cdcRelation returns a CDF-aware relation. Otherwise, cdcRelation returns None (an undefined value).

cdcRelation is used when:

- DeltaTableV2 is requested for the table schema and the relation
Options¶
DeltaTableV2 can be given options (as a Map[String, String]). Options are empty by default.

The options are defined when DeltaDataSource is requested for a relation with the spark.databricks.delta.loadFileSystemConfigsFromDataFrameOptions configuration property enabled.

The options are used for the following:

- Looking up the path or paths options
- Creating the DeltaLog
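The path / paths lookup can be sketched as follows. This is a simplified illustration (the function name and the comma-separated encoding of paths are assumptions, not Delta's actual implementation).

```scala
// Sketch: resolve the table location(s) from the `path` or `paths` options,
// preferring a single `path` when both are present.
def lookupPaths(options: Map[String, String]): Seq[String] =
  options.get("path").map(Seq(_))
    .orElse(options.get("paths").map(_.split(",").toSeq.map(_.trim)))
    .getOrElse(Seq.empty)
```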
DeltaLog¶
DeltaTableV2 creates a DeltaLog for the rootPath and the given options.
Table¶
DeltaTableV2 is a Table (Spark SQL).

SupportsWrite¶
DeltaTableV2 is a SupportsWrite (Spark SQL).

V2TableWithV1Fallback¶
DeltaTableV2 is a V2TableWithV1Fallback (Spark SQL).
v1Table¶
V2TableWithV1Fallback

v1Table: CatalogTable

v1Table is part of the V2TableWithV1Fallback (Spark SQL) abstraction.

v1Table returns the CatalogTable (with CatalogStatistics removed if DeltaTimeTravelSpec has also been specified).

v1Table expects that the (optional) CatalogTable metadata is specified or throws a DeltaIllegalStateException:

v1Table call is not expected with path based DeltaTableV2
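The v1Table contract above can be sketched with stand-in types. This is illustrative only: the stub classes are assumptions, and a plain IllegalStateException substitutes for DeltaIllegalStateException.

```scala
// Stand-ins for Spark SQL's CatalogStatistics and CatalogTable.
case class CatalogStatistics(sizeInBytes: Long)
case class CatalogTableStub(name: String, stats: Option[CatalogStatistics])

// Sketch of v1Table: fail for path-based tables (no CatalogTable),
// and drop catalog statistics when a time-travel spec is set.
def v1Table(
    catalogTable: Option[CatalogTableStub],
    timeTravelSpec: Option[String]): CatalogTableStub = {
  val table = catalogTable.getOrElse(
    throw new IllegalStateException(
      "v1Table call is not expected with path based DeltaTableV2"))
  if (timeTravelSpec.isDefined) table.copy(stats = None) else table
}
```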
DeltaTimeTravelSpec¶
DeltaTableV2 may be given a DeltaTimeTravelSpec when created. DeltaTimeTravelSpec is assumed not to be defined by default (None).

DeltaTableV2 is given a DeltaTimeTravelSpec when:

- DeltaDataSource is requested for a BaseRelation

DeltaTimeTravelSpec is used for timeTravelSpec.
Properties¶
properties requests the Snapshot for the table properties and adds the following:

| Name | Value |
|---|---|
| provider | delta |
| location | path |
| comment | description (of the Metadata) if available |
| Type | table type of the CatalogTable if available |
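Assembling the properties map as described in the table can be sketched as follows (a simplified illustration; the function and parameter names are assumptions):

```scala
// Sketch: start from the Snapshot's properties and add the fixed and
// optional entries listed in the table above.
def tableProperties(
    snapshotProperties: Map[String, String],
    location: String,
    comment: Option[String],     // description of the Metadata, if available
    tableType: Option[String]    // table type of the CatalogTable, if available
): Map[String, String] =
  snapshotProperties ++
    Map("provider" -> "delta", "location" -> location) ++
    comment.map("comment" -> _) ++
    tableType.map("Type" -> _)
```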
Table Capabilities¶
Table

capabilities(): Set[TableCapability]

capabilities is part of the Table (Spark SQL) abstraction.

capabilities is the following:

- ACCEPT_ANY_SCHEMA (Spark SQL)
- BATCH_READ (Spark SQL)
- V1_BATCH_WRITE (Spark SQL)
- OVERWRITE_BY_FILTER (Spark SQL)
- TRUNCATE (Spark SQL)
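As a sketch, the fixed capability set can be written out as follows (capability names as plain strings here; the real code returns Spark SQL's TableCapability enum values):

```scala
// The five capabilities listed above, as a fixed set (names only, for illustration).
val capabilities: Set[String] = Set(
  "ACCEPT_ANY_SCHEMA",
  "BATCH_READ",
  "V1_BATCH_WRITE",
  "OVERWRITE_BY_FILTER",
  "TRUNCATE")
```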
Creating WriteBuilder¶
SupportsWrite

newWriteBuilder(
  info: LogicalWriteInfo): WriteBuilder

newWriteBuilder is part of the SupportsWrite (Spark SQL) abstraction.

newWriteBuilder creates a WriteIntoDeltaBuilder (for the DeltaLog and the options from the LogicalWriteInfo).
Snapshot¶
snapshot: Snapshot

DeltaTableV2 has a Snapshot. In other words, DeltaTableV2 represents a Delta table at a specific version.

Lazy Value

snapshot is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.

DeltaTableV2 uses the DeltaLog to load it at a given version (based on the optional timeTravelSpec) or update to the latest version.

snapshot is used when:

- DeltaTableV2 is requested for the schema, partitioning and properties
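The version-resolution logic above (time-travelled version if specified, otherwise the latest) can be sketched in plain Scala (stand-in types; not Delta's actual DeltaLog API):

```scala
// Stand-in for a table snapshot at a specific version.
case class Snapshot(version: Long)

// Sketch: load at the time-travelled version when a spec is given,
// otherwise "update" to the latest version.
def resolveSnapshot(timeTravelVersion: Option[Long], latestVersion: Long): Snapshot =
  timeTravelVersion.map(Snapshot(_)).getOrElse(Snapshot(latestVersion))
```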
DeltaTimeTravelSpec¶
timeTravelSpec: Option[DeltaTimeTravelSpec]

DeltaTableV2 may have a DeltaTimeTravelSpec specified that is either given or extracted from the path (for timeTravelByPath).

timeTravelSpec throws an AnalysisException when timeTravelOpt and timeTravelByPath are both defined:

Cannot specify time travel in multiple formats.

timeTravelSpec is used when:

- DeltaTableV2 is requested for a Snapshot and BaseRelation
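The mutual-exclusion guard above can be sketched as follows (an illustration only; specs are plain strings here, and a plain IllegalArgumentException substitutes for AnalysisException):

```scala
// Sketch of timeTravelSpec: a spec may be given explicitly or extracted
// from the path, but not both.
def resolveTimeTravel(
    timeTravelOpt: Option[String],
    timeTravelByPath: Option[String]): Option[String] = {
  if (timeTravelOpt.isDefined && timeTravelByPath.isDefined)
    throw new IllegalArgumentException(
      "Cannot specify time travel in multiple formats.")
  timeTravelOpt.orElse(timeTravelByPath)
}
```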
DeltaTimeTravelSpec by Path¶
timeTravelByPath: Option[DeltaTimeTravelSpec]

Scala lazy value

timeTravelByPath is a Scala lazy value and is initialized once when first accessed. Once computed, it stays unchanged.

timeTravelByPath is undefined when CatalogTable is defined.

With no CatalogTable defined, DeltaTableV2 parses the given Path for the timeTravelByPath (using resolvePath under the covers).
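As a rough illustration, extracting a version-based time travel spec from a path suffix such as `/tmp/delta/t@v12` (the `@v<version>` form of Delta's path-embedded time travel) could look like the following. This is a hedged sketch, not Delta's actual parser, which also handles timestamp-based suffixes.

```scala
// Sketch: recognize a trailing @v<version> in a path and split it off.
val VersionedPath = """(.+)@v(\d+)""".r

def timeTravelByPath(path: String): Option[(String, Long)] =
  path match {
    case VersionedPath(tablePath, version) => Some((tablePath, version.toLong))
    case _ => None // no time travel suffix in the path
  }
```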
Converting to Insertable HadoopFsRelation¶
toBaseRelation: BaseRelation

toBaseRelation verifyAndCreatePartitionFilters (for the Path, the current Snapshot and partitionFilters).

In the end, toBaseRelation requests the DeltaLog for an insertable HadoopFsRelation.

toBaseRelation is used when:

- DeltaDataSource is requested to create a relation (for a table scan)
- DeltaRelation is requested to fromV2Relation