= Metadata
Metadata is an <
[source,scala]
----
import org.apache.spark.sql.delta.DeltaLog
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/users")

scala> :type deltaLog.snapshot.metadata
org.apache.spark.sql.delta.actions.Metadata
----
Metadata contains all the non-data information (metadata) like <
TIP: Use <
Metadata uses <
[NOTE]
====
When I asked the question https://groups.google.com/forum/#!topic/delta-users/5OKEFvVKiew[tableId and reservoirId - Why two different names for metadata ID?] on the delta-users mailing list, Tathagata Das wrote:

____
Any reference to "reservoir" is just legacy code. In the early days of this project, the project was called "Tahoe" and each table is called a "reservoir" (Tahoe is one of the 2nd deepest lake in US, and is a very large reservoir of water ;) ). So you may still find those two terms all around the codebase.

In some cases, like DeltaSourceOffset, the term `reservoirId` is in the json that is written to the streaming checkpoint directory. So we cannot change that for backward compatibility.
____
====
Metadata can be <-1
).
[source,scala]
----
txn.metadata
----
Metadata is <
- `DeltaLog` is requested for the <>
- `OptimisticTransactionImpl` is requested for the <>
- `ConvertToDeltaCommand` is requested to <>
- `ImplicitMetadataOperation` is requested to <>

== [[creating-instance]] Creating Metadata Instance
Metadata takes the following to be created:
- [[id]] Table ID (default: a random UUID)
- [[name]] Name of the delta table (default: `null`)
- [[description]] Description (default: `null`)
- [[format]] `Format`
- [[schemaString]] Schema (default: `null`)
- [[partitionColumns]] Partition columns (default: `Nil`)
- [[configuration]] Configuration (default: empty)
- [[createdTime]] Created time (in millis since the epoch)

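
The attributes and defaults listed above can be pictured as a plain case class. The following is only a minimal sketch in dependency-free Scala: `MetadataSketch`, `FormatSketch` and `MetadataSketchDemo` are hypothetical names, not the Delta Lake classes, and `format` and `createdTime` are made required here simply because no default is listed for them above (the actual class may differ).

```scala
import java.util.UUID

// Hypothetical placeholder for the Format attribute; not the real Delta Lake type.
final case class FormatSketch(provider: String)

// Simplified model of the attributes listed above; not the actual Metadata class.
final case class MetadataSketch(
    id: String = UUID.randomUUID().toString, // table ID: a fresh random UUID per instance
    name: String = null,
    description: String = null,
    format: FormatSketch,                     // no default listed above, so required in this sketch
    schemaString: String = null,
    partitionColumns: Seq[String] = Nil,
    configuration: Map[String, String] = Map.empty,
    createdTime: Long)                        // millis since the epoch; required in this sketch

object MetadataSketchDemo {
  // Only the attributes without defaults are given; everything else falls back to its default.
  val m1 = MetadataSketch(format = FormatSketch("parquet"), createdTime = System.currentTimeMillis())
  val m2 = MetadataSketch(format = FormatSketch("parquet"), createdTime = System.currentTimeMillis())
}
```

Because Scala evaluates default arguments on every call, `m1` and `m2` get different table IDs even though they are created the same way.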
== [[wrap]] wrap Method

[source, scala]
----
wrap: SingleAction
----

NOTE: `wrap` is part of the <

`wrap` simply creates a new <

== [[partitionSchema]] partitionSchema (Lazy) Property

[source, scala]
----
partitionSchema: StructType
----

`partitionSchema` is the <`StructFields` (and defined in the <

NOTE: `partitionSchema` throws an `IllegalArgumentException` for undefined fields that were used for the <

NOTE: `partitionSchema` is used when...FIXME

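
The behaviour described above (partition columns looked up as fields of the schema, with an `IllegalArgumentException` for a column that is not defined there) can be sketched without Spark. Everything in the block is an assumption made for illustration: `FieldSketch`, `SchemaSketch`, `TableSketch` and `PartitionSchemaDemo` are hypothetical stand-ins, the error message is made up, and the ordering of the result (following the partition columns) is a choice of this sketch rather than a statement about Delta Lake.

```scala
// Hypothetical, Spark-free stand-ins for StructField and StructType.
final case class FieldSketch(name: String, dataType: String)

final case class SchemaSketch(fields: Seq[FieldSketch]) {
  // Look a field up by name; fail with IllegalArgumentException when it is not defined.
  def apply(name: String): FieldSketch =
    fields.find(_.name == name).getOrElse(
      throw new IllegalArgumentException(s"Field $name does not exist"))
}

final case class TableSketch(schema: SchemaSketch, partitionColumns: Seq[String]) {
  // Lazy: computed on first access only, so an undefined partition column fails here,
  // not when the TableSketch instance is constructed.
  lazy val partitionSchema: SchemaSketch =
    SchemaSketch(partitionColumns.map(schema(_)))
}

object PartitionSchemaDemo {
  val schema = SchemaSketch(Seq(
    FieldSketch("id", "long"),
    FieldSketch("city", "string"),
    FieldSketch("year", "int")))

  val ok = TableSketch(schema, Seq("year", "city"))
  val broken = TableSketch(schema, Seq("country")) // constructing the instance does not fail
}
```

Accessing `PartitionSchemaDemo.ok.partitionSchema` yields the `year` and `city` fields, while accessing `PartitionSchemaDemo.broken.partitionSchema` throws, since `country` is not part of the schema.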
== [[dataSchema]] dataSchema (Lazy) Property

[source, scala]
----
dataSchema: StructType
----

`dataSchema`...FIXME

NOTE: `dataSchema` is used when...FIXME

== [[schema]] schema (Lazy) Property

[source, scala]
----
schema: StructType
----

`schema` is a deserialized <`StructType`.

[NOTE]
====
`schema` is used when:

- `Metadata` is requested for the schema of the < > and the < >
- `DeltaLog` is requested for an DeltaLog.md#createRelation[insertable HadoopFsRelation for batch queries] (for the data schema), to DeltaLog.md#upgradeProtocol[upgrade protocol], a DeltaLog.md#createDataFrame[DataFrame for given AddFiles]
- `DeltaTableUtils` utility is used to DeltaTableUtils.md#combineWithCatalogMetadata[combineWithCatalogMetadata]
- `OptimisticTransactionImpl` is requested to OptimisticTransactionImpl.md#verifyNewMetadata[verifyNewMetadata]
====