Metadata

Metadata is an action that describes a metadata change of a delta table (and is available indirectly via Snapshot).

import org.apache.spark.sql.delta.DeltaLog
// Load the transaction log of the delta table at the given path
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/users")

scala> :type deltaLog.snapshot.metadata
org.apache.spark.sql.delta.actions.Metadata

Metadata contains all the non-data information (metadata) like name, description, format, schema, partition columns, table properties and created time. These can be changed (e.g., schema evolution).

Use DescribeDeltaDetailCommand to review the metadata of a delta table.

Metadata uses id to uniquely identify a delta table. The ID never changes throughout the history of the table (unless the entire directory, along with the transaction log, is deleted). It is also known as tableId or reservoirId.

When I asked the question "tableId and reservoirId - Why two different names for metadata ID?" on the delta-users mailing list, Tathagata Das wrote:

Any reference to "reservoir" is just legacy code. In the early days of this project, the project was called "Tahoe" and each table is called a "reservoir" (Tahoe is one of the 2nd deepest lake in US, and is a very large reservoir of water ;) ). So you may still find those two terms all around the codebase.

In some cases, like DeltaSourceOffset, the term reservoirId is in the json that is written to the streaming checkpoint directory. So we cannot change that for backward compatibility.

Metadata can be updated in a transaction only once (and only when created for an uninitialized table, i.e. when readVersion is -1).

txn.metadata
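
The once-per-transaction rule can be illustrated with a minimal sketch. UpdateMetadataSketch and TxnSketch are hypothetical stand-ins (not Delta Lake classes), the metadata is reduced to a plain String, and the readVersion condition is deliberately left out.

```scala
object UpdateMetadataSketch {
  // Hypothetical stand-in for a transaction; not Delta Lake's actual class.
  // The metadata is reduced to a plain String to keep the sketch small.
  final class TxnSketch(snapshotMetadata: String) {
    private var newMetadata: Option[String] = None

    // Plays the role of txn.metadata: the updated metadata if there is one,
    // otherwise the metadata of the snapshot the transaction started from.
    def metadata: String = newMetadata.getOrElse(snapshotMetadata)

    // Accepts the first update and rejects any further one.
    def updateMetadata(updated: String): Unit = {
      if (newMetadata.nonEmpty) {
        throw new IllegalStateException(
          "Cannot change the metadata more than once in a transaction.")
      }
      newMetadata = Some(updated)
    }
  }
}
```

With this sketch, the first updateMetadata call changes what metadata returns, while a second call fails and leaves the first update in place.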

Metadata is created when:

Creating Metadata Instance

Metadata takes the following to be created:

  • Table ID (default: a random UUID)

  • Name of the delta table (default: null)

  • Description (default: null)

  • Format

  • Schema (default: null)

  • Partition columns (default: Nil)

  • Configuration (default: empty)

  • Created time (in millis since the epoch)
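
The defaults above could be modelled roughly as follows. This is a simplified sketch, not the real class: MetadataDefaultsSketch, MetadataSketch and FormatSketch are hypothetical names, the schema is kept as a JSON string, and the defaults for the format ("parquet") and the created time (the current time) are assumptions since the list does not state them.

```scala
import java.util.UUID

object MetadataDefaultsSketch {
  // Hypothetical stand-in for the format; the "parquet" default is an assumption.
  final case class FormatSketch(
      provider: String = "parquet",
      options: Map[String, String] = Map.empty)

  // Hypothetical model of the creation arguments listed above.
  final case class MetadataSketch(
      id: String = UUID.randomUUID().toString,       // table ID: a random UUID
      name: String = null,                           // name of the delta table
      description: String = null,
      format: FormatSketch = FormatSketch(),         // assumption: no default is stated
      schemaString: String = null,                   // schema, kept here as a JSON string
      partitionColumns: Seq[String] = Nil,
      configuration: Map[String, String] = Map.empty,
      // millis since the epoch; defaulting to "now" is an assumption
      createdTime: Option[Long] = Some(System.currentTimeMillis()))
}
```

Creating MetadataDefaultsSketch.MetadataSketch() with no arguments gives every instance its own random ID, which matches the idea that the ID identifies one table for its whole history.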

wrap Method

wrap: SingleAction

wrap is part of the Action contract to wrap the action into a SingleAction.

wrap simply creates a new SingleAction with the metaData field set to this Metadata.
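
The wrapping can be sketched with simplified stand-ins. ActionSketch, MetadataSketch and SingleActionSketch are hypothetical names, and only the metaData field is modelled here; any other fields of the real SingleAction are left out.

```scala
object WrapSketch {
  // Hypothetical, simplified version of the Action contract.
  sealed trait ActionSketch {
    def wrap: SingleActionSketch
  }

  // Metadata reduced to its ID for illustration.
  final case class MetadataSketch(id: String) extends ActionSketch {
    // Creates a new wrapper with the metaData field set to this instance.
    override def wrap: SingleActionSketch = SingleActionSketch(metaData = this)
  }

  // Wrapper with a nullable field for the metadata action only.
  final case class SingleActionSketch(metaData: MetadataSketch = null)
}
```

Calling wrap on a MetadataSketch returns a wrapper whose metaData field is that very instance.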

partitionSchema (Lazy) Property

partitionSchema: StructType

partitionSchema is the partition columns as StructFields (as defined in the schema).

partitionSchema throws an IllegalArgumentException for partition columns that are not defined in the schema.

partitionSchema is used when…FIXME
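
The lookup behind partitionSchema can be approximated without Spark. In the sketch below, FieldSketch and SchemaSketch are hypothetical stand-ins for StructField and StructType, and the by-name lookup that fails for an unknown field is an assumption about how the exception comes about.

```scala
object PartitionSchemaSketch {
  // Hypothetical stand-in for StructField (the data type is kept as a String).
  final case class FieldSketch(name: String, dataType: String)

  // Hypothetical stand-in for StructType with a by-name lookup.
  final case class SchemaSketch(fields: Seq[FieldSketch]) {
    def apply(name: String): FieldSketch =
      fields.find(_.name == name).getOrElse(
        throw new IllegalArgumentException(s"Field $name does not exist."))
  }

  // Resolves every partition column against the schema, in partition-column order.
  // Fails on the first column that the schema does not define.
  def partitionSchema(
      schema: SchemaSketch,
      partitionColumns: Seq[String]): SchemaSketch =
    SchemaSketch(partitionColumns.map(name => schema(name)))
}
```

For a schema with the fields id and country, partitioning by country yields a one-field schema, while partitioning by an undefined city column throws an IllegalArgumentException.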

dataSchema (Lazy) Property

dataSchema: StructType

dataSchema…​FIXME

dataSchema is used when…​FIXME

schema (Lazy) Property

schema: StructType

schema is the schema (stored in JSON format) deserialized to a StructType.

schema is used when: