Use DescribeDeltaDetailCommand to review the metadata of a delta table.
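As a rough sketch, DescribeDeltaDetailCommand can be triggered with the DESCRIBE DETAIL SQL command. The example assumes a `spark` session with Delta Lake on the classpath (as in spark-shell) and an existing delta table at the /tmp/delta/users path used elsewhere on this page.

```scala
// Assumption: `spark` is an active SparkSession and a delta table exists at this path
val path = "/tmp/delta/users"

// DESCRIBE DETAIL returns a single-row DataFrame describing the table
val detail = spark.sql(s"DESCRIBE DETAIL delta.`$path`")

// Show only the columns that come from the table metadata
detail
  .select("id", "name", "description", "format", "partitionColumns", "createdAt")
  .show(truncate = false)
```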
Metadata takes the following to be created:
- Id (default: random UUID)
- Name (default:
- Description (default:
- Format (default: empty)
- Schema (default:
- Partition Columns (default:
- Table Configuration (default:
- Created Time (default: current time)
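Since every property has a default, a Metadata can presumably be created with no arguments at all. The following is a minimal sketch that assumes the `name`, `description`, `id` and `createdTime` field names of the Metadata case class in the Delta Lake version in use. The "users" and "Demo table" values are hypothetical.

```scala
import org.apache.spark.sql.delta.actions.Metadata

// All properties have defaults, so no arguments are required
val metadata = Metadata()

// Id defaults to a random UUID, so two instances should get different ids
// (assumption: not running in Spark's testing mode, which may pin the id)
val another = Metadata()
println(metadata.id)
println(metadata.id != another.id)

// Created Time defaults to the time the instance was created
println(metadata.createdTime)

// Any property can be overridden by name (hypothetical values)
val named = Metadata(name = "users", description = "Demo table")
println(named.name)
```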
Metadata is created when:
- DeltaLog is requested for the metadata (but that should be rare)
- ConvertToDeltaCommand is executed
- ImplicitMetadataOperation is requested to updateMetadata
    val path = "/tmp/delta/users"

    import org.apache.spark.sql.delta.DeltaLog
    val deltaLog = DeltaLog.forTable(spark, path)

    import org.apache.spark.sql.delta.actions.Metadata
    assert(deltaLog.snapshot.metadata.isInstanceOf[Metadata])

    deltaLog.snapshot.metadata.id
Metadata uses a Table ID (aka reservoirId) to uniquely identify a delta table. The ID never changes through the history of the table.
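One way to observe that the Table ID stays the same is to compare it across versions of the table. This is only a sketch: it assumes a `spark` session, a delta table at /tmp/delta/users whose version 0 is still available in the transaction log, and the internal `DeltaLog.getSnapshotAt` method, whose signature may differ between Delta Lake versions.

```scala
import org.apache.spark.sql.delta.DeltaLog

val path = "/tmp/delta/users"
val deltaLog = DeltaLog.forTable(spark, path)

// Table ID in the current (latest) snapshot
val currentId = deltaLog.snapshot.metadata.id

// Table ID in the very first version of the table
// (assumption: version 0 has not been cleaned up from the log)
val initialId = deltaLog.getSnapshotAt(0).metadata.id

// Both versions should report the same Table ID
assert(currentId == initialId)
```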
When I asked the question "tableId and reservoirId - Why two different names for metadata ID?" on the delta-users mailing list, Tathagata Das wrote:
Any reference to "reservoir" is just legacy code. In the early days of this project, the project was called "Tahoe" and each table is called a "reservoir" (Tahoe is one of the 2nd deepest lake in US, and is a very large reservoir of water ;) ). So you may still find those two terms all around the codebase.
In some cases, like DeltaSourceOffset, the term reservoirId is in the json that is written to the streaming checkpoint directory. So we cannot change that for backward compatibility.