CatalogTable

CatalogTable is the specification (metadata) of a table in a SessionCatalog.

Creating Instance

CatalogTable takes the following to be created:

  • TableIdentifier
  • Table type
  • CatalogStorageFormat
  • Schema (StructType)
  • Name of the table provider
  • Partition Columns
  • Bucketing specification
  • Owner
  • Created Time
  • Last access time
  • Created By version
  • Table Properties
  • Statistics
  • View Text
  • Comment
  • Unsupported Features (Seq[String])
  • tracksPartitionsInCatalog flag (default: false)
  • schemaPreservesCase flag (default: true)
  • Ignored properties
  • View Original Text
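Only the first four arguments (identifier, table type, storage format and schema) are required; the rest fall back to the defaults listed above. A minimal construction sketch (the table name `demo_table` is made up for this example):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// A minimal CatalogTable: only the required arguments; everything else uses its default
val table = CatalogTable(
  identifier = TableIdentifier("demo_table"),   // made-up table name
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = StructType(StructField("id", LongType) :: Nil))

// The optional arguments keep their defaults
assert(table.bucketSpec.isEmpty)
assert(!table.tracksPartitionsInCatalog)
assert(table.schemaPreservesCase)
```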

CatalogTable is created when:

Bucketing Specification

bucketSpec: Option[BucketSpec] = None

CatalogTable can be given a BucketSpec when created. It is undefined (None) by default.
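As an illustration (a hypothetical spark-shell session with `spark` available; `bucketed_demo` is a made-up table name), saving a bucketed table with DataFrameWriter.bucketBy makes the BucketSpec available through the table metadata:

```scala
// Assumes a spark-shell session (SparkSession available as `spark`);
// `bucketed_demo` is a made-up table name for this sketch.
spark.range(10)
  .write
  .bucketBy(4, "id")   // 4 buckets on the id column
  .sortBy("id")
  .saveAsTable("bucketed_demo")

val tid = spark.sessionState.sqlParser.parseTableIdentifier("bucketed_demo")
val meta = spark.sessionState.catalog.getTableMetadata(tid)
assert(meta.bucketSpec.isDefined)
```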

BucketSpec is given (using getBucketSpecFromTableProperties from a Hive metastore) when:

BucketSpec is given when:

BucketSpec is used when:

Note

  1. Use DescribeTableCommand to review BucketSpec
  2. Use ShowCreateTableCommand to review the Spark DDL syntax
  3. Use Catalog.listColumns to list all columns (incl. bucketing columns)
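For instance, Catalog.listColumns flags bucketing columns with isBucket (a hypothetical spark-shell session; `bucketed_cols_demo` is a made-up table name):

```scala
// Assumes a spark-shell session; `bucketed_cols_demo` is a made-up table name.
import spark.implicits._

spark.range(10).write.bucketBy(2, "id").saveAsTable("bucketed_cols_demo")

spark.catalog.listColumns("bucketed_cols_demo")
  .select("name", "isBucket")
  .show
```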

Table Type

CatalogTable is given a CatalogTableType when created:

  • EXTERNAL
  • MANAGED
  • VIEW

CatalogTableType is included when a TreeNode is requested for a JSON representation for...FIXME

Statistics

stats: Option[CatalogStatistics] = None

CatalogTable can be given a CatalogStatistics when created. It is undefined (None) by default.

The statistics can be displayed using the following commands (when executed with the EXTENDED or FORMATTED clause):
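For example (a hypothetical spark-shell session; `stats_demo` is a made-up table name), statistics computed with ANALYZE TABLE show up in the Statistics row of DESC EXTENDED:

```scala
// Assumes a spark-shell session (SparkSession available as `spark`);
// `stats_demo` is a made-up table name for this sketch.
import spark.implicits._

spark.range(5).write.saveAsTable("stats_demo")
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS")

spark.sql("DESC EXTENDED stats_demo")
  .where($"col_name" === "Statistics")
  .show(truncate = false)
```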

The CatalogStatistics can be defined when:

The CatalogStatistics is used when:

The CatalogStatistics is updated (altered) when:

CatalogStatistics is included (under the Statistics key) in toLinkedHashMap.

toLinkedHashMap

toLinkedHashMap: LinkedHashMap[String, String]

toLinkedHashMap...FIXME


toLinkedHashMap is used when:
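A quick way to inspect what toLinkedHashMap produces (a hypothetical spark-shell session; `map_demo` is a made-up table name):

```scala
// Assumes a spark-shell session; `map_demo` is a made-up table name.
spark.range(1).write.saveAsTable("map_demo")

val tid = spark.sessionState.sqlParser.parseTableIdentifier("map_demo")
val meta = spark.sessionState.catalog.getTableMetadata(tid)

// Print the table metadata as ordered key-value pairs
meta.toLinkedHashMap.foreach { case (k, v) => println(s"$k -> $v") }
```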

Demo: Accessing Table Metadata

Catalog

val q = spark.catalog.listTables.filter($"name" === "t1")
scala> q.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
|  t1| default|       null|  MANAGED|      false|
+----+--------+-----------+---------+-----------+

SessionCatalog

import org.apache.spark.sql.catalyst.catalog.SessionCatalog
val sessionCatalog = spark.sessionState.catalog
assert(sessionCatalog.isInstanceOf[SessionCatalog])
val t1Tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
val t1Metadata = sessionCatalog.getTempViewOrPermanentTableMetadata(t1Tid)
import org.apache.spark.sql.catalyst.catalog.CatalogTable
assert(t1Metadata.isInstanceOf[CatalogTable])