ExternalCatalog

ExternalCatalog is an abstraction of external system catalogs (also known as metadata registries or metastores) of permanent relational entities (i.e., databases, tables, partitions, and functions).

An ExternalCatalog is either ephemeral (in-memory) or persistent (Hive-aware).
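
As a rough illustration of the two flavours, the sketch below starts a local session with the in-memory catalog and prints which catalog class is in use. It assumes Spark 2.4 or later (where the catalog is wrapped in ExternalCatalogWithListener and exposes unwrapped) and relies on the spark.sql.catalogImplementation static configuration property, which this page does not mention; the object and method names are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

object CatalogImplementationDemo {
  // Returns the configured catalog implementation and the simple class name
  // of the catalog behind the listener wrapper.
  def describeCatalog(spark: SparkSession): (String, String) = {
    val impl = spark.conf.get("spark.sql.catalogImplementation")
    // `unwrapped` strips the ExternalCatalogWithListener wrapper (assumes Spark 2.4+).
    val catalogClass = spark.sharedState.externalCatalog.unwrapped.getClass.getSimpleName
    (impl, catalogClass)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("catalog-implementation-demo")
      // Static setting: it has to be given before the session is created.
      // Use "hive" (or enableHiveSupport()) for the persistent, Hive-aware catalog.
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()
    try {
      val (impl, catalogClass) = describeCatalog(spark)
      println(s"catalogImplementation = $impl")
      println(s"external catalog      = $catalogClass")
    } finally {
      spark.stop()
    }
  }
}
```

Switching the property to hive requires the Hive classes on the classpath, so the in-memory variant is the easier one to try locally.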

Contract

getPartition

getPartition(
  db: String,
  table: String,
  spec: TablePartitionSpec): CatalogTablePartition

The CatalogTablePartition of a given table (in a database) that matches the given partition spec

Used when:

  • ExternalCatalogWithListener is requested to getPartition
  • SessionCatalog is requested to getPartition

getPartitionOption

getPartitionOption(
  db: String,
  table: String,
  spec: TablePartitionSpec): Option[CatalogTablePartition]

The CatalogTablePartition of a given table (in a database) that matches the given partition spec, if the partition exists (None otherwise)

Used when:

  • ExternalCatalogWithListener is requested to getPartitionOption
  • InsertIntoHiveTable is requested to processInsert
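
The difference between getPartition and getPartitionOption shows up when the partition is missing. The following is a minimal sketch, assuming a local session with the in-memory catalog; the table demo_parts, the object name, and the helper method are hypothetical and exist only for this illustration.

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession

object PartitionLookupDemo {
  // Returns (existing partition found, missing partition found, getPartition failed on missing).
  def lookups(spark: SparkSession): (Boolean, Boolean, Boolean) = {
    spark.sql("DROP TABLE IF EXISTS demo_parts")
    // Hypothetical partitioned table with a single registered partition.
    spark.sql("CREATE TABLE demo_parts (id INT, p INT) USING parquet PARTITIONED BY (p)")
    spark.sql("ALTER TABLE demo_parts ADD PARTITION (p = 1)")
    try {
      val catalog = spark.sharedState.externalCatalog
      // TablePartitionSpec is an alias of Map[String, String].
      val existing = Map("p" -> "1")
      val missing  = Map("p" -> "42")

      val found    = catalog.getPartitionOption("default", "demo_parts", existing).isDefined
      val notFound = catalog.getPartitionOption("default", "demo_parts", missing).isDefined
      // getPartition has no way to signal absence other than throwing.
      val failed   = Try(catalog.getPartition("default", "demo_parts", missing)).isFailure
      (found, notFound, failed)
    } finally {
      spark.sql("DROP TABLE IF EXISTS demo_parts")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-lookup-demo")
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()
    try println(lookups(spark)) finally spark.stop()
  }
}
```

Callers that expect the partition to be absent some of the time would typically prefer getPartitionOption over catching the exception.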

getTable

getTable(
  db: String,
  table: String): CatalogTable

The CatalogTable (metadata) of a given table (in a database)

Used when:

getTablesByName

getTablesByName(
  db: String,
  tables: Seq[String]): Seq[CatalogTable]

CatalogTables of the given tables (in a database)

Used when:

  • ExternalCatalogWithListener is requested to getTablesByName
  • SessionCatalog is requested to getTablesByName
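
To make getTable and getTablesByName concrete, here is a small sketch that looks up table metadata directly through the external catalog. It assumes Spark 3.0 or later (getTablesByName is not available in older releases) and the in-memory catalog; the tables demo_t1 and demo_t2 and the object and method names are invented for the example.

```scala
import org.apache.spark.sql.SparkSession

object TableLookupDemo {
  // Returns the column names of demo_t1 and the names of the tables found by the batch lookup.
  def lookups(spark: SparkSession): (Seq[String], Set[String]) = {
    spark.sql("DROP TABLE IF EXISTS demo_t1")
    spark.sql("DROP TABLE IF EXISTS demo_t2")
    // Hypothetical tables used only for this illustration.
    spark.sql("CREATE TABLE demo_t1 (id INT) USING parquet")
    spark.sql("CREATE TABLE demo_t2 (name STRING) USING parquet")
    try {
      val catalog = spark.sharedState.externalCatalog

      // Single-table lookup returns the CatalogTable itself.
      val one = catalog.getTable("default", "demo_t1")

      // Batch lookup returns one CatalogTable per table that was found.
      val many = catalog.getTablesByName("default", Seq("demo_t1", "demo_t2"))

      (one.schema.fieldNames.toSeq, many.map(_.identifier.table).toSet)
    } finally {
      spark.sql("DROP TABLE IF EXISTS demo_t1")
      spark.sql("DROP TABLE IF EXISTS demo_t2")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("table-lookup-demo")
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()
    try println(lookups(spark)) finally spark.stop()
  }
}
```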

listPartitionsByFilter

listPartitionsByFilter(
  db: String,
  table: String,
  predicates: Seq[Expression],
  defaultTimeZoneId: String): Seq[CatalogTablePartition]

The CatalogTablePartitions of a given table (in a database) that satisfy the given partition predicates

Used when:

  • ExternalCatalogWithListener is requested to listPartitionsByFilter
  • SessionCatalog is requested to listPartitionsByFilter
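
The predicates of listPartitionsByFilter are Catalyst expressions over the partition columns. The sketch below is one way to build such a predicate by hand, assuming a recent Spark 3.x release and the in-memory catalog; how much filtering actually happens inside the catalog may differ between implementations and versions. The table demo_filter and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression, GreaterThan, Literal}
import org.apache.spark.sql.types.IntegerType

object PartitionFilterDemo {
  // Returns the specs of the partitions the catalog reports for the predicate p > 1.
  def partitionsAboveOne(spark: SparkSession): Seq[Map[String, String]] = {
    spark.sql("DROP TABLE IF EXISTS demo_filter")
    // Hypothetical partitioned table with two registered partitions.
    spark.sql("CREATE TABLE demo_filter (id INT, p INT) USING parquet PARTITIONED BY (p)")
    spark.sql("ALTER TABLE demo_filter ADD PARTITION (p = 1) PARTITION (p = 2)")
    try {
      // The attribute name and type have to match the partition column.
      val p = AttributeReference("p", IntegerType)()
      val predicates: Seq[Expression] = Seq(GreaterThan(p, Literal(1)))

      // Partition values are kept as strings; the time zone is used when casting them.
      val timeZoneId = spark.sessionState.conf.sessionLocalTimeZone

      spark.sharedState.externalCatalog
        .listPartitionsByFilter("default", "demo_filter", predicates, timeZoneId)
        .map(_.spec)
    } finally {
      spark.sql("DROP TABLE IF EXISTS demo_filter")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-filter-demo")
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()
    try partitionsAboveOne(spark).foreach(println) finally spark.stop()
  }
}
```

In everyday use the predicates come from the query optimizer rather than being written by hand, so treat this as a way to poke at the contract, not as a recommended access path.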

Implementations

  • ExternalCatalogWithListener
  • HiveExternalCatalog
  • InMemoryCatalog

Accessing ExternalCatalog

ExternalCatalog is available as externalCatalog of SharedState (in SparkSession).

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sharedState.externalCatalog
org.apache.spark.sql.catalyst.catalog.ExternalCatalog