HiveUtils is a utility that is used to create a Hive client that HiveExternalCatalog uses to interact with a Hive metastore.

HiveUtils is a Scala object with the private[spark] access modifier, so it is only accessible from code in the org.apache.spark package. Use the following helper object to access its properties.

// Use :paste -raw to paste the following code in spark-shell
// BEGIN
package org.apache.spark
import org.apache.spark.sql.hive.HiveUtils
object opener {
  def CONVERT_METASTORE_PARQUET = HiveUtils.CONVERT_METASTORE_PARQUET
}
// END

import org.apache.spark.opener
spark.sessionState.conf.getConf(opener.CONVERT_METASTORE_PARQUET)

[[logging]]
[TIP]
====
Enable ALL logging level for the org.apache.spark.sql.hive.HiveUtils$ logger to see what happens inside.
====

=== [[builtinHiveVersion]] builtinHiveVersion Property

builtinHiveVersion: String = "1.2.1"

builtinHiveVersion is used when:

=== [[newClientForMetadata]] Creating HiveClientImpl -- newClientForMetadata Method

newClientForMetadata(
  conf: SparkConf,
  hadoopConf: Configuration): HiveClient // <1>
newClientForMetadata(
  conf: SparkConf,
  hadoopConf: Configuration,
  configurations: Map[String, String]): HiveClient

<1> Uses the time-related configurations from the input hadoopConf, formatted for the Hive client

Internally, newClientForMetadata creates a new SQLConf with spark.sql properties only (from the input SparkConf).
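The filtering step can be pictured with a plain Scala map standing in for the SparkConf. This is a minimal sketch, not Spark's code: the `sqlPropertiesOnly` helper and the sample entries are hypothetical, and it assumes that "spark.sql properties" means entries whose key starts with the `spark.sql` prefix.

```scala
object SqlPropertiesSketch {
  // Hypothetical helper: keep only the entries whose key starts with "spark.sql"
  // (an assumption about how "spark.sql properties only" is decided).
  def sqlPropertiesOnly(all: Map[String, String]): Map[String, String] =
    all.filter { case (key, _) => key.startsWith("spark.sql") }

  // Sample entries standing in for the contents of a SparkConf.
  val sampleConf: Map[String, String] = Map(
    "spark.app.name" -> "demo",
    "spark.sql.hive.metastore.version" -> "1.2.1",
    "spark.sql.shuffle.partitions" -> "8"
  )

  // Only the two spark.sql entries remain after filtering.
  val sqlOnly: Map[String, String] = sqlPropertiesOnly(sampleConf)
}
```

Non-SQL settings such as `spark.app.name` are dropped, so the resulting configuration carries only what the SQL layer needs.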

newClientForMetadata then creates an IsolatedClientLoader per the input parameters and the following configuration properties:

You should see one of the following INFO messages in the logs:

Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using Spark classes.
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using maven.
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using [jars]
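Which of the three messages appears depends on where the Hive classes are loaded from. The sketch below only mirrors the message selection and is not Spark's implementation: the `loaderMessage` helper is hypothetical, and it assumes (this is not stated above) that the choice is driven by a jars setting whose value is `builtin`, `maven`, or a classpath of jar files.

```scala
object LoaderMessageSketch {
  // Hypothetical helper: returns the INFO message for a metastore version and a
  // jars setting. Assumes "builtin" and "maven" are the two special values and
  // that any other value is treated as a classpath of jars.
  def loaderMessage(hiveMetastoreVersion: String, jarsSetting: String): String = {
    val prefix = s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using"
    jarsSetting match {
      case "builtin" => s"$prefix Spark classes."
      case "maven"   => s"$prefix maven."
      case classpath => s"$prefix $classpath"
    }
  }
}
```

For example, `LoaderMessageSketch.loaderMessage("1.2.1", "maven")` yields the second message with the version filled in.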

In the end, newClientForMetadata requests the IsolatedClientLoader for a HiveClient.

newClientForMetadata is used when HiveExternalCatalog is requested for a HiveClient.

=== [[newClientForExecution]] newClientForExecution Utility

newClientForExecution(
  conf: SparkConf,
  hadoopConf: Configuration): HiveClientImpl

newClientForExecution is used for HiveThriftServer2.

=== [[inferSchema]] inferSchema Method

inferSchema(
  table: CatalogTable): CatalogTable

NOTE: inferSchema is used when the ResolveHiveSerdeTable logical resolution rule is executed.

=== [[withHiveExternalCatalog]] withHiveExternalCatalog Utility

withHiveExternalCatalog(
  sc: SparkContext): SparkContext

withHiveExternalCatalog simply sets the spark.sql.catalogImplementation configuration property to hive for the input SparkContext and returns that SparkContext.

NOTE: withHiveExternalCatalog is used when the deprecated HiveContext is created.