HiveUtils

HiveUtils is a utility that is used to create a HiveClientImpl (see <<newClientForMetadata, newClientForMetadata>>).

HiveUtils is a Scala object with the private[spark] access modifier. Use the following utility code to access its properties (e.g. from spark-shell).
[source, scala]
----
// Use :paste -raw to paste the following code in spark-shell
// BEGIN
package org.apache.spark

import org.apache.spark.sql.hive.HiveUtils

object opener {
  def CONVERT_METASTORE_PARQUET = HiveUtils.CONVERT_METASTORE_PARQUET
}
// END

import org.apache.spark.opener
spark.sessionState.conf.getConf(opener.CONVERT_METASTORE_PARQUET)
----
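The snippet above exposes the configuration entry itself. As a quick sanity check you can also read the underlying configuration key directly. A minimal sketch, assuming a Hive-enabled SparkSession in spark-shell (spark.sql.hive.convertMetastoreParquet is the key behind CONVERT_METASTORE_PARQUET and defaults to true):

[source, scala]
----
// Read the same setting by its key
// (assumes Hive support so the configuration is registered)
spark.conf.get("spark.sql.hive.convertMetastoreParquet")  // usually "true" by default
----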
[[logging]]
[TIP]
====
Enable ALL logging level for the org.apache.spark.sql.hive.HiveUtils$ logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.HiveUtils.name = org.apache.spark.sql.hive.HiveUtils
logger.HiveUtils.level = all

Refer to ../spark-logging.md[Logging].
====
=== [[builtinHiveVersion]] builtinHiveVersion Property
[source, scala]
----
builtinHiveVersion: String = "1.2.1"
----
builtinHiveVersion is used when:

- spark.sql.hive.metastore.version configuration property is used
- HiveUtils utility is used to newClientForExecution and newClientForMetadata
- Spark Thrift Server is used
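Since HiveUtils is private[spark], builtinHiveVersion can be read from spark-shell with the same package trick shown earlier. A minimal sketch; the hiveOpener object name is made up for this example:

[source, scala]
----
// Use :paste -raw to paste the following code in spark-shell
// BEGIN
package org.apache.spark

import org.apache.spark.sql.hive.HiveUtils

object hiveOpener {
  // exposes the otherwise private[spark] builtinHiveVersion
  def builtinHiveVersion: String = HiveUtils.builtinHiveVersion
}
// END

import org.apache.spark.hiveOpener
hiveOpener.builtinHiveVersion  // e.g. "1.2.1"
----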
=== [[newClientForMetadata]] Creating HiveClientImpl -- newClientForMetadata Method
[source, scala]
----
newClientForMetadata(
  conf: SparkConf,
  hadoopConf: Configuration): HiveClient  // <1>
newClientForMetadata(
  conf: SparkConf,
  hadoopConf: Configuration,
  configurations: Map[String, String]): HiveClient
----
<1> Uses the time configurations (formatted for the Hive client) as the extra configurations
Internally, newClientForMetadata creates a new SQLConf with the spark.sql properties only (from the input SparkConf).

newClientForMetadata then creates an IsolatedClientLoader based on the input parameters and the Hive metastore configuration properties (incl. spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars).
You should see one of the following INFO messages in the logs:
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using Spark classes.
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using maven.
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using [jars]
In the end, newClientForMetadata requests the IsolatedClientLoader for a IsolatedClientLoader.md#createClient[HiveClient].

newClientForMetadata is used when HiveExternalCatalog is requested for a HiveClient.
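One way to see newClientForMetadata at work (and one of the INFO messages above) is to start a Hive-enabled SparkSession with an explicit metastore version and jars source. A hedged sketch with illustrative values only, assuming the Hive 1.2.1 artifacts are resolvable from Maven:

[source, scala]
----
import org.apache.spark.sql.SparkSession

// Static Hive metastore properties have to be set before the session is created
val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "1.2.1")  // illustrative version
  .config("spark.sql.hive.metastore.jars", "maven")     // resolve metastore jars from Maven
  .getOrCreate()

// The first catalog access makes HiveExternalCatalog request a HiveClient,
// which goes through HiveUtils.newClientForMetadata and should log
// "Initializing HiveMetastoreConnection version 1.2.1 using maven."
spark.sql("SHOW DATABASES").show()
----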
=== [[newClientForExecution]] newClientForExecution Utility
[source, scala]
----
newClientForExecution(
  conf: SparkConf,
  hadoopConf: Configuration): HiveClientImpl
----
newClientForExecution...FIXME

newClientForExecution is used for HiveThriftServer2.
=== [[inferSchema]] inferSchema Method
[source, scala]
----
inferSchema(
  table: CatalogTable): CatalogTable
----
inferSchema...FIXME

NOTE: inferSchema is used when the ResolveHiveSerdeTable.md[ResolveHiveSerdeTable] logical resolution rule is executed.
=== [[withHiveExternalCatalog]] withHiveExternalCatalog Utility
[source, scala]
----
withHiveExternalCatalog(
  sc: SparkContext): SparkContext
----
withHiveExternalCatalog simply sets the ../StaticSQLConf.md#spark.sql.catalogImplementation[spark.sql.catalogImplementation] configuration property to hive for the input SparkContext.
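For illustration, the effect described above can be sketched with the same package trick used earlier (the catalogSwitcher name is made up; the real method lives in HiveUtils and is private[spark]):

[source, scala]
----
// Use :paste -raw to paste the following code in spark-shell
// BEGIN
package org.apache.spark

object catalogSwitcher {
  // mirrors what withHiveExternalCatalog does: set the catalog
  // implementation to "hive" on the SparkContext's SparkConf
  def toHiveCatalog(sc: SparkContext): SparkContext = {
    sc.conf.set("spark.sql.catalogImplementation", "hive")
    sc
  }
}
// END
----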
NOTE: withHiveExternalCatalog is used when the deprecated HiveContext is created.