StaticSQLConf — Static Configuration Properties

StaticSQLConf holds cross-session, immutable and static SQL configuration properties.
```scala
assert(sc.isInstanceOf[org.apache.spark.SparkContext])

import org.apache.spark.sql.internal.StaticSQLConf
sc.getConf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS.key)
```
StaticSQLConf configuration properties can only be queried and can never be changed once the first SparkSession is created (unlike the regular configuration properties).
```scala
import org.apache.spark.sql.internal.StaticSQLConf

scala> val metastoreName = spark.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION.key)
metastoreName: String = hive

scala> spark.conf.set(StaticSQLConf.CATALOG_IMPLEMENTATION.key, "hive")
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.catalogImplementation;
  at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:144)
  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:41)
  ... 50 elided
```
spark.sql.cache.serializer
spark.sql.codegen.cache.maxEntries
(internal) When non-zero, enables caching of generated classes for operators and expressions. All jobs share a cache that can hold up to the specified number of generated classes.
Default: 100
Use SQLConf.codegenCacheMaxEntries to access the current value
Used when:

- CodeGenerator is loaded (and creates the cache)
spark.sql.broadcastExchange.maxThreadThreshold

(internal) The maximum degree of parallelism used to fetch and broadcast a table. If you encounter memory issues such as frequent full GCs or OOM errors when broadcasting a table, decrease this number to reduce memory usage. The value should be chosen carefully: decreasing parallelism can make other broadcasts wait longer, while increasing it may cause memory problems.

The threshold must be in the range (0, 128].
Default: 128
spark.sql.catalogImplementation

(internal) Configures in-memory (default) or hive-related BaseSessionStateBuilder and ExternalCatalog

Builder.enableHiveSupport is used to enable Hive support for a SparkSession.
Used when:

- SparkSession utility is requested for the name of the BaseSessionStateBuilder implementation (when SparkSession is requested for a SessionState)
- SharedState utility is requested for the name of the ExternalCatalog implementation (when SharedState is requested for an ExternalCatalog)
- SparkSession.Builder is requested to enable Hive support
- spark-shell is executed
- SetCommand is executed (with hive. keys)
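As a minimal sketch (assuming Hive classes are on the classpath), enableHiveSupport switches the catalog implementation from the default in-memory to hive:

```scala
import org.apache.spark.sql.SparkSession

// enableHiveSupport sets spark.sql.catalogImplementation to "hive"
// (Hive classes have to be available on the classpath)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("catalog-implementation-demo")
  .enableHiveSupport()
  .getOrCreate()

// Static configuration properties can only be queried afterwards
assert(spark.conf.get("spark.sql.catalogImplementation") == "hive")
```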
spark.sql.debug

(internal) Only used for internal debugging when HiveExternalCatalog is requested to restoreTableMetadata.
Default: false
Not all functions are supported when enabled.
spark.sql.defaultUrlStreamHandlerFactory.enabled

(internal) When true, registers Hadoop's FsUrlStreamHandlerFactory to support ADD JAR against HDFS locations. It should be disabled when a different stream protocol handler should be registered to support a particular protocol type, or if Hadoop's FsUrlStreamHandlerFactory conflicts with other protocol types such as http or https. See also SPARK-25694 and HADOOP-14598.
Default: true
spark.sql.event.truncate.length

Threshold of the SQL statement length beyond which it is truncated before being added to an event. Defaults to no truncation. If set to 0, the callsite is logged instead.

Must be greater than or equal to zero.
Default: Int.MaxValue
spark.sql.extensions

A comma-separated list of SQL extension configuration classes to configure SparkSessionExtensions:

- The classes must implement SparkSessionExtensions => Unit
- The classes must have a no-args constructor
- If multiple extensions are specified, they are applied in the specified order
- Rules and planner strategies are applied in the specified order
- For parsers, the last parser is used and each parser can delegate to its predecessor
- In case of function name conflicts, the last registered function name is used
Default: (empty)
Used when:

- SparkSession utility is used to apply SparkSessionExtensions
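For illustration, a minimal sketch of such an extension (a hypothetical MyExtensions class that injects a no-op optimizer rule) registered before the first SparkSession is created:

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical extension: a SparkSessionExtensions => Unit function with a no-args constructor
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { session =>
      new Rule[LogicalPlan] {
        override def apply(plan: LogicalPlan): LogicalPlan = plan // leaves the plan unchanged
      }
    }
  }
}

// spark.sql.extensions is static, so it has to be set before the first SparkSession
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.extensions", "MyExtensions")
  .getOrCreate()
```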
spark.sql.filesourceTableRelationCacheSize
(internal) The maximum size of the cache that maps qualified table names to table relation plans. Must not be negative.
Default: 1000
spark.sql.globalTempDatabase
(internal) Name of the Spark-owned internal database of global temporary views
Default: global_temp
The name of the internal database cannot conflict with the names of any database that is already available in ExternalCatalog.
Used to create a GlobalTempViewManager when SharedState is first requested for one.
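As a short sketch, a global temporary view lives in this internal database and is queried with the database name as prefix (global_temp by default):

```scala
// Global temporary views are registered in the database configured
// by spark.sql.globalTempDatabase (global_temp by default)
spark.range(5).createOrReplaceGlobalTempView("nums")

val globalTempDb = spark.conf.get("spark.sql.globalTempDatabase")
spark.sql(s"SELECT * FROM $globalTempDb.nums").show()
```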
spark.sql.hive.thriftServer.singleSession

When enabled (true), the Hive Thrift server runs in single-session mode: all JDBC/ODBC connections share the temporary views, function registries, SQL configuration and the current database.
Default: false
spark.sql.legacy.sessionInitWithConfigDefaults
Flag to revert to legacy behavior where a cloned SparkSession receives SparkConf defaults, dropping any overrides in its parent SparkSession.
Default: false
spark.sql.queryExecutionListeners
Class names of QueryExecutionListeners that will be automatically registered (with new SparkSessions)
Default: (empty)
The classes should have either a no-arg constructor, or a constructor that expects a SparkConf argument.
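A minimal sketch of such a listener (a hypothetical LoggingQueryExecutionListener with a no-arg constructor) registered through this property:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical listener with a no-arg constructor
class LoggingQueryExecutionListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded in ${durationNs / 1e6} ms")

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
}

// spark.sql.queryExecutionListeners is static, so set it before the first SparkSession
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.queryExecutionListeners", "LoggingQueryExecutionListener")
  .getOrCreate()
```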
spark.sql.sources.schemaStringLengthThreshold
(internal) The maximum length allowed in a single cell when storing additional schema information in Hive's metastore
Default: 4000
spark.sql.streaming.ui.enabled
Whether to run the Structured Streaming Web UI for the Spark application when the Spark Web UI is enabled.
Default: true
spark.sql.streaming.ui.retainedProgressUpdates
The number of progress updates to retain for a streaming query for Structured Streaming UI.
Default: 100
spark.sql.streaming.ui.retainedQueries
The number of inactive queries to retain for Structured Streaming UI.
Default: 100
spark.sql.ui.retainedExecutions
Number of executions to retain in the Spark UI.
Default: 1000
spark.sql.warehouse.dir
Directory of a Spark warehouse
Default: spark-warehouse
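Being a static property, the warehouse location can only be set before the first SparkSession is created, e.g. (a sketch with a hypothetical /tmp location):

```scala
import org.apache.spark.sql.SparkSession

// spark.sql.warehouse.dir is static: set it before the first SparkSession is created
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/tmp/my-spark-warehouse") // hypothetical location
  .getOrCreate()

println(spark.conf.get("spark.sql.warehouse.dir"))
```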