# spark.sql.execution Configuration Properties
## arrow.maxRecordsPerBatch

`spark.sql.execution.arrow.maxRecordsPerBatch`

When using Apache Arrow, the maximum number of records that can be written to a single `ArrowRecordBatch` in memory. If zero or negative, there is no limit. A short demo follows the list below.

Default: `10000`
Used when:

* `ApplyInPandasWithStatePythonRunner` is requested for `workerConf`
* `ArrowEvalPythonExec` is created
* `Dataset` is requested to `toArrowBatchRdd`
* `MapInBatchExec` is created
* `SparkConnectPlanner` is requested to `handleSqlCommand`
* `SparkConnectStreamHandler` is requested to `processAsArrowBatches`
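The effect is easiest to see through one of the listed consumers, e.g. `MapInBatchExec` behind `DataFrame.mapInPandas`. A minimal PySpark sketch (the `count_batches` function is made up for the demo): with the property lowered to 1000, the 10,000-row input arrives as ten pandas DataFrames of at most 1000 rows each.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Runtime SQL conf: cap every ArrowRecordBatch at 1000 records.
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "1000")

def count_batches(iterator):
    # mapInPandas (MapInBatchExec) hands the function an iterator of
    # pandas DataFrames, one per Arrow batch (at most 1000 rows each here).
    for pdf in iterator:
        yield pdf.assign(batch_size=len(pdf))

spark.range(10_000).mapInPandas(
    count_batches, schema="id long, batch_size long"
).show(3)
```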
## arrow.pyspark.enabled

`spark.sql.execution.arrow.pyspark.enabled`

Enables Arrow optimization (Arrow-based columnar data transfers) in PySpark

Default: `false`
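A minimal sketch of turning the optimization on at runtime; with the flag set, `DataFrame.toPandas` and `SparkSession.createDataFrame(pandas_df)` move data between the JVM and Python as Arrow batches.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Runtime SQL conf; defaults to false.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# Both directions of the JVM <-> Python transfer benefit:
pdf = spark.range(1_000_000).toPandas()                      # JVM -> pandas
df = spark.createDataFrame(pd.DataFrame({"x": range(10)}))   # pandas -> JVM
```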
## pandas.udf.buffer.size

`spark.sql.execution.pandas.udf.buffer.size`

`spark.buffer.size` for Pandas UDF executions only.

Note that Pandas execution requires the value to be more than 4 bytes. Lowering this value could cause small Pandas UDF batches to be iterated and pipelined; however, it might degrade performance. See SPARK-27870. A sketch follows the list below.

Default: `spark.buffer.size` (Spark Core)
Used when:

* `ApplyInPandasWithStatePythonRunner` and `ArrowPythonRunner` are created (and initialize `bufferSize`)
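A hedged sketch of overriding the buffer size for a scalar pandas UDF; `65536` is an arbitrary example value (not a recommendation), and `plus_one` is a made-up UDF.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()
# Overrides spark.buffer.size for Pandas UDF executions only;
# must be more than 4 bytes.
spark.conf.set("spark.sql.execution.pandas.udf.buffer.size", "65536")

@pandas_udf("long")
def plus_one(s):
    # s is a pandas Series; results stream back through the (now smaller)
    # buffer, so small batches can be pipelined sooner.
    return s + 1

spark.range(10).select(plus_one("id")).show()
```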
## pyspark.udf.simplifiedTraceback.enabled

`spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled`

Controls the traceback from Python UDFs. When enabled (`true`), the traceback is simplified: the Python worker, (de)serialization, etc. are hidden from PySpark tracebacks, and only the exception messages from UDFs are shown. A demo follows the list below.

Works only with CPython 3.7+.

Default: `true`
Used when:

* `ApplyInPandasWithStatePythonRunner`, `ArrowPythonRunner`, `CoGroupedArrowPythonRunner` and `PythonUDFRunner` are created (and initialize the `simplifiedTraceback` flag)
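A small sketch with a deliberately failing UDF (the `fail` UDF is made up for the demo): with the flag on (the default), the driver-side traceback skips PySpark-internal frames and points straight at the UDF's `raise`.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
spark.conf.set(
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled", "true")

@udf("long")
def fail(x):
    raise ValueError(f"bad id: {x}")  # the simplified traceback starts here

try:
    spark.range(1).select(fail("id")).collect()
except Exception as e:
    # With the flag on, the Python worker and (de)serialization frames are
    # hidden; only the UDF's own exception is surfaced.
    print(type(e).__name__)
```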