spark.sql.execution Configuration Properties

arrow.maxRecordsPerBatch

spark.sql.execution.arrow.maxRecordsPerBatch

When using Apache Arrow, the maximum number of records that can be written to a single ArrowRecordBatch in memory.

If zero or negative, there is no limit.

Default: 10000

Used when:

  • ApplyInPandasWithStatePythonRunner is requested for workerConf
  • ArrowEvalPythonExec is created
  • Dataset is requested to toArrowBatchRdd
  • MapInBatchExec is created
  • SparkConnectPlanner is requested to handleSqlCommand
  • SparkConnectStreamHandler is requested to processAsArrowBatches

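A minimal PySpark sketch of the effect on an Arrow-backed conversion such as DataFrame.toPandas; the 5,000-record cap and the row count are illustrative, not recommended values:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("arrow-maxRecordsPerBatch-demo")
    # Cap every ArrowRecordBatch at 5,000 records (default: 10,000);
    # zero or a negative value removes the limit.
    .config("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")
    # Arrow-based conversion has to be enabled for toPandas to use it.
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

# 20,000 rows with a 5,000-record cap are transferred as at least four
# ArrowRecordBatches (batches never span partition boundaries).
pdf = spark.range(20_000).toPandas()
print(len(pdf))  # 20000
```
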
arrow.pyspark.enabled

spark.sql.execution.arrow.pyspark.enabled

Enables Apache Arrow optimization for columnar data transfers in PySpark (e.g., DataFrame.toPandas and SparkSession.createDataFrame with a pandas DataFrame)

Default: false

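A short sketch of toggling the optimization at runtime; the sample pandas DataFrame is illustrative:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A runtime SQL property, so it can be flipped per session.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

pdf = pd.DataFrame({"id": range(10), "value": [i * 0.5 for i in range(10)]})

# With the flag on, createDataFrame from a pandas DataFrame and
# DataFrame.toPandas use Arrow for the columnar transfer.
df = spark.createDataFrame(pdf)
print(df.toPandas().head())
```
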
pandas.udf.buffer.size

spark.sql.execution.pandas.udf.buffer.size

spark.buffer.size for Pandas UDF executions

Note that Pandas execution requires more than 4 bytes. Lowering this value could cause small Pandas UDF batches to be iterated and pipelined; however, it might degrade performance. See SPARK-27870.

Default: spark.buffer.size (Spark Core)

Used when:

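A hedged sketch of tuning the buffer for a Pandas UDF execution; the 8192-byte value and the UDF are purely illustrative:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = (
    SparkSession.builder
    # Buffer size (in bytes) used when streaming data to the Python worker
    # for Pandas UDF executions; when unset, falls back to spark.buffer.size.
    .config("spark.sql.execution.pandas.udf.buffer.size", "8192")
    .getOrCreate()
)

@pandas_udf("double")
def plus_one(s: pd.Series) -> pd.Series:
    return s + 1.0

spark.range(1_000).select(plus_one("id")).show(5)
```
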
pyspark.udf.simplifiedTraceback.enabled

spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled

Controls the traceback from Python UDFs. When enabled (true), the traceback is simplified: it hides the Python worker, (de)serialization, etc. from PySpark and shows only the exception messages from UDFs.

Works only with CPython 3.7+

Default: true

Used when:
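
A small sketch of the effect; the failing UDF is illustrative. With the default (true), the driver-side error essentially shows just the UDF's own exception; with false, the full traceback, including PySpark-internal frames, is kept:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

# Default is true; set to false to keep the full traceback, including
# the Python worker and (de)serialization frames.
spark.conf.set(
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled", "false"
)

@udf("int")
def fail(x):
    raise ValueError(f"bad value: {x}")

try:
    spark.range(1).select(fail("id")).collect()
except Exception as e:
    # With the flag off, the error carries internal PySpark frames;
    # with it on (the default), mostly the UDF's own exception remains.
    print(type(e).__name__)
```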