# spark.sql.execution Configuration Properties
## arrow.maxRecordsPerBatch

`spark.sql.execution.arrow.maxRecordsPerBatch`

When using Apache Arrow, the maximum number of records that can be written to a single `ArrowRecordBatch` in memory. If zero or negative, there is no limit. A short demo follows the list below.

Default: `10000`
Used when:

* `ApplyInPandasWithStatePythonRunner` is requested for `workerConf`
* `ArrowEvalPythonExec` is created
* `Dataset` is requested to `toArrowBatchRdd`
* `MapInBatchExec` is created
* `SparkConnectPlanner` is requested to `handleSqlCommand`
* `SparkConnectStreamHandler` is requested to `processAsArrowBatches`
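The effect is easiest to see through one of the listed consumers, e.g. `MapInBatchExec` behind `DataFrame.mapInPandas`. A minimal PySpark sketch (the `count_batches` function is made up for the demo): with the property lowered to 1000, the 10,000-row input arrives as ten pandas DataFrames of at most 1000 rows each.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Runtime SQL conf: cap every ArrowRecordBatch at 1000 records.
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "1000")

def count_batches(iterator):
    # mapInPandas (MapInBatchExec) hands the function an iterator of
    # pandas DataFrames, one per Arrow batch (at most 1000 rows each here).
    for pdf in iterator:
        yield pdf.assign(batch_size=len(pdf))

spark.range(10_000).mapInPandas(
    count_batches, schema="id long, batch_size long"
).show(3)
```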
## arrow.pyspark.enabled

`spark.sql.execution.arrow.pyspark.enabled`

Enables Arrow optimization (Arrow-based columnar data transfers) in PySpark

Default: `false`
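A minimal sketch of turning the optimization on at runtime; with the flag set, `DataFrame.toPandas` and `SparkSession.createDataFrame(pandas_df)` move data between the JVM and Python as Arrow batches.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Runtime SQL conf; defaults to false.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# Both directions of the JVM <-> Python transfer benefit:
pdf = spark.range(1_000_000).toPandas()                      # JVM -> pandas
df = spark.createDataFrame(pd.DataFrame({"x": range(10)}))   # pandas -> JVM
```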
## pandas.udf.buffer.size

`spark.sql.execution.pandas.udf.buffer.size`

`spark.buffer.size` for Pandas UDF executions only.

Note that Pandas execution requires the value to be more than 4 bytes. Lowering this value could cause small Pandas UDF batches to be iterated and pipelined; however, it might degrade performance. See SPARK-27870. A sketch follows the list below.

Default: `spark.buffer.size` (Spark Core)
Used when:

* `ApplyInPandasWithStatePythonRunner` and `ArrowPythonRunner` are created (and initialize `bufferSize`)
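A hedged sketch of overriding the buffer size for a scalar pandas UDF; `65536` is an arbitrary example value (not a recommendation), and `plus_one` is a made-up UDF.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()
# Overrides spark.buffer.size for Pandas UDF executions only;
# must be more than 4 bytes.
spark.conf.set("spark.sql.execution.pandas.udf.buffer.size", "65536")

@pandas_udf("long")
def plus_one(s):
    # s is a pandas Series; results stream back through the (now smaller)
    # buffer, so small batches can be pipelined sooner.
    return s + 1

spark.range(10).select(plus_one("id")).show()
```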
## pyspark.udf.simplifiedTraceback.enabled

`spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled`

Controls the traceback from Python UDFs. When enabled (`true`), the traceback is simplified: the Python worker, (de)serialization, etc. are hidden from PySpark tracebacks, and only the exception messages from UDFs are shown. A demo follows the list below.

Works only with CPython 3.7+.

Default: `true`
Used when:

* `ApplyInPandasWithStatePythonRunner`, `ArrowPythonRunner`, `CoGroupedArrowPythonRunner` and `PythonUDFRunner` are created (and initialize the `simplifiedTraceback` flag)
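A small sketch with a deliberately failing UDF (the `fail` UDF is made up for the demo): with the flag on (the default), the driver-side traceback skips PySpark-internal frames and points straight at the UDF's `raise`.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
spark.conf.set(
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled", "true")

@udf("long")
def fail(x):
    raise ValueError(f"bad id: {x}")  # the simplified traceback starts here

try:
    spark.range(1).select(fail("id")).collect()
except Exception as e:
    # With the flag on, the Python worker and (de)serialization frames are
    # hidden; only the UDF's own exception is surfaced.
    print(type(e).__name__)
```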