spark.sql.execution Configuration Properties¶
arrow.maxRecordsPerBatch¶
spark.sql.execution.arrow.maxRecordsPerBatch
When using Apache Arrow, the maximum number of records that can be written to a single ArrowRecordBatch in memory.
If zero or negative, there is no limit.
Default: 10000
Used when:
- ApplyInPandasWithStatePythonRunner is requested for workerConf
- ArrowEvalPythonExec is created
- Dataset is requested to toArrowBatchRdd
- MapInBatchExec is created
- SparkConnectPlanner is requested to handleSqlCommand
- SparkConnectStreamHandler is requested to processAsArrowBatches
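A minimal PySpark sketch of tuning this limit (the 5000 cap and the spark.range size are arbitrary, for illustration; assumes pyarrow is installed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Use Arrow for PySpark conversions and cap each ArrowRecordBatch at 5000 records.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")

df = spark.range(20_000)
# toPandas goes through toArrowBatchRdd; with the cap above, each partition's
# rows are written out in Arrow batches of at most 5000 records.
pdf = df.toPandas()
```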
arrow.pyspark.enabled¶
spark.sql.execution.arrow.pyspark.enabled
Enables Arrow optimization in PySpark (e.g., DataFrame.toPandas and SparkSession.createDataFrame from a pandas DataFrame)
Default: false
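A sketch of turning the optimization on, assuming pyarrow is installed:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# With Arrow enabled, createDataFrame(pandas_df) and DataFrame.toPandas()
# transfer data as columnar Arrow batches instead of pickled rows.
pdf = pd.DataFrame({"id": [1, 2, 3], "v": [0.1, 0.2, 0.3]})
df = spark.createDataFrame(pdf)
roundtripped = df.toPandas()
```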
pandas.udf.buffer.size¶
spark.sql.execution.pandas.udf.buffer.size
spark.buffer.size for Pandas UDF executions
Note that Pandas execution requires more than 4 bytes. Lowering this value could cause small Pandas UDF batches to be iterated over and pipelined; however, it might degrade performance. See SPARK-27870.
Default: spark.buffer.size (Spark Core)
Used when:
ApplyInPandasWithStatePythonRunner and ArrowPythonRunner are created (and initialize bufferSize)
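A minimal sketch of overriding the buffer size; the 8192 value is arbitrary, chosen only to illustrate a buffer smaller than the usual spark.buffer.size default (65536):

```python
from pyspark.sql import SparkSession

# A smaller buffer can let small Pandas UDF batches be flushed to the Python
# worker (and thus pipelined) sooner, at a possible cost in throughput.
spark = (
    SparkSession.builder
    .config("spark.sql.execution.pandas.udf.buffer.size", "8192")
    .getOrCreate()
)
```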
pyspark.udf.simplifiedTraceback.enabled¶
spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled
Controls the traceback from Python UDFs. When enabled (true), tracebacks are simplified to hide the PySpark internals (the Python worker, (de)serialization, etc.) and show only the exception messages from UDFs.
Works only with CPython 3.7+
Default: true
Used when:
ApplyInPandasWithStatePythonRunner, ArrowPythonRunner, CoGroupedArrowPythonRunner, and PythonUDFRunner are created (and initialize the simplifiedTraceback flag)
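A hypothetical failing pandas UDF makes the difference visible; disabling the flag restores the full internal traceback (assumes pandas and pyarrow are available):

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

# Turn the simplification off to see PySpark-internal frames
# (Python worker, serialization, etc.) in the traceback.
spark.conf.set(
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled", "false")

@pandas_udf("long")
def failing(v: pd.Series) -> pd.Series:
    raise ValueError("boom")  # the UDF's own exception message

try:
    spark.range(3).select(failing("id")).collect()
except Exception as e:
    # With the default (true), the error focuses on the ValueError above;
    # with false, internal frames are shown as well.
    print(e)
```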