
Configuration Properties

spark.sql.pipelines is the family of configuration properties for Spark Declarative Pipelines.

execution.streamstate.pollingInterval

spark.sql.pipelines.execution.streamstate.pollingInterval

(internal) How often (in seconds) the stream state is polled for changes. This is used to check if the stream has failed and needs to be restarted.

Default: 1 (second)

Use SQLConf.PIPELINES_STREAM_STATE_POLLING_INTERVAL to reference the name.

Use SQLConf.streamStatePollingInterval method to access the current value.

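As a rough illustration of the behavior described above (a minimal Python sketch, not Spark's actual implementation; `get_state`, `restart`, and the state values are hypothetical):

```python
import time

def watch_stream(get_state, restart, polling_interval_s=1.0, max_polls=100):
    """Poll the stream state every polling_interval_s seconds; when a FAILED
    state is observed, trigger a restart and stop watching."""
    for _ in range(max_polls):
        if get_state() == "FAILED":
            restart()
            return True
        time.sleep(polling_interval_s)
    return False
```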

execution.watchdog.minRetryTime

spark.sql.pipelines.execution.watchdog.minRetryTime

(internal) Initial duration (in seconds) between the time when we notice a flow has failed and when we try to restart the flow. The interval between flow restarts doubles with every stream failure up to the maximum value set in spark.sql.pipelines.execution.watchdog.maxRetryTime.

Default: 5 (seconds)

Must be at least 1 second

Use SQLConf.PIPELINES_WATCHDOG_MIN_RETRY_TIME_IN_SECONDS to reference the name.

Use SQLConf.watchdogMinRetryTimeInSeconds method to access the current value.


execution.watchdog.maxRetryTime

spark.sql.pipelines.execution.watchdog.maxRetryTime

(internal) Maximum interval (in seconds) between flow restarts.

Default: 3600 (seconds)

Must be greater than or equal to spark.sql.pipelines.execution.watchdog.minRetryTime

Use SQLConf.PIPELINES_WATCHDOG_MAX_RETRY_TIME_IN_SECONDS to reference the name.

Use SQLConf.watchdogMaxRetryTimeInSeconds method to access the current value.

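The interaction of the two watchdog properties (the delay starts at minRetryTime, doubles with every failure, and is capped at maxRetryTime) can be sketched as follows (an illustrative Python sketch, not Spark's code):

```python
def restart_delays(min_retry_s=5, max_retry_s=3600, num_failures=12):
    """Delay (in seconds) before each successive flow restart:
    doubles with every stream failure, capped at max_retry_s."""
    delays = []
    delay = min_retry_s
    for _ in range(num_failures):
        delays.append(delay)
        delay = min(delay * 2, max_retry_s)
    return delays
```

With the defaults, the schedule is 5, 10, 20, 40, ... seconds until it saturates at 3600.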

execution.maxConcurrentFlows

spark.sql.pipelines.execution.maxConcurrentFlows

(internal) Maximum number of flows to execute at once. Used to tune performance for triggered pipelines. Has no effect on continuous pipelines.

Default: 16

Use SQLConf.PIPELINES_MAX_CONCURRENT_FLOWS to reference the name.

Use SQLConf.maxConcurrentFlows method to access the current value.

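Bounding the number of flows in flight can be pictured with a fixed-size worker pool (a minimal Python sketch under that assumption; the `flows` callables are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_flows(flows, max_concurrent_flows=16):
    """Execute flow callables with at most max_concurrent_flows running
    at once; results are returned in the order the flows were given."""
    with ThreadPoolExecutor(max_workers=max_concurrent_flows) as pool:
        return list(pool.map(lambda flow: flow(), flows))
```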

timeoutMsForTerminationJoinAndLock

spark.sql.pipelines.timeoutMsForTerminationJoinAndLock

(internal) Timeout (in milliseconds) to acquire the lock used to stop an update.

Default: 60 * 60 * 1000 (1 hour)

Must be at least 1 millisecond

Use SQLConf.PIPELINES_TIMEOUT_MS_FOR_TERMINATION_JOIN_AND_LOCK to reference the name.

Use SQLConf.timeoutMsForTerminationJoinAndLock method to access the current value.

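A timed lock acquisition of this kind can be sketched as follows (illustrative Python only; `stop_update` and the lock are hypothetical stand-ins, not Spark's internals):

```python
import threading

DEFAULT_TIMEOUT_MS = 60 * 60 * 1000  # 1 hour

def stop_update(lock, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Try to acquire the termination lock within timeout_ms milliseconds,
    raising TimeoutError if the lock cannot be obtained in time."""
    if not lock.acquire(timeout=timeout_ms / 1000.0):
        raise TimeoutError(f"could not acquire termination lock in {timeout_ms} ms")
    try:
        pass  # stop the update while holding the lock (omitted)
    finally:
        lock.release()
    return True
```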

maxFlowRetryAttempts

spark.sql.pipelines.maxFlowRetryAttempts

Maximum number of times a flow can be retried. Can be set at the pipeline or flow level.

Default: 2

Use SQLConf.PIPELINES_MAX_FLOW_RETRY_ATTEMPTS to reference the name.

Use SQLConf.maxFlowRetryAttempts method to access the current value.

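A retry cap like this means a flow runs at most `max_flow_retry_attempts + 1` times (a hedged Python sketch of the semantics, not Spark's implementation):

```python
def run_flow_with_retries(flow, max_flow_retry_attempts=2):
    """Run flow(); on failure, retry up to max_flow_retry_attempts times
    before re-raising the last error."""
    for attempt in range(max_flow_retry_attempts + 1):
        try:
            return flow()
        except Exception:
            if attempt == max_flow_retry_attempts:
                raise
```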

event.queue.capacity

spark.sql.pipelines.event.queue.capacity

(internal) Capacity of the event queue used in pipelined execution. When the queue is full, non-terminal FlowProgressEvents will be dropped.

Default: 1000

Must be positive

Use SQLConf.PIPELINES_EVENT_QUEUE_CAPACITY to reference the name.

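The drop-on-full behavior for non-terminal events can be sketched with a bounded queue (an illustrative Python sketch; how terminal events are actually handled is an assumption here, not confirmed by the source):

```python
import queue

def emit_event(event_queue, event, is_terminal):
    """Enqueue event into a bounded queue; when the queue is full,
    non-terminal events are dropped while terminal events block until
    space is available (assumed behavior)."""
    try:
        event_queue.put_nowait(event)
        return True
    except queue.Full:
        if is_terminal:
            event_queue.put(event)  # assumed: terminal events are never dropped
            return True
        return False  # non-terminal FlowProgressEvent dropped
```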