Skip to content

ExternalAppendOnlyUnsafeRowArray

ExternalAppendOnlyUnsafeRowArray is an append-only array of UnsafeRows.

ExternalAppendOnlyUnsafeRowArray keeps rows in memory until the spill threshold is reached that triggers disk spilling.

Creating Instance

ExternalAppendOnlyUnsafeRowArray takes the following to be created:

ExternalAppendOnlyUnsafeRowArray is created when:

  • SortMergeJoinScanner is requested for bufferedMatches
  • UnsafeCartesianRDD is requested to compute
  • UpdatingSessionsIterator is requested to startNewSession
  • WindowExec physical operator is requested to doExecute (and creates an internal buffer for window frames)
  • WindowInPandasExec (PySpark) is requested to doExecute

numRowsInMemoryBufferThreshold

numRowsInMemoryBufferThreshold is used for the following:

numRowsSpillThreshold

numRowsSpillThreshold is used for the following:

numRows Counter

ExternalAppendOnlyUnsafeRowArray uses numRows internal counter for the number of rows added.

length

length: Int

length returns the numRows.

length is used when:

  • SortMergeJoinExec physical operator is requested to doExecute (for LeftSemi, LeftAnti and ExistenceJoin joins)
  • FrameLessOffsetWindowFunctionFrame is requested to doWrite
  • OffsetWindowFunctionFrameBase is requested to findNextRowWithNonNullInput
  • SlidingWindowFunctionFrame is requested to write
  • UnboundedFollowingWindowFunctionFrame is requested to write and currentUpperBound
  • UnboundedOffsetWindowFunctionFrame is requested to prepare
  • UnboundedPrecedingWindowFunctionFrame is requested to prepare
  • UnboundedWindowFunctionFrame is requested to prepare

inMemoryBuffer

ExternalAppendOnlyUnsafeRowArray creates an inMemoryBuffer internal array of UnsafeRows when created and the initialSizeOfInMemoryBuffer is greater than 0.

A new UnsafeRow can be added to inMemoryBuffer in add (up to the numRowsInMemoryBufferThreshold).

inMemoryBuffer is cleared in clear or add (when the number of UnsafeRows is above the numRowsInMemoryBufferThreshold).

initialSizeOfInMemoryBuffer

ExternalAppendOnlyUnsafeRowArray uses initialSizeOfInMemoryBuffer internal value as the number of UnsafeRows in the inMemoryBuffer.

initialSizeOfInMemoryBuffer is at most 128 and can be configured using thenumRowsInMemoryBufferThreshold (if smaller).

Logging

Enable ALL logging level for org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray logger to see what happens inside.

Add the following line to conf/log4j2.properties:

log4j.logger.org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray=ALL

Refer to Logging