ExternalAppendOnlyUnsafeRowArray¶
ExternalAppendOnlyUnsafeRowArray is an append-only array of UnsafeRows.
ExternalAppendOnlyUnsafeRowArray keeps rows in memory until the spill threshold is reached that triggers disk spilling.
Creating Instance¶
ExternalAppendOnlyUnsafeRowArray takes the following to be created:
-
TaskMemoryManager(Apache Spark) -
BlockManager(Apache Spark) -
SerializerManager(Apache Spark) -
TaskContext(Apache Spark) - Initial size (default:
1024) - Page size (in bytes)
- numRowsInMemoryBufferThreshold
- numRowsSpillThreshold
ExternalAppendOnlyUnsafeRowArray is created when:
SortMergeJoinScanneris requested for bufferedMatchesUnsafeCartesianRDDis requested tocomputeUpdatingSessionsIteratoris requested tostartNewSessionWindowExecphysical operator is requested to doExecute (and creates an internal buffer for window frames)WindowInPandasExec(PySpark) is requested todoExecute
numRowsInMemoryBufferThreshold¶
numRowsInMemoryBufferThreshold is used for the following:
numRowsSpillThreshold¶
numRowsSpillThreshold is used for the following:
- Create an UnsafeExternalSorter (after the numRowsInMemoryBufferThreshold is reached)
numRows Counter¶
ExternalAppendOnlyUnsafeRowArray uses numRows internal counter for the number of rows added.
length¶
length: Int
length returns the numRows.
length is used when:
SortMergeJoinExecphysical operator is requested to doExecute (forLeftSemi,LeftAntiandExistenceJoinjoins)FrameLessOffsetWindowFunctionFrameis requested todoWriteOffsetWindowFunctionFrameBaseis requested tofindNextRowWithNonNullInputSlidingWindowFunctionFrameis requested towriteUnboundedFollowingWindowFunctionFrameis requested towriteandcurrentUpperBoundUnboundedOffsetWindowFunctionFrameis requested toprepareUnboundedPrecedingWindowFunctionFrameis requested toprepareUnboundedWindowFunctionFrameis requested toprepare
inMemoryBuffer¶
ExternalAppendOnlyUnsafeRowArray creates an inMemoryBuffer internal array of UnsafeRows when created and the initialSizeOfInMemoryBuffer is greater than 0.
A new UnsafeRow can be added to inMemoryBuffer in add (up to the numRowsInMemoryBufferThreshold).
inMemoryBuffer is cleared in clear or add (when the number of UnsafeRows is above the numRowsInMemoryBufferThreshold).
initialSizeOfInMemoryBuffer¶
ExternalAppendOnlyUnsafeRowArray uses initialSizeOfInMemoryBuffer internal value as the number of UnsafeRows in the inMemoryBuffer.
initialSizeOfInMemoryBuffer is at most 128 and can be configured using thenumRowsInMemoryBufferThreshold (if smaller).
Logging¶
Enable ALL logging level for org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray logger to see what happens inside.
Add the following line to conf/log4j2.properties:
log4j.logger.org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray=ALL
Refer to Logging