ExternalAppendOnlyUnsafeRowArray¶
ExternalAppendOnlyUnsafeRowArray
is an append-only array of UnsafeRows.
ExternalAppendOnlyUnsafeRowArray
keeps rows in memory until the spill threshold is reached that triggers disk spilling.
Creating Instance¶
ExternalAppendOnlyUnsafeRowArray
takes the following to be created:
-
TaskMemoryManager
(Apache Spark) -
BlockManager
(Apache Spark) -
SerializerManager
(Apache Spark) -
TaskContext
(Apache Spark) - Initial size (default:
1024
) - Page size (in bytes)
- numRowsInMemoryBufferThreshold
- numRowsSpillThreshold
ExternalAppendOnlyUnsafeRowArray
is created when:
SortMergeJoinScanner
is requested for bufferedMatchesUnsafeCartesianRDD
is requested tocompute
UpdatingSessionsIterator
is requested tostartNewSession
WindowExec
physical operator is requested to doExecute (and creates an internal buffer for window frames)WindowInPandasExec
(PySpark) is requested todoExecute
numRowsInMemoryBufferThreshold¶
numRowsInMemoryBufferThreshold
is used for the following:
numRowsSpillThreshold¶
numRowsSpillThreshold
is used for the following:
- Create an UnsafeExternalSorter (after the numRowsInMemoryBufferThreshold is reached)
numRows Counter¶
ExternalAppendOnlyUnsafeRowArray
uses numRows
internal counter for the number of rows added.
length¶
length: Int
length
returns the numRows.
length
is used when:
SortMergeJoinExec
physical operator is requested to doExecute (forLeftSemi
,LeftAnti
andExistenceJoin
joins)FrameLessOffsetWindowFunctionFrame
is requested todoWrite
OffsetWindowFunctionFrameBase
is requested tofindNextRowWithNonNullInput
SlidingWindowFunctionFrame
is requested towrite
UnboundedFollowingWindowFunctionFrame
is requested towrite
andcurrentUpperBound
UnboundedOffsetWindowFunctionFrame
is requested toprepare
UnboundedPrecedingWindowFunctionFrame
is requested toprepare
UnboundedWindowFunctionFrame
is requested toprepare
inMemoryBuffer¶
ExternalAppendOnlyUnsafeRowArray
creates an inMemoryBuffer
internal array of UnsafeRows when created and the initialSizeOfInMemoryBuffer is greater than 0
.
A new UnsafeRow
can be added to inMemoryBuffer
in add (up to the numRowsInMemoryBufferThreshold).
inMemoryBuffer
is cleared in clear or add (when the number of UnsafeRow
s is above the numRowsInMemoryBufferThreshold).
initialSizeOfInMemoryBuffer¶
ExternalAppendOnlyUnsafeRowArray
uses initialSizeOfInMemoryBuffer
internal value as the number of UnsafeRows in the inMemoryBuffer.
initialSizeOfInMemoryBuffer
is at most 128
and can be configured using thenumRowsInMemoryBufferThreshold (if smaller).
Logging¶
Enable ALL
logging level for org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray
logger to see what happens inside.
Add the following line to conf/log4j2.properties
:
log4j.logger.org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray=ALL
Refer to Logging