ExternalAppendOnlyUnsafeRowArray -- Append-Only Array for UnsafeRows (with Disk Spill Threshold)

ExternalAppendOnlyUnsafeRowArray is an append-only array for UnsafeRows that spills content to disk when a <<numRowsSpillThreshold, spill threshold of rows>> is reached.

NOTE: Choosing a proper spill threshold of rows is a performance optimization.
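
The spill-on-threshold idea can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `SpillingAppendOnlyArray`, `isSpilled` and `rows` are hypothetical names, rows are plain single-line strings rather than UnsafeRows, and "spilling" simply appends to a temporary text file. It is not how Spark implements the class.

```scala
import java.io.{File, FileWriter, PrintWriter}
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

// Hypothetical, simplified model; NOT Spark's implementation.
// Rows are single-line Strings and "spilling" appends them to a temporary text file.
final class SpillingAppendOnlyArray(numRowsSpillThreshold: Int) {
  private val inMemoryBuffer = new ArrayBuffer[String]()
  private var spillFile: Option[File] = None
  private var numRows = 0

  def length: Int = numRows
  def isSpilled: Boolean = spillFile.isDefined

  // Appends the given rows to the file and closes the writer right away.
  private def appendToDisk(file: File, rows: Iterable[String]): Unit = {
    val out = new PrintWriter(new FileWriter(file, true)) // true = append mode
    try rows.foreach(r => out.println(r)) finally out.close()
  }

  def add(row: String): Unit = {
    spillFile match {
      case Some(file) =>
        // Already spilled: every further row goes straight to disk.
        appendToDisk(file, Seq(row))
      case None if inMemoryBuffer.size < numRowsSpillThreshold =>
        // Below the threshold: keep the row in memory.
        inMemoryBuffer += row
      case None =>
        // Threshold reached: move the buffered rows and the new row to disk.
        val file = File.createTempFile("row-array-spill", ".txt")
        file.deleteOnExit()
        appendToDisk(file, inMemoryBuffer.toSeq :+ row)
        inMemoryBuffer.clear()
        spillFile = Some(file)
    }
    numRows += 1
  }

  // All rows in insertion order, wherever they currently live.
  def rows: Vector[String] = spillFile match {
    case Some(file) =>
      val src = Source.fromFile(file)
      try src.getLines().toVector finally src.close()
    case None => inMemoryBuffer.toVector
  }
}
```

With a threshold of 2, the first two `add` calls stay in memory and the third moves all three rows to the temporary file. A lower threshold trades memory for disk I/O, which is why the value matters for performance.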

ExternalAppendOnlyUnsafeRowArray is created when:

  • WindowExec physical operator is executed (and creates an internal buffer for window frames)

  • WindowFunctionFrame is prepared

  • SortMergeJoinExec physical operator is executed (and creates a RowIterator for INNER and CROSS joins) and for getBufferedMatches

  • SortMergeJoinScanner creates an internal bufferedMatches

  • UnsafeCartesianRDD is computed

[[internal-registries]]
.ExternalAppendOnlyUnsafeRowArray's Internal Registries and Counters
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| [[initialSizeOfInMemoryBuffer]] initialSizeOfInMemoryBuffer
| FIXME

Used when...FIXME

| [[inMemoryBuffer]] inMemoryBuffer
| FIXME

Can grow up to <<numRowsSpillThreshold, numRowsSpillThreshold>> rows (i.e. new UnsafeRows are <<add, added>>)

Used when...FIXME

| [[spillableArray]] spillableArray
| UnsafeExternalSorter

Used when...FIXME

| [[numRows]] numRows
| Used when...FIXME

| [[modificationsCount]] modificationsCount
| Used when...FIXME

| [[numFieldsPerRow]] numFieldsPerRow
| Used when...FIXME
|===


Enable INFO logging level for org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray logger to see what happens inside.

Add the following line to conf/
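
As an illustration only, and assuming the log4j 1.x properties format that Spark 2.x uses for logging configuration, such a line could look as follows (the logger name is the one given above; the target file name is not specified here):

```
log4j.logger.org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray=INFO
```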

Refer to Logging.

=== [[generateIterator]] generateIterator Method

[source, scala]
----
generateIterator(): Iterator[UnsafeRow]
generateIterator(startIndex: Int): Iterator[UnsafeRow]
----


=== [[add]] add Method

[source, scala]
----
add(unsafeRow: UnsafeRow): Unit
----



add is used when:

  • WindowExec is executed (and fetches all rows in a partition for a group)

  • SortMergeJoinScanner buffers matching rows

  • UnsafeCartesianRDD is computed

=== [[clear]] clear Method

[source, scala]
----
clear(): Unit
----
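
How generateIterator, add and clear could fit together is sketched below with an in-memory-only model. The class name `AppendOnlyRows`, the use of strings as rows, and the fail-fast check based on a modifications counter are assumptions made for illustration; the sketch does not reproduce Spark's code and leaves spilling out entirely.

```scala
import java.util.ConcurrentModificationException
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory-only model; NOT Spark's implementation.
final class AppendOnlyRows {
  private val buffer = new ArrayBuffer[String]()
  // Bumped on every add and clear so that stale iterators can be detected.
  private var modificationsCount = 0L

  def length: Int = buffer.size

  def add(row: String): Unit = {
    buffer += row
    modificationsCount += 1
  }

  def clear(): Unit = {
    buffer.clear()
    modificationsCount += 1
  }

  def generateIterator(): Iterator[String] = generateIterator(startIndex = 0)

  def generateIterator(startIndex: Int): Iterator[String] = {
    require(startIndex >= 0 && startIndex <= buffer.size,
      s"startIndex $startIndex is out of range [0, ${buffer.size}]")
    // Remember the counter value at creation time.
    val expectedModifications = modificationsCount
    new Iterator[String] {
      private var index = startIndex

      // Assumed fail-fast behaviour: refuse to continue once the array changed.
      private def failIfModified(): Unit =
        if (expectedModifications != modificationsCount)
          throw new ConcurrentModificationException(
            "The backing array has been modified since this iterator was created")

      override def hasNext: Boolean = {
        failIfModified()
        index < buffer.size
      }

      override def next(): String = {
        if (!hasNext) throw new NoSuchElementException("no more rows")
        val row = buffer(index)
        index += 1
        row
      }
    }
  }
}
```

In this model an iterator created before a later `add` or `clear` fails on its next use, so callers are expected to finish appending before they start reading, and to create a fresh iterator after every change.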


=== [[creating-instance]] Creating ExternalAppendOnlyUnsafeRowArray Instance

ExternalAppendOnlyUnsafeRowArray takes the following when created:

  • [[taskMemoryManager]] TaskMemoryManager
  • [[blockManager]] BlockManager
  • [[serializerManager]] SerializerManager
  • [[taskContext]] TaskContext
  • [[initialSize]] Initial size
  • [[pageSizeBytes]] Page size (in bytes)
  • [[numRowsSpillThreshold]] Number of rows to hold before spilling them to disk

ExternalAppendOnlyUnsafeRowArray initializes the <<internal-registries, internal registries and counters>>.

