BytesToBytesMap

BytesToBytesMap is a MemoryConsumer.

BytesToBytesMap is used to create Spark SQL’s UnsafeKVExternalSorter and UnsafeHashedRelation.

Creating Instance

BytesToBytesMap takes the following to be created:

BytesToBytesMap is created for Spark SQL’s UnsafeFixedWidthAggregationMap and UnsafeHashedRelation.

SerializerManager

BytesToBytesMap is given a SerializerManager when created.

BytesToBytesMap uses the SerializerManager when its MapIterator is requested to advanceToNextPage (to request an UnsafeSorterSpillWriter for an UnsafeSorterSpillReader).

Maximum Supported Capacity

BytesToBytesMap supports up to 1 << 29 (536,870,912) keys.

UnsafeSorterSpillWriters

BytesToBytesMap manages UnsafeSorterSpillWriters.

BytesToBytesMap registers a new UnsafeSorterSpillWriter when requested to spill.

BytesToBytesMap uses the UnsafeSorterSpillWriters when:

MapIterator

BytesToBytesMap manages a "destructive" MapIterator.

BytesToBytesMap creates it when requested for one.

BytesToBytesMap requests it to spill when requested to spill.

Creating Destructive MapIterator

MapIterator destructiveIterator()

destructiveIterator calls updatePeakMemoryUsed and creates a MapIterator (with the numValues and Location).

destructiveIterator is used when Spark SQL’s UnsafeFixedWidthAggregationMap is requested for a key-value iterator.

Allocating

void allocate(
  int capacity)

allocate rounds the input capacity up to a power of 2 that is greater than or equal to capacity, but no greater than the maximum supported capacity. The result is at least 64.

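// Illustrates the capacity computation in allocate (requires Spark on the classpath)
def _c(capacity: Int) = {
  import org.apache.spark.unsafe.array.ByteArrayMethods
  val MAX_CAPACITY = (1 << 29)
  // Round up to a power of 2, cap at MAX_CAPACITY, then enforce the minimum of 64
  Math.max(Math.min(MAX_CAPACITY, ByteArrayMethods.nextPowerOf2(capacity)), 64)
}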

allocate allocates a long array twice as large as the power-of-two capacity and fills it with zeros.

allocate initializes the growthThreshold and mask internal properties.

allocate requires that the input capacity is positive.
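
The sizing steps above can be sketched in plain Java, without Spark on the classpath. This is only an illustration under stated assumptions, not the Spark source: the class name AllocateSketch and the loadFactor parameter are hypothetical, nextPowerOf2 is a local stand-in for ByteArrayMethods.nextPowerOf2, and the formulas growthThreshold = capacity * loadFactor and mask = capacity - 1 are assumed here because they are the usual choices for a power-of-two hash table.

```java
// Hypothetical, self-contained sketch of the sizing arithmetic described for allocate.
final class AllocateSketch {
  static final int MAX_CAPACITY = 1 << 29;

  final long[] longArray;    // twice the power-of-two capacity, zero-filled
  final int growthThreshold; // assumption: capacity * loadFactor
  final int mask;            // assumption: capacity - 1

  AllocateSketch(int capacity, double loadFactor) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    int rounded = roundCapacity(capacity);
    // A new Java array is already filled with 0s.
    this.longArray = new long[rounded * 2];
    this.growthThreshold = (int) (rounded * loadFactor);
    this.mask = rounded - 1;
  }

  // Local stand-in for ByteArrayMethods.nextPowerOf2: smallest power of 2 >= num.
  static long nextPowerOf2(long num) {
    long highBit = Long.highestOneBit(num);
    return (highBit == num) ? num : highBit << 1;
  }

  // Power of 2 that is >= capacity, capped at MAX_CAPACITY, and at least 64.
  static int roundCapacity(int capacity) {
    return (int) Math.max(Math.min((long) MAX_CAPACITY, nextPowerOf2(capacity)), 64L);
  }
}
```

For example, new AllocateSketch(100, 0.5) rounds the capacity up to 128 and allocates 256 longs; under the assumed formulas the growth threshold is 64 and the mask is 127.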

allocate is used when…​FIXME

Spilling

long spill(
  long size,
  MemoryConsumer trigger)

spill is part of the MemoryConsumer abstraction.

spill requests the MapIterator to spill when the given MemoryConsumer is not this BytesToBytesMap and the MapIterator is available.
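
The guard described above can be sketched as follows. The types here are hypothetical stand-ins, not the Spark classes: SpillGuardSketch plays the role of the map, SpillableIterator plays the role of the destructive MapIterator, and returning 0 when the guard does not hold is an assumption of this sketch.

```java
// Hypothetical sketch of the guard described for spill; not the Spark classes.
final class SpillGuardSketch {
  // Stand-in for the destructive MapIterator's ability to spill.
  interface SpillableIterator {
    long spill(long numBytes);
  }

  // Stays null until a destructive iterator has been requested.
  private SpillableIterator destructiveIterator;

  void setDestructiveIterator(SpillableIterator iterator) {
    this.destructiveIterator = iterator;
  }

  // Delegates only when another consumer triggered the spill and an iterator exists.
  long spill(long size, Object trigger) {
    if (trigger != this && destructiveIterator != null) {
      return destructiveIterator.spill(size);
    }
    return 0L; // assumption: nothing is released otherwise
  }
}
```

The check on the trigger keeps the map from spilling in response to its own memory request; the null check covers the case where no destructive iterator has been created yet.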

Freeing Up Allocated Memory

void free()

free…​FIXME

free is used when…​FIXME

Internal Properties

Name             Description
growthThreshold  Growth threshold
mask             Mask for truncating hashcodes so that they do not exceed the long array’s size
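
A minimal sketch of how such a mask truncates hashcodes, assuming the mask equals the power-of-two capacity minus 1 (the common technique; the exact value used by BytesToBytesMap is not stated here). The class and method names are hypothetical.

```java
// Hypothetical sketch: truncating a hashcode with a power-of-two mask.
final class MaskSketch {
  // Maps any 32-bit hashcode to a slot index in [0, capacity).
  static int slot(int hashcode, int capacity) {
    if (capacity <= 0 || (capacity & (capacity - 1)) != 0) {
      throw new IllegalArgumentException("capacity must be a positive power of 2");
    }
    int mask = capacity - 1; // assumption: all low bits set, e.g. 63 for capacity 64
    return hashcode & mask;  // keeps only the low bits, also for negative hashcodes
  }
}
```

Because the capacity is a power of 2, the bitwise AND gives the same result as a non-negative modulo but without a division, which is why the capacity is rounded to a power of 2 in the first place.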