
DeletionVectorDescriptor

DeletionVectorDescriptor describes a deletion vector attached to a file.

Creating Instance

DeletionVectorDescriptor takes the following to be created:

  • Storage Type
  • Path or an inline deletion vector
  • Offset
  • Size (in bytes)
  • Cardinality
  • maxRowIndex
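
The following Scala sketch mirrors the six inputs above as a plain case class. The field names and types are taken from the utility signatures and property tables on this page. DvDescriptorSketch is a hypothetical name; this is an illustration, not the actual Delta Lake class.

```scala
// Hypothetical stand-in mirroring the inputs listed above;
// not the actual Delta Lake DeletionVectorDescriptor.
final case class DvDescriptorSketch(
    storageType: String,            // "p", "i" or "u"
    pathOrInlineDv: String,         // absolute path, inline data or encoded UUID
    offset: Option[Int],            // start of the DV data within its file (None when inline)
    sizeInBytes: Int,               // size of the serialized deletion vector
    cardinality: Long,              // number of rows the deletion vector removes
    maxRowIndex: Option[Long] = None)

object CreatingInstanceExample {
  def main(args: Array[String]): Unit = {
    val dv = DvDescriptorSketch(
      storageType = "p",
      pathOrInlineDv = "/tmp/table/deletion_vector.bin",
      offset = Some(1),
      sizeInBytes = 36,
      cardinality = 3L)
    println(dv)
  }
}
```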

DeletionVectorDescriptor is created using the following utilities (documented below):

  • EMPTY
  • onDiskWithRelativePath
  • inlineInLog
  • onDiskWithAbsolutePath

Storage Type

storageType: String

DeletionVectorDescriptor is given a storage type that indicates how the deletion vector is stored.

The storage type of a deletion vector is one of the following (the format describes the value of pathOrInlineDv):

  • p (path): format <absolute path>. Stored in a file that is available at an absolute path.
  • i (inline): format <base85 encoded bytes>. Stored inline in the transaction log.
  • u (UUID-based): format <random prefix (optional)><base85 encoded UUID>. Stored in a file with a path relative to the data directory of a Delta table.
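
A small sketch of how the one-letter storage type could be interpreted. The DvStorage hierarchy and StorageTypes.parse are hypothetical names used only to illustrate the storage types above; rejecting unknown markers with an exception is likewise an assumption.

```scala
// Hypothetical model of the three storage types.
sealed trait DvStorage
case object AbsolutePathStorage extends DvStorage // "p": <absolute path>
case object InlineStorage extends DvStorage       // "i": <base85 encoded bytes>
case object UuidStorage extends DvStorage         // "u": <random prefix><base85 encoded UUID>

object StorageTypes {
  // Maps a storage type marker to its meaning; unknown markers are rejected.
  def parse(storageType: String): DvStorage = storageType match {
    case "p" => AbsolutePathStorage
    case "i" => InlineStorage
    case "u" => UuidStorage
    case other =>
      throw new IllegalArgumentException(s"Unknown deletion vector storage type: $other")
  }
}
```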

Creating Empty Deletion Vector

EMPTY: DeletionVectorDescriptor

EMPTY is an empty deletion vector (DeletionVectorDescriptor) with the following:

Property Value
storageType i
pathOrInlineDv (empty)
sizeInBytes 0
cardinality 0
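
A sketch of EMPTY as a constant on a hypothetical companion object (EmptyDvSketch), assuming the field names and types used elsewhere on this page. Properties not in the table (offset, maxRowIndex) are left at None.

```scala
// Hypothetical stand-in; not the actual Delta Lake class.
final case class EmptyDvSketch(
    storageType: String,
    pathOrInlineDv: String,
    offset: Option[Int] = None,
    sizeInBytes: Int,
    cardinality: Long,
    maxRowIndex: Option[Long] = None)

object EmptyDvSketch {
  // Mirrors the property table: an inline ("i") descriptor with no data.
  val EMPTY: EmptyDvSketch =
    EmptyDvSketch(storageType = "i", pathOrInlineDv = "", sizeInBytes = 0, cardinality = 0L)
}
```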

EMPTY is used when:

onDiskWithRelativePath

onDiskWithRelativePath(
  id: UUID,
  randomPrefix: String = "",
  sizeInBytes: Int,
  cardinality: Long,
  offset: Option[Int] = None,
  maxRowIndex: Option[Long] = None): DeletionVectorDescriptor

onDiskWithRelativePath creates a DeletionVectorDescriptor with the following:

Property Value
storageType u
pathOrInlineDv encodeUUID with the given id and randomPrefix
offset The given offset
sizeInBytes The given sizeInBytes
cardinality The given cardinality
maxRowIndex The given maxRowIndex
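
A sketch of onDiskWithRelativePath, assuming that encodeUUID concatenates the random prefix with a Base85 encoding of the 16 UUID bytes (most significant half first). The encoder below uses the Z85 alphabet from ZeroMQ RFC 32 as a stand-in; whether Delta Lake's own Base85 codec matches it exactly is an assumption, and RelativePathDvSketch is a hypothetical name.

```scala
import java.nio.ByteBuffer
import java.util.UUID

object RelativePathDvSketch {
  final case class Dv(
      storageType: String,
      pathOrInlineDv: String,
      offset: Option[Int],
      sizeInBytes: Int,
      cardinality: Long,
      maxRowIndex: Option[Long])

  // Z85 alphabet (ZeroMQ RFC 32), used here only as a stand-in Base85 encoding.
  private val Alphabet =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#"

  // Encodes bytes whose length is a multiple of 4: every 4 bytes become 5 characters.
  def z85(bytes: Array[Byte]): String = {
    require(bytes.length % 4 == 0, "input length must be a multiple of 4")
    val sb = new StringBuilder
    bytes.grouped(4).foreach { group =>
      // Big-endian unsigned 32-bit value of the 4-byte group.
      var value = group.foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))
      val chars = new Array[Char](5)
      for (i <- 4 to 0 by -1) { // least significant base-85 digit goes last
        chars(i) = Alphabet((value % 85).toInt)
        value /= 85
      }
      sb.append(new String(chars))
    }
    sb.toString
  }

  // Hypothetical stand-in for encodeUUID: prefix + 20 Base85 characters.
  def encodeUUID(id: UUID, randomPrefix: String): String = {
    val buf = ByteBuffer.allocate(16)
    buf.putLong(id.getMostSignificantBits).putLong(id.getLeastSignificantBits)
    randomPrefix + z85(buf.array())
  }

  def onDiskWithRelativePath(
      id: UUID,
      randomPrefix: String = "",
      sizeInBytes: Int,
      cardinality: Long,
      offset: Option[Int] = None,
      maxRowIndex: Option[Long] = None): Dv =
    Dv("u", encodeUUID(id, randomPrefix), offset, sizeInBytes, cardinality, maxRowIndex)
}
```

Since a UUID is 16 bytes, the encoded part always has 20 characters, so any characters before them can be treated as the random prefix.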

onDiskWithRelativePath is used when:

inlineInLog

inlineInLog(
  data: Array[Byte],
  cardinality: Long): DeletionVectorDescriptor

inlineInLog creates a DeletionVectorDescriptor with the following:

Property Value
storageType i
pathOrInlineDv encodeData for the given data
sizeInBytes The size of the given data
cardinality The given cardinality
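
A sketch of inlineInLog. To stay within the JDK, it swaps in java.util.Base64 for the Base85 encoding that encodeData actually performs, so the encoded string differs from what Delta Lake writes; InlineDvSketch is a hypothetical name. Note that sizeInBytes is the size of the raw data, not of its encoded form.

```scala
import java.util.Base64

object InlineDvSketch {
  final case class Dv(
      storageType: String,
      pathOrInlineDv: String,
      sizeInBytes: Int,
      cardinality: Long)

  // Stand-in for encodeData: Base64 instead of Base85.
  def encodeData(data: Array[Byte]): String =
    Base64.getEncoder.encodeToString(data)

  def inlineInLog(data: Array[Byte], cardinality: Long): Dv =
    Dv(
      storageType = "i",
      pathOrInlineDv = encodeData(data),
      sizeInBytes = data.length, // size of the raw (unencoded) bytes
      cardinality = cardinality)
}
```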

inlineInLog is used when:

onDiskWithAbsolutePath

onDiskWithAbsolutePath(
  path: String,
  sizeInBytes: Int,
  cardinality: Long,
  offset: Option[Int] = None,
  maxRowIndex: Option[Long] = None): DeletionVectorDescriptor

Note

onDiskWithAbsolutePath is used for testing only.

copyWithAbsolutePath

copyWithAbsolutePath(
  tableLocation: Path): DeletionVectorDescriptor

copyWithAbsolutePath creates a new copy of this DeletionVectorDescriptor.

For the UUID-based storage type (u), copyWithAbsolutePath replaces the following attributes in the copy:

Attribute New Value
Storage type p
Path The absolute path based on the given tableLocation
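
A sketch of the replacement rule, using plain strings for paths. The helper resolveAbsolute is hypothetical and simply joins the table location with pathOrInlineDv; the real resolution goes through absolutePath. Returning an unchanged copy for the p and i storage types is an assumption, and CopyWithAbsolutePathSketch is a hypothetical name.

```scala
object CopyWithAbsolutePathSketch {
  final case class Dv(storageType: String, pathOrInlineDv: String, sizeInBytes: Int, cardinality: Long)

  // Hypothetical, simplified stand-in for absolutePath(tableLocation).
  def resolveAbsolute(tableLocation: String, dv: Dv): String =
    s"${tableLocation.stripSuffix("/")}/${dv.pathOrInlineDv}"

  def copyWithAbsolutePath(dv: Dv, tableLocation: String): Dv =
    dv.storageType match {
      // UUID-based: switch to storage type "p" with an absolute path.
      case "u" => dv.copy(storageType = "p", pathOrInlineDv = resolveAbsolute(tableLocation, dv))
      // Assumption: other storage types are copied unchanged.
      case _ => dv.copy()
    }
}
```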

copyWithAbsolutePath is used when:

Absolute Path

absolutePath(
  tableLocation: Path): Path

absolutePath...FIXME


absolutePath is used when:

assembleDeletionVectorPath

assembleDeletionVectorPath(
  targetParentPath: Path,
  id: UUID,
  prefix: String = ""): Path

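assembleDeletionVectorPath creates a new Path (Apache Hadoop) under the given targetParentPath (and the optional prefix) for a file name derived from the given id. Per the Delta Lake protocol, the file name follows the pattern deletion_vector_<uuid>.bin.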
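
A string-based sketch of the path layout, assuming the prefix becomes a subdirectory of targetParentPath and the file is named deletion_vector_<uuid>.bin (the layout described in the Delta Lake protocol). Hadoop's Path is replaced by plain strings joined with "/", and AssemblePathSketch is a hypothetical name.

```scala
import java.util.UUID

object AssemblePathSketch {
  // File name derived from the UUID in its canonical textual form.
  def fileName(id: UUID): String = s"deletion_vector_$id.bin"

  // Stand-in for composing Hadoop Paths: parent[/prefix]/fileName
  def assembleDeletionVectorPath(targetParentPath: String, id: UUID, prefix: String = ""): String = {
    val parent = targetParentPath.stripSuffix("/")
    if (prefix.nonEmpty) s"$parent/$prefix/${fileName(id)}"
    else s"$parent/${fileName(id)}"
  }
}
```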


assembleDeletionVectorPath is used when:

isOnDisk

isOnDisk: Boolean

isOnDisk is the negation (opposite) of the isInline flag.


isOnDisk is used when:

isInline

isInline: Boolean

isInline is true when the storageType is i (inline).
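
A minimal sketch of the two flags on a hypothetical case class (StorageFlagsSketch.Dv): isInline checks the storage type, and isOnDisk is simply its negation, so both p and u descriptors count as on disk.

```scala
object StorageFlagsSketch {
  final case class Dv(storageType: String, pathOrInlineDv: String) {
    // true only for the inline storage type
    def isInline: Boolean = storageType == "i"
    // negation of isInline: "p" and "u" descriptors point at a file
    def isOnDisk: Boolean = !isInline
  }
}
```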


isInline is used when: