
DeltaTable

DeltaTable is the management interface of a Delta table.

DeltaTable instances are created using utilities (e.g. DeltaTable.forName, DeltaTable.convertToDelta).

Utilities (Static Methods)

convertToDelta

convertToDelta(
  spark: SparkSession,
  identifier: String): DeltaTable
convertToDelta(
  spark: SparkSession,
  identifier: String,
  partitionSchema: String): DeltaTable
convertToDelta(
  spark: SparkSession,
  identifier: String,
  partitionSchema: StructType): DeltaTable

convertToDelta converts a parquet table to delta format (and makes the table available in Delta Lake).

Note

Refer to Demo: Converting Parquet Dataset Into Delta Format for a demo of DeltaTable.convertToDelta.

Internally, convertToDelta requests the SparkSession for the SQL parser (ParserInterface) that is in turn requested to parse the given table identifier (to get a TableIdentifier).

Tip

Read up on ParserInterface in The Internals of Spark SQL online book.

In the end, convertToDelta uses the DeltaConvert utility to convert the parquet table to delta format and creates a DeltaTable.
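
A minimal sketch of the conversion (the paths and the date partition column are made up for illustration):

import io.delta.tables.DeltaTable

// Convert an unpartitioned parquet directory in place
val converted = DeltaTable.convertToDelta(spark, "parquet.`/tmp/parquet-table`")

// Convert a parquet directory partitioned by a date column,
// passing the partition schema as a DDL-formatted string
val convertedPartitioned = DeltaTable.convertToDelta(
  spark,
  "parquet.`/tmp/partitioned-parquet-table`",
  "date DATE")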

forName

forName(
  sparkSession: SparkSession,
  tableName: String): DeltaTable
forName(
  tableOrViewName: String): DeltaTable

forName uses ParserInterface (of the given SparkSession) to parse the given table name.

forName checks whether the given table name is of a Delta table and, if so, creates a DeltaTable with the following:

  • Dataset that represents loading data from the specified table name (using SparkSession.table operator)
  • DeltaLog of the specified table

forName throws an AnalysisException when the given table name is for a non-Delta table:

[deltaTableIdentifier] is not a Delta table.

forName is used internally when the DeltaConvert utility is requested to executeConvert.
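
A usage sketch, assuming a Delta table is already registered in the metastore under the (made-up) name users:

import io.delta.tables.DeltaTable
val users = DeltaTable.forName(spark, "users")
// The overload without a SparkSession uses the active session
val sameUsers = DeltaTable.forName("users")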

forPath

forPath(
  sparkSession: SparkSession,
  path: String): DeltaTable
forPath(
  path: String): DeltaTable

forPath creates a DeltaTable instance for data in the given directory (path) when the given directory is part of a delta table already (as the root or a child directory).

assert(spark.isInstanceOf[org.apache.spark.sql.SparkSession])

val tableId = "/tmp/delta-table/users"

import io.delta.tables.DeltaTable
assert(DeltaTable.isDeltaTable(tableId), s"$tableId should be a Delta table")

val dt = DeltaTable.forPath(tableId)

forPath throws an AnalysisException when the given path does not belong to a delta table:

[deltaTableIdentifier] is not a Delta table.

Internally, forPath creates a new DeltaTable with the following:

  • Dataset that represents loading data from the specified path (using the delta data source)
  • DeltaLog of the specified path

forPath is used internally in DeltaTable.convertToDelta (via DeltaConvert utility).

isDeltaTable

isDeltaTable(
  sparkSession: SparkSession,
  identifier: String): Boolean
isDeltaTable(
  identifier: String): Boolean

isDeltaTable checks whether the provided identifier string is a file path that points to the root of a Delta table or one of its subdirectories.

Internally, isDeltaTable simply relays to the DeltaTableUtils.isDeltaTable utility.
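
A usage sketch (the paths are illustrative):

import io.delta.tables.DeltaTable
// true for the root of a Delta table (or one of its subdirectories)
DeltaTable.isDeltaTable(spark, "/tmp/delta/t1")
// false for a plain parquet directory
DeltaTable.isDeltaTable("/tmp/parquet-table")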

Creating Instance

DeltaTable takes the following to be created:

  • Dataset (Dataset[Row])
  • DeltaLog

DeltaTable is created using DeltaTable.forPath or DeltaTable.forName utilities.

Operators

alias

alias(
  alias: String): DeltaTable

Applies an alias to the DeltaTable (equivalent to as)

as

as(
  alias: String): DeltaTable

Applies an alias to the DeltaTable
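
A short sketch (the path and the alias are illustrative); the alias becomes useful in merge conditions:

import io.delta.tables.DeltaTable
val target = DeltaTable.forPath(spark, "/tmp/delta/users").as("t")
// "t" can now be referenced in conditions, e.g. "t.id = s.id" in merge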

delete

delete(): Unit
delete(
  condition: Column): Unit
delete(
  condition: String): Unit

Deletes data from the DeltaTable that matches the given condition.
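
A short sketch (the dt table and the id column are illustrative):

import org.apache.spark.sql.functions.col
// Delete with a Column condition
dt.delete(col("id") < 2)
// Delete with a SQL-like predicate string
dt.delete("id < 2")
// With no condition, all rows are deleted
dt.delete()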

generate

generate(
  mode: String): Unit

Generates a manifest for the delta table

generate executes executeGenerate with a table identifier of the format delta.`path` (where path is the data directory of the DeltaLog) and the given mode.
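
A usage sketch with the symlink manifest mode (dt is an existing DeltaTable):

// Generates manifest files for readers that rely on symlink manifests
dt.generate("symlink_format_manifest")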

history

history(): DataFrame
history(
  limit: Int): DataFrame

Gets available commits (history) of the DeltaTable

merge

merge(
  source: DataFrame,
  condition: Column): DeltaMergeBuilder
merge(
  source: DataFrame,
  condition: String): DeltaMergeBuilder

Creates a DeltaMergeBuilder
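
A minimal sketch of the builder in action, assuming target is a DeltaTable and updates is a DataFrame that share an id column (both illustrative):

target.as("t")
  .merge(updates.as("s"), "t.id = s.id")
  .whenMatched
  .updateAll()
  .whenNotMatched
  .insertAll()
  .execute()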

toDF

toDF: Dataset[Row]

Returns the DataFrame representation of the DeltaTable

update

update(
  condition: Column,
  set: Map[String, Column]): Unit
update(
  set: Map[String, Column]): Unit

Updates data in the DeltaTable on the rows that match the given condition based on the rules defined by set
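
A short sketch using Column expressions (dt and the column names are illustrative):

import org.apache.spark.sql.functions.{col, lit}
// Set flag to 0 on rows with id > 2
dt.update(col("id") > 2, Map("flag" -> lit(0)))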

updateExpr

updateExpr(
  set: Map[String, String]): Unit
updateExpr(
  condition: String,
  set: Map[String, String]): Unit

Updates data in the DeltaTable on the rows that match the given condition based on the rules defined by set
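
The same update expressed with SQL strings (dt and the column names are illustrative):

dt.updateExpr("id > 2", Map("flag" -> "0"))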

vacuum

vacuum(): DataFrame
vacuum(
  retentionHours: Double): DataFrame

Deletes files and directories (recursively) in the DeltaTable that are not needed by the table (and maintains older versions up to the given retention threshold).

vacuum executes the vacuum command.
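
A short sketch (dt is an existing DeltaTable):

// Use the default retention threshold (7 days)
dt.vacuum()
// Or an explicit retention threshold in hours
dt.vacuum(168)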

Demo

import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])

val path = "/tmp/delta/t1"

// random data to create a delta table from scratch
val data = spark.range(5)
data.write.format("delta").save(path)

import io.delta.tables.DeltaTable
val dt = DeltaTable.forPath(spark, path)

val history = dt.history.select('version, 'timestamp, 'operation, 'operationParameters, 'isBlindAppend)
scala> history.show(truncate = false)
+-------+-------------------+---------+------------------------------------------+-------------+
|version|timestamp          |operation|operationParameters                       |isBlindAppend|
+-------+-------------------+---------+------------------------------------------+-------------+
|0      |2019-12-23 22:24:40|WRITE    |[mode -> ErrorIfExists, partitionBy -> []]|true         |
+-------+-------------------+---------+------------------------------------------+-------------+
