DeltaTable¶
DeltaTable
is the management interface of delta tables.
io.delta.tables Package¶
DeltaTable
belongs to io.delta.tables
package.
import io.delta.tables.DeltaTable
Creating Instance¶
DeltaTable
takes the following to be created:
- Table Data (
Dataset[Row]
) - DeltaTableV2
DeltaTable
is created using DeltaTable.forPath and DeltaTable.forName utilities (and indirectly using create, createIfNotExists, createOrReplace and replace).
DeltaLog¶
deltaLog: DeltaLog
DeltaLog of the DeltaTableV2.
Utilities (Static Methods)¶
columnBuilder¶
columnBuilder(
colName: String): DeltaColumnBuilder
columnBuilder(
spark: SparkSession,
colName: String): DeltaColumnBuilder
Creates a DeltaColumnBuilder
convertToDelta¶
convertToDelta(
spark: SparkSession,
identifier: String): DeltaTable
convertToDelta(
spark: SparkSession,
identifier: String,
partitionSchema: String): DeltaTable
convertToDelta(
spark: SparkSession,
identifier: String,
partitionSchema: StructType): DeltaTable
convertToDelta
converts the parquet table to delta format
Note
Refer to Demo: Converting Parquet Dataset Into Delta Format for a demo of DeltaTable.convertToDelta
.
create¶
create(): DeltaTableBuilder
create(
spark: SparkSession): DeltaTableBuilder
Creates a DeltaTableBuilder
createIfNotExists¶
createIfNotExists(): DeltaTableBuilder
createIfNotExists(
spark: SparkSession): DeltaTableBuilder
Creates a DeltaTableBuilder (with CreateTableOptions
and ifNotExists
flag enabled)
createOrReplace¶
createOrReplace(): DeltaTableBuilder
createOrReplace(
spark: SparkSession): DeltaTableBuilder
Creates a DeltaTableBuilder (with ReplaceTableOptions
and orCreate
flag enabled)
forName¶
forName(
sparkSession: SparkSession,
tableName: String): DeltaTable
forName(
tableOrViewName: String): DeltaTable
forName
uses ParserInterface
(of the given SparkSession
) to parse the given table name.
forName
checks whether the given table name is of a Delta table and, if so, creates a DeltaTable with the following:
Dataset
that represents loading data from the specified table name (usingSparkSession.table
operator)- DeltaTableV2
forName
throws an AnalysisException
when the given table name is for non-Delta table:
[deltaTableIdentifier] is not a Delta table.
forPath¶
forPath(
sparkSession: SparkSession,
path: String): DeltaTable
forPath(
path: String): DeltaTable
forPath
checks whether the given table name is of a Delta table and, if so, creates a DeltaTable with the following:
Dataset
that represents loading data from the specifiedpath
using delta data source- DeltaTableV2
forPath
throws an AnalysisException
when the given path
does not belong to a delta table:
[deltaTableIdentifier] is not a Delta table.
isDeltaTable¶
isDeltaTable(
sparkSession: SparkSession,
identifier: String): Boolean
isDeltaTable(
identifier: String): Boolean
isDeltaTable
...FIXME
replace¶
replace(): DeltaTableBuilder
replace(
spark: SparkSession): DeltaTableBuilder
Creates a DeltaTableBuilder (with ReplaceTableOptions
and orCreate
flag disabled)
Operators¶
alias¶
alias(
alias: String): DeltaTable
Applies an alias to the DeltaTable
(equivalent to as)
as¶
as(
alias: String): DeltaTable
Applies an alias to the DeltaTable
delete¶
delete(): Unit
delete(
condition: Column): Unit
delete(
condition: String): Unit
Executes DeleteFromTable command
generate¶
generate(
mode: String): Unit
Executes the DeltaGenerateCommand
history¶
history(): DataFrame
history(
limit: Int): DataFrame
Requests the DeltaHistoryManager for history
merge¶
merge(
source: DataFrame,
condition: Column): DeltaMergeBuilder
merge(
source: DataFrame,
condition: String): DeltaMergeBuilder
Creates a DeltaMergeBuilder
optimize¶
optimize(): DeltaOptimizeBuilder
Creates a DeltaOptimizeBuilder
restoreToTimestamp¶
restoreToTimestamp(
timestamp: String): DataFrame
restoreToVersion¶
restoreToVersion(
version: Long): DataFrame
toDF¶
toDF: Dataset[Row]
Returns the DataFrame representation of the DeltaTable
update¶
update(
condition: Column,
set: Map[String, Column]): Unit
update(
set: Map[String, Column]): Unit
updateExpr¶
updateExpr(
set: Map[String, String]): Unit
updateExpr(
condition: String,
set: Map[String, String]): Unit
upgradeTableProtocol¶
upgradeTableProtocol(
readerVersion: Int,
writerVersion: Int): Unit
upgradeTableProtocol
creates a new Protocol (for the given reader and writer versions) and requests the DeltaLog to upgrade the protocol.
Transactional Operation
upgradeTableProtocol
is a transactional operation.
Updates the protocol version of the table to leverage new features.
Upgrading the reader version will prevent all clients that have an older version of Delta Lake from accessing this table.
Upgrading the writer version will prevent older versions of Delta Lake to write to this table.
The reader or writer version cannot be downgraded.
[SC-44271][DELTA] Introduce default protocol version for Delta tables
upgradeTableProtocol
was introduced in [SC-44271][DELTA] Introduce default protocol version for Delta tables commit.
vacuum¶
vacuum(): DataFrame
vacuum(
retentionHours: Double): DataFrame
Deletes files and directories (recursively) in the DeltaTable that are not needed by the table (and maintains older versions up to the given retention threshold).