# Table
Table is a TableInput and a Dataset output.
## Creating Instance
Table takes the following to be created (see the sketch after this list):

- TableIdentifier
- specifiedSchema (optional)
- partitionCols (optional)
- clusterCols (optional)
- normalizedPath (optional)
- Properties
- Comment (optional)
- QueryOrigin
- isStreamingTable flag
- Format (optional)
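Below is a minimal Scala sketch of assembling a Table from these arguments. The parameter names follow the list above and the defaults (empty properties, a default QueryOrigin, the "events" identifier) are assumptions for illustration only; the actual case class signature may differ.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// Sketch only: parameter names are taken from the list above and
// may not match the real Table case class exactly.
val table = Table(
  identifier = TableIdentifier("events"),   // hypothetical table name
  specifiedSchema = None,                   // no user-specified schema
  partitionCols = None,                     // no partition columns
  clusterCols = None,                       // no clustering columns
  normalizedPath = None,                    // managed table, no explicit path
  properties = Map.empty[String, String],   // no table properties
  comment = Some("demo table"),             // optional comment
  baseOrigin = QueryOrigin(),               // placeholder origin (assumed default constructor)
  isStreamingTable = true,                  // mark as a streaming table
  format = None                             // default table format
)
```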
Table is created when:
- PipelinesHandler is requested to define an Output
- SqlGraphRegistrationContext is requested to handle CreateMaterializedViewAsSelect, CreateStreamingTableAsSelect and CreateStreamingTable logical commands
## Load Data
load is a "shortcut" to create a batch or a streaming DataFrame (based on the type of the given InputReadOptions).
For StreamingReadOptions, load creates a DataStreamReader (Spark Structured Streaming) to load a table (using DataStreamReader.table operator) with the given StreamingReadOptions.
For BatchReadOptions, load creates a DataFrameReader to load a table (using DataFrameReader.table operator).
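As a concrete illustration, the two cases boil down to the following reads (the table name and the SparkSession are placeholders for this sketch, not part of the Table API):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark: SparkSession = SparkSession.builder().getOrCreate()
val tableName = "demo.events"  // hypothetical table name

// StreamingReadOptions: a streaming DataFrame via DataStreamReader.table
val streamingDF: DataFrame = spark.readStream.table(tableName)

// BatchReadOptions: a batch DataFrame via DataFrameReader.table
val batchDF: DataFrame = spark.read.table(tableName)
```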