
FileTable

FileTable is an extension of the Table abstraction for file-based tables with support for read and write.

Contract

Fallback FileFormat

fallbackFileFormat: Class[_ <: FileFormat]

Fallback V1 FileFormat

Used when the FallBackFileSourceV2 extended resolution rule is executed (to resolve an InsertIntoStatement over a DataSourceV2Relation backed by a FileTable)

Format Name

formatName: String

Name of the file table (format)

Schema Inference

inferSchema(
    files: Seq[FileStatus]): Option[StructType]

Infers schema of the given files (as Hadoop FileStatuses)

Used when FileTable is requested for a data schema

supportsDataType

supportsDataType(
    dataType: DataType): Boolean = true

supportsDataType indicates whether a given DataType is supported on the read and write paths.

Default: true (all DataTypes are supported)
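
A concrete FileTable can narrow this default by matching on the DataType. The following is a minimal sketch only, assuming the spark-sql library is on the classpath. The object name SupportedTypesSketch and the particular set of accepted types are hypothetical and are not taken from any of the built-in FileTables.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: a concrete FileTable would put this logic
// in its own supportsDataType override.
object SupportedTypesSketch {
  def supportsDataType(dataType: DataType): Boolean = dataType match {
    // A handful of primitive types accepted as-is (illustrative choice)
    case StringType | IntegerType | LongType | DoubleType | BooleanType => true
    // An array is supported when its element type is supported
    case ArrayType(elementType, _) => supportsDataType(elementType)
    // A struct is supported when every field is supported
    case StructType(fields) => fields.forall(f => supportsDataType(f.dataType))
    // Everything else is rejected in this sketch
    case _ => false
  }
}
```

With this sketch, `SupportedTypesSketch.supportsDataType(ArrayType(StringType))` is accepted while `BinaryType` is rejected.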

Implementations

  • AvroTable
  • CSVTable
  • JsonTable
  • OrcTable
  • ParquetTable
  • TextTable

Creating Instance

FileTable takes the following to be created:

FileTable is an abstract class and cannot be created directly. It is created indirectly for the concrete FileTables.

Table Capabilities

capabilities: java.util.Set[TableCapability]

capabilities is part of the Table abstraction.


capabilities are the following TableCapabilities:

Data Schema

dataSchema: StructType

dataSchema is the schema of the data of the file-backed table.

Lazy Value

dataSchema is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and cached afterwards.
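
The once-only initialization of a Scala lazy value can be observed in plain Scala, without Spark. This is an illustrative sketch: LazyValSketch, initCount and schemaDescription are hypothetical names standing in for an expensive computation such as schema inference.

```scala
object LazyValSketch {
  // Counts how many times the initializer block runs
  var initCount = 0

  // Hypothetical stand-in for an expensive computation
  lazy val schemaDescription: String = {
    initCount += 1
    "id INT, name STRING"
  }

  def main(args: Array[String]): Unit = {
    // Nothing has been computed before the first access
    assert(initCount == 0)

    val first = schemaDescription   // initializer runs here
    val second = schemaDescription  // cached value is returned

    assert(initCount == 1)
    assert(first eq second)
    println(s"initialized $initCount time(s)")
  }
}
```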


dataSchema is used when:

Partitioning

partitioning: Array[Transform]

partitioning is part of the Table abstraction.


partitioning...FIXME

Properties

properties: util.Map[String, String]

properties is part of the Table abstraction.


properties returns the options.

Table Schema

schema: StructType

schema is part of the Table abstraction.


schema...FIXME

PartitioningAwareFileIndex

fileIndex: PartitioningAwareFileIndex

Lazy Value

fileIndex is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

fileIndex creates one of the following PartitioningAwareFileIndex implementations:

  • MetadataLogFileIndex when reading from the results of a streaming query (and loading files from the metadata log instead of listing them using HDFS APIs)
  • InMemoryFileIndex
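
The choice between the two can be pictured with a simplified, Spark-free model. This sketch is not Spark's actual check: it assumes that the output of a streaming query is recognizable by a `_spark_metadata` subdirectory under a single input path, and all names (FileIndexChoiceSketch, IndexKind, chooseIndex) are hypothetical.

```scala
import java.nio.file.{Files, Path}

object FileIndexChoiceSketch {
  // Hypothetical tags standing in for the two file index kinds
  sealed trait IndexKind
  case object MetadataLog extends IndexKind
  case object InMemory extends IndexKind

  // Assumption: streaming output keeps its metadata log in this subdirectory
  val MetadataDir = "_spark_metadata"

  def chooseIndex(paths: Seq[Path]): IndexKind = paths match {
    // A single path with a metadata log: read the file list from the log
    case Seq(single) if Files.isDirectory(single.resolve(MetadataDir)) => MetadataLog
    // Otherwise: list the files and keep the listing in memory
    case _ => InMemory
  }
}
```

In this model a freshly created directory yields InMemory, and the same directory yields MetadataLog once a `_spark_metadata` subdirectory exists in it.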

fileIndex is used when: