FileTable¶
FileTable is an extension of the Table abstraction for file-based tables with support for read and write.
Contract¶
Fallback FileFormat¶
fallbackFileFormat: Class[_ <: FileFormat]
Fallback V1 FileFormat
Used when the FallBackFileSourceV2 extended resolution rule is executed (to resolve an InsertIntoStatement with a DataSourceV2Relation with a FileTable)
Format Name¶
formatName: String
Name of the file table (format)
Schema Inference¶
inferSchema(
  files: Seq[FileStatus]): Option[StructType]
Infers the schema of the given files (as Hadoop FileStatuses)
Used when FileTable is requested for a data schema
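
A concrete FileTable has to provide its own inferSchema. The following is a minimal sketch, assuming a hypothetical single-column text-like format: the `InferSchemaSketch` object and the `value` column name are illustrative and not taken from any built-in table.

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper that mirrors the inferSchema contract
// for a format whose files always hold a single string column.
object InferSchemaSketch {
  def inferSchema(files: Seq[FileStatus]): Option[StructType] =
    if (files.isEmpty) {
      // Nothing to inspect, so no schema can be inferred
      None
    } else {
      // Assumption: every file of this format has one string column
      Some(StructType(Seq(StructField("value", StringType))))
    }
}
```

Returning None lets the caller decide what to do when no schema can be inferred (e.g. report that the schema has to be specified explicitly).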
supportsDataType¶
supportsDataType(
  dataType: DataType): Boolean = true
supportsDataType indicates whether a given DataType is supported in the read/write path or not.
Default: true (all DataTypes are supported)
supportsDataType is used when:
- FileTable is requested for a schema
- others (in FileTables)
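
A file format that cannot store every type could narrow the default. The following is a sketch only, assuming a hypothetical format that supports a handful of primitive types plus nested structs, arrays and maps of them. The `SupportsDataTypeSketch` object is illustrative and the chosen set of types is an assumption.

```scala
import org.apache.spark.sql.types._

// Hypothetical type check in the spirit of supportsDataType
object SupportsDataTypeSketch {
  def supportsDataType(dataType: DataType): Boolean = dataType match {
    // Assumption: only these primitive types are supported
    case StringType | IntegerType | LongType | DoubleType | BooleanType => true
    // Complex types are supported when all their nested types are
    case st: StructType => st.fields.forall(f => supportsDataType(f.dataType))
    case ArrayType(elementType, _) => supportsDataType(elementType)
    case MapType(keyType, valueType, _) =>
      supportsDataType(keyType) && supportsDataType(valueType)
    // Anything else (e.g. BinaryType) is rejected
    case _ => false
  }
}
```

The recursion matters: a single unsupported field nested anywhere in a struct, array or map makes the whole type unsupported.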
Implementations¶
- AvroTable
- CSVTable
- JsonTable
- OrcTable
- ParquetTable
- TextTable
Creating Instance¶
FileTable takes the following to be created:
- SparkSession
- Options
- Paths
- Optional user-defined schema (Option[StructType])
FileTable is an abstract class and cannot be created directly. It is created indirectly for the concrete FileTables.
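
One way to see a concrete FileTable being created indirectly is to load a file-based dataset through Data Source V2 and inspect the analyzed logical plan. The following is a sketch under a few assumptions: Spark 3.x, the internal `spark.sql.sources.useV1SourceList` configuration property set to an empty value so the built-in file formats are not routed to V1, and a hypothetical `FileTableLookupSketch` helper.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Relation, FileTable}

// Hypothetical helper (not part of Spark)
object FileTableLookupSketch {
  // Finds the FileTables behind a DataFrame loaded from the given path
  def fileTablesOf(spark: SparkSession, format: String, path: String): Seq[FileTable] = {
    // Assumption: with an empty V1 source list the file formats use Data Source V2.
    // Note that this changes the configuration of the given session.
    spark.conf.set("spark.sql.sources.useV1SourceList", "")
    val df = spark.read.format(format).load(path)
    df.queryExecution.analyzed
      .collect { case r: DataSourceV2Relation => r.table }
      .collect { case t: FileTable => t }
  }
}
```

With a SparkSession and a directory of parquet files at hand, `FileTableLookupSketch.fileTablesOf(spark, "parquet", path)` should give the table whose formatName and capabilities can then be inspected.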
Table Capabilities¶
capabilities: java.util.Set[TableCapability]
capabilities is part of the Table abstraction.
capabilities are the following TableCapabilities:
Data Schema¶
dataSchema: StructType
dataSchema is the schema of the data of the file-backed table.
Lazy Value
dataSchema is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the result is cached afterwards.
dataSchema is used when:
- FileTable is requested for a schema
- others (in FileTables)
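
The lazy-value semantics can be demonstrated with plain Scala. The following sketch uses a hypothetical `LazySchemaSketch` class as a stand-in for a file-backed table (it is not Spark's FileTable) and a schema modelled as a list of column names.

```scala
// Hypothetical stand-in for a file-backed table
final class LazySchemaSketch(inferSchema: () => Seq[String]) {
  // Initialized on first access only; later accesses reuse the cached value
  lazy val dataSchema: Seq[String] = inferSchema()
}

object LazySchemaDemo {
  def main(args: Array[String]): Unit = {
    var inferenceRuns = 0
    val table = new LazySchemaSketch(() => {
      inferenceRuns += 1
      Seq("id", "name")
    })
    // Creating the table does not trigger schema inference
    assert(inferenceRuns == 0)
    val first = table.dataSchema
    val second = table.dataSchema
    // Both accesses return the very same cached instance
    assert(first eq second)
    assert(inferenceRuns == 1)
  }
}
```

This is why a potentially expensive operation such as schema inference is a good fit for a lazy value: tables that are never asked for their schema never pay for it.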
Partitioning¶
partitioning: Array[Transform]
partitioning is part of the Table abstraction.
partitioning...FIXME
Properties¶
properties: util.Map[String, String]
properties is part of the Table abstraction.
properties returns the options.
Table Schema¶
schema: StructType
schema is part of the Table abstraction.
schema...FIXME
PartitioningAwareFileIndex¶
fileIndex: PartitioningAwareFileIndex
Lazy Value
fileIndex is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
fileIndex creates one of the following PartitioningAwareFileIndexes:
- MetadataLogFileIndex when reading from the results of a streaming query (and loading files from the metadata log instead of listing them using HDFS APIs)
- InMemoryFileIndex
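
The choice between the two file indices can be sketched with plain Scala. This is a simplified model and not Spark's actual code: it assumes that the output directory of a streaming file sink can be recognized by a `_spark_metadata` subdirectory and that only a single input path qualifies. The `FileIndexChoiceSketch` object and its `IndexChoice` types are hypothetical.

```scala
import java.nio.file.{Files, Path}

// Hypothetical model of the decision made when fileIndex is initialized
object FileIndexChoiceSketch {
  sealed trait IndexChoice
  case object UseMetadataLog extends IndexChoice     // stands for MetadataLogFileIndex
  case object UseInMemoryListing extends IndexChoice // stands for InMemoryFileIndex

  def chooseIndex(paths: Seq[Path]): IndexChoice = paths match {
    // Assumption: a single directory with a metadata log is streaming query output
    case Seq(single) if Files.isDirectory(single.resolve("_spark_metadata")) =>
      UseMetadataLog
    // Everything else is listed using the file system APIs
    case _ =>
      UseInMemoryListing
  }
}
```

Reading from the metadata log means that only the files committed by the streaming query are considered, which is why it takes precedence over listing the directory.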
fileIndex is used when:
- FileTables are requested for FileScanBuilders
- Dataset is requested for the inputFiles
- CacheManager is requested to lookupAndRefresh
- FallBackFileSourceV2 is created
- FileTable is requested for the dataSchema, schema and partitioning