FileScan¶
FileScan is an extension of the Scan abstraction for scans in Batch queries.
FileScan is also a SupportsReportStatistics.
Contract¶
dataFilters¶
dataFilters: Seq[Expression]
Used when...FIXME
fileIndex¶
fileIndex: PartitioningAwareFileIndex
Used when...FIXME
getFileUnSplittableReason¶
getFileUnSplittableReason(
path: Path): String
Used when...FIXME
partitionFilters¶
partitionFilters: Seq[Expression]
Used when...FIXME
readDataSchema¶
readDataSchema: StructType
Used when...FIXME
readPartitionSchema¶
readPartitionSchema: StructType
Used when...FIXME
seqToString¶
seqToString(
seq: Seq[Any]): String
Used when...FIXME
sparkSession¶
sparkSession: SparkSession
Used when...FIXME
withFilters¶
withFilters(
partitionFilters: Seq[Expression],
dataFilters: Seq[Expression]): FileScan
Used when...FIXME
Implementations¶
- AvroScan
- OrcScan
- ParquetScan
- TextBasedFileScan
description¶
description(): String
description...FIXME
description is part of the Scan abstraction.
planInputPartitions¶
planInputPartitions(): Array[InputPartition]
planInputPartitions returns the partitions (converted to an array).
planInputPartitions is part of the Batch abstraction.
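The relationship between planInputPartitions and partitions can be illustrated with a minimal, self-contained sketch. The types below (InputPartitionLike, FilePartitionLike, BatchLike, FileScanSketch) are hypothetical stand-ins and not Spark's own classes; the sketch only assumes that the file partitions are computed once and handed over as an array.

```scala
// Hypothetical stand-ins for Spark's InputPartition and FilePartition.
trait InputPartitionLike
final case class FilePartitionLike(index: Int, paths: Seq[String]) extends InputPartitionLike

// Hypothetical stand-in for the Batch abstraction.
trait BatchLike {
  def planInputPartitions(): Array[InputPartitionLike]
}

class FileScanSketch(filePartitions: Seq[FilePartitionLike]) extends BatchLike {
  // Computed lazily and only once (an assumption of this sketch).
  lazy val partitions: Seq[FilePartitionLike] = filePartitions

  // Planning input partitions simply exposes the file partitions as an array.
  override def planInputPartitions(): Array[InputPartitionLike] =
    partitions.toArray[InputPartitionLike]
}
```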
FilePartitions¶
partitions: Seq[FilePartition]
partitions requests the PartitioningAwareFileIndex for the partition directories (selectedPartitions).
For every selected partition directory, partitions takes the Hadoop FileStatuses and splits them (if isSplitable) into chunks of up to maxSplitBytes. The resulting file splits are sorted by size (largest first).
In the end, partitions returns the FilePartitions.
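The steps above can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's implementation: FileSplit, FilePart and PartitionsSketch are hypothetical names, files are given as (path, size) pairs, and the greedy packing of splits into partitions is an illustrative choice that ignores any other details the real code may account for.

```scala
// Hypothetical stand-ins for a file split and a partition of file splits.
final case class FileSplit(path: String, start: Long, length: Long)
final case class FilePart(index: Int, files: Seq[FileSplit])

object PartitionsSketch {
  // Split a file into chunks of at most maxSplitBytes (must be positive),
  // or keep it whole when the file is not splittable.
  def splitFile(path: String, size: Long, splittable: Boolean, maxSplitBytes: Long): Seq[FileSplit] =
    if (splittable) {
      (0L until size by maxSplitBytes).map { offset =>
        FileSplit(path, offset, math.min(maxSplitBytes, size - offset))
      }
    } else {
      Seq(FileSplit(path, 0L, size))
    }

  // Sort the splits by size (largest first) and pack them greedily into partitions.
  def plan(
      files: Seq[(String, Long)],
      splittable: String => Boolean,
      maxSplitBytes: Long): Seq[FilePart] = {
    val splits = files
      .flatMap { case (path, size) => splitFile(path, size, splittable(path), maxSplitBytes) }
      .sortBy(_.length)(Ordering[Long].reverse)

    val parts = scala.collection.mutable.ArrayBuffer.empty[FilePart]
    var current = Vector.empty[FileSplit]
    var currentSize = 0L

    // Close the partition being built (if any) and start a new one.
    def close(): Unit = if (current.nonEmpty) {
      parts += FilePart(parts.size, current)
      current = Vector.empty
      currentSize = 0L
    }

    splits.foreach { split =>
      if (currentSize + split.length > maxSplitBytes) close()
      current :+= split
      currentSize += split.length
    }
    close()
    parts.toSeq
  }
}
```

For example, `PartitionsSketch.plan(Seq(("a", 10L), ("b", 3L)), _ => true, 4L)` cuts file `a` into splits of 4, 4 and 2 bytes, keeps `b` as a single 3-byte split, and orders the splits largest first before packing them.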
estimateStatistics¶
estimateStatistics(): Statistics
estimateStatistics...FIXME
estimateStatistics is part of the SupportsReportStatistics abstraction.
toBatch¶
toBatch: Batch
toBatch returns this FileScan (a FileScan is itself a Batch).
toBatch is part of the Scan abstraction.
readSchema¶
readSchema(): StructType
readSchema...FIXME
readSchema is part of the Scan abstraction.
isSplitable¶
isSplitable(
path: Path): Boolean
isSplitable is false by default.
Used when:
- FileScan is requested to getFileUnSplittableReason and partitions
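How a scan could opt in to splitting can be sketched with a small model. Everything here is hypothetical and simplified: SplittableScan and PlainTextScan are illustrative names, paths are plain strings rather than Hadoop Paths, and the suffix check only stands in for whatever rule a concrete scan would apply.

```scala
// Hypothetical, simplified model of the splittability contract (not Spark's own trait).
trait SplittableScan {
  // Files are not splittable unless a concrete scan says otherwise.
  def isSplitable(path: String): Boolean = false

  // Explains why a file cannot be split; only meaningful for non-splittable files.
  def getFileUnSplittableReason(path: String): String = {
    require(!isSplitable(path), s"$path is splittable")
    "undefined"
  }
}

// An illustrative scan that treats everything except gzip-suffixed files as splittable.
object PlainTextScan extends SplittableScan {
  private val unsplittableSuffixes = Seq(".gz")

  override def isSplitable(path: String): Boolean =
    !unsplittableSuffixes.exists(suffix => path.endsWith(suffix))

  override def getFileUnSplittableReason(path: String): String = {
    require(!isSplitable(path), s"$path is splittable")
    "the file is compressed with a codec that does not support splitting"
  }
}
```

A scan that keeps the default never has its files split, so each file ends up as a single split regardless of maxSplitBytes.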