ParquetTable¶
ConvertTargetTable¶
ParquetTable is a ConvertTargetTable.
Creating Instance¶
ParquetTable takes the following to be created:
- SparkSession
- Base Path
- Partition Schema (Option[StructType])
ParquetTable is created when:
- ConvertToDeltaCommand is requested to getTargetTable
numFiles¶
numFiles: Long
numFiles calls inferSchema when the _numFiles registry is uninitialized.
In the end, numFiles returns the value of the _numFiles registry.
numFiles is part of the ConvertTargetTable abstraction.
_numFiles¶
_numFiles: Option[Long]
ParquetTable defines _numFiles internal registry.
_numFiles is None (uninitialized) when ParquetTable is created.
_numFiles is initialized once, when ParquetTable is first requested for numFiles (which runs inferSchema).
_numFiles is used for numFiles.
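The lifecycle of the _numFiles registry described above (None at creation, filled once on first request, then reused) can be sketched in plain Scala. This is a minimal illustration only, not the Delta Lake implementation: LazyFileCounter, its files parameter, and the inferCalls counter are hypothetical names introduced here, and the stand-in inferSchema simply counts entries in an in-memory list instead of reading Parquet footers.

```scala
// Hypothetical stand-in for ParquetTable; shows only the lazy registry pattern.
class LazyFileCounter(files: Seq[String]) {
  // Mirrors the _numFiles registry: None (uninitialized) at creation.
  private var _numFiles: Option[Long] = None

  // Illustration-only counter of how many times inference actually ran.
  var inferCalls: Int = 0

  // Stand-in for inferSchema(): Unit, which fills the registry as a side effect.
  // (Assumption: the real method does far more work than counting entries.)
  private def inferSchema(): Unit = {
    inferCalls += 1
    _numFiles = Some(files.size.toLong)
  }

  // Runs inference only while the registry is uninitialized,
  // then returns the stored value.
  def numFiles: Long = {
    if (_numFiles.isEmpty) inferSchema()
    _numFiles.get
  }
}

object LazyFileCounterDemo {
  def main(args: Array[String]): Unit = {
    val table = new LazyFileCounter(Seq("part-0.parquet", "part-1.parquet"))
    println(table.numFiles)   // first request triggers inference
    println(table.numFiles)   // second request reuses the registry
    println(table.inferCalls) // inference ran once
  }
}
```

Keeping the value in an Option rather than a lazy val makes the uninitialized state explicit and lets another method fill the same registry as a side effect, which matches the Unit return type of inferSchema.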
inferSchema¶
inferSchema(): Unit
inferSchema...FIXME
inferSchema is used when:
- ParquetTable is requested for the numFiles and tableSchema
getSchemaForBatch¶
getSchemaForBatch(
spark: SparkSession,
batch: Seq[SerializableFileStatus],
serializedConf: SerializableConfiguration): StructType
getSchemaForBatch...FIXME
mergeSchemasInParallel¶
mergeSchemasInParallel(
sparkSession: SparkSession,
filesToTouch: Seq[FileStatus],
serializedConf: SerializableConfiguration): Option[StructType]
mergeSchemasInParallel...FIXME