
ParquetTable


ParquetTable is a ConvertTargetTable.

Creating Instance

ParquetTable takes the following to be created:

  • SparkSession
  • Base Path
  • Partition Schema (Option[StructType])

ParquetTable is created when:

numFiles

numFiles: Long

numFiles calls inferSchema when the _numFiles registry is uninitialized (None).

In the end, numFiles returns the value of the _numFiles registry.

numFiles is part of the ConvertTargetTable abstraction.

_numFiles

_numFiles: Option[Long]

ParquetTable defines the _numFiles internal registry.

_numFiles is None (uninitialized) when ParquetTable is created.

_numFiles is initialized once, by inferSchema, when ParquetTable is first requested for the numFiles.

_numFiles is used by numFiles.
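Together, numFiles and _numFiles form a compute-once cache: the registry starts as None, the first numFiles call triggers inferSchema to fill it in, and later calls return the cached value. The following is a minimal Scala sketch of that pattern only, not ParquetTable's actual code. FileCountingTable, the listFiles function and the inferCalls counter are hypothetical names for illustration, and the stand-in inferSchema just counts files instead of inferring a schema.

```scala
// Hypothetical model of the _numFiles / numFiles pattern (not Delta Lake code).
// FileCountingTable, listFiles and inferCalls are illustration-only names.
class FileCountingTable(listFiles: () => Seq[String]) {

  // Uninitialized (None) when the table is created, like _numFiles
  private var _numFiles: Option[Long] = None

  // Counts how many times the stand-in inferSchema ran (for illustration only)
  var inferCalls: Int = 0

  // Stand-in for inferSchema: does the (expensive) file listing once
  // and records how many files it saw. The real method also infers a schema.
  private def inferSchema(): Unit = {
    inferCalls += 1
    val files = listFiles()
    _numFiles = Some(files.size.toLong)
  }

  // Calls inferSchema only while _numFiles is still uninitialized,
  // then returns the cached value
  def numFiles: Long = {
    if (_numFiles.isEmpty) {
      inferSchema()
    }
    _numFiles.get
  }
}

val table = new FileCountingTable(() => Seq("part-00000.parquet", "part-00001.parquet"))
println(table.numFiles)   // 2
println(table.numFiles)   // 2 (cached; inferSchema is not called again)
println(table.inferCalls) // 1
```

Keeping the count in an Option that inferSchema fills in means the underlying work runs at most once per instance, no matter how often numFiles is requested.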

inferSchema

inferSchema(): Unit

inferSchema...FIXME

inferSchema is used when:

  • ParquetTable is requested for the numFiles (while the _numFiles registry is uninitialized)

getSchemaForBatch

getSchemaForBatch(
  spark: SparkSession,
  batch: Seq[SerializableFileStatus],
  serializedConf: SerializableConfiguration): StructType

getSchemaForBatch...FIXME

mergeSchemasInParallel

mergeSchemasInParallel(
  sparkSession: SparkSession,
  filesToTouch: Seq[FileStatus],
  serializedConf: SerializableConfiguration): Option[StructType]

mergeSchemasInParallel...FIXME