Parquet Data Source
Apache Parquet is a columnar storage format for the Apache Hadoop ecosystem with support for efficient storage and encoding of data.
Spark SQL supports parquet-encoded data using ParquetDataSourceV2. There is also an older ParquetFileFormat that is used as a fallbackFileFormat, for backward-compatibility and Hive (to name a few use cases).
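As a minimal sketch of what reading and writing parquet-encoded data looks like from the user's side, the following writes a small DataFrame with the parquet format named explicitly and reads it back in two equivalent ways. The local master, the application name, the sample rows and the temporary path are assumptions made for illustration only. Which of the two implementations above serves the request is decided inside Spark and is not visible in this code.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session, for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-read-write")
  .getOrCreate()

// Hypothetical scratch location (the target directory must not exist yet)
val path = Files.createTempDirectory("parquet-demo").resolve("people").toString

// Made-up sample data
val people = spark
  .createDataFrame(Seq((1, "Ada"), (2, "Linus")))
  .toDF("id", "name")

// Write with the format named explicitly
people.write.format("parquet").save(path)

// Two equivalent ways to read the files back
val viaFormat = spark.read.format("parquet").load(path)
val viaShortcut = spark.read.parquet(path)

viaShortcut.printSchema()
```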
Parquet is the default data source format based on the spark.sql.sources.default configuration property.
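The effect of the default can be sketched as follows: when neither the writer nor the reader names a format, both fall back to the value of spark.sql.sources.default. The session settings and the temporary path are assumptions for illustration, and the printed value is "parquet" only if the property has not been overridden in your environment.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session, for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-default-format")
  .getOrCreate()

// Inspect the property that selects the default data source format
val defaultFormat = spark.conf.get("spark.sql.sources.default")
println(defaultFormat)

// Hypothetical scratch location (the target directory must not exist yet)
val path = Files.createTempDirectory("parquet-default").resolve("ids").toString

// No format(...) call: the writer uses the default data source format
spark.range(5).write.save(path)

// No format(...) call on the reader either
val restored = spark.read.load(path)
restored.show()
```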
Parquet data source uses the spark.sql.parquet prefix for parquet-specific configuration properties.
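A short sketch of working with these properties at runtime: it sets one property with the prefix, spark.sql.parquet.compression.codec, reads it back, and then lists the properties under the prefix using the SET -v SQL command. The session settings and the choice of gzip are assumptions for illustration, and the exact set of listed properties depends on the Spark version in use.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session, for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-config")
  .getOrCreate()

// Set a parquet-specific property for this session (gzip is an arbitrary choice)
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
val codec = spark.conf.get("spark.sql.parquet.compression.codec")
println(codec)

// List SQL properties and keep those under the spark.sql.parquet prefix
val parquetProps = spark
  .sql("SET -v")
  .where(col("key").startsWith("spark.sql.parquet"))

parquetProps.select("key", "value").show(truncate = false)
```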