AvroFileFormat¶
AvroFileFormat
is a FileFormat for Apache Avro, i.e. a data source format that can read and write Avro-encoded data in files.
[[shortName]] AvroFileFormat
is a DataSourceRegister and registers itself as avro data source.
// ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0
// Writing data to Avro file(s)
spark
.range(1)
.write
.format("avro") // <-- Triggers AvroFileFormat
.save("data.avro")
// Reading Avro data from file(s)
val q = spark
.read
.format("avro") // <-- Triggers AvroFileFormat
.load("data.avro")
scala> q.show
+---+
| id|
+---+
| 0|
+---+
[[isSplitable]] AvroFileFormat
is splitable, i.e. FIXME
=== [[buildReader]] Building Partitioned Data Reader -- buildReader
Method
[source, scala]¶
buildReader( spark: SparkSession, dataSchema: StructType, partitionSchema: StructType, requiredSchema: StructType, filters: Seq[Filter], options: Map[String, String], hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
buildReader
...FIXME
buildReader
is part of the FileFormat abstraction.
=== [[inferSchema]] Inferring Schema -- inferSchema
Method
[source, scala]¶
inferSchema( spark: SparkSession, options: Map[String, String], files: Seq[FileStatus]): Option[StructType]
inferSchema
...FIXME
inferSchema
is part of the FileFormat abstraction.
=== [[prepareWrite]] Preparing Write Job -- prepareWrite
Method
[source, scala]¶
prepareWrite( spark: SparkSession, job: Job, options: Map[String, String], dataSchema: StructType): OutputWriterFactory
prepareWrite
...FIXME
prepareWrite
is part of the FileFormat abstraction.