AvroFileFormat is a FileFormat for Apache Avro, i.e. a data source format that can read and write Avro-encoded data in files.

[[shortName]] AvroFileFormat is a DataSourceRegister and registers itself under the avro short name (alias), i.e. the name to use with the format method of DataFrameReader and DataFrameWriter.

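// ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0

// Writing data to Avro file(s)
spark
  .range(1)
  .write
  .format("avro") // <-- Triggers AvroFileFormat
  .save("data.avro") // data.avro is an example path

// Reading Avro data from file(s)
val q = spark
  .read
  .format("avro") // <-- Triggers AvroFileFormat
  .load("data.avro")
q.show
// +---+
// | id|
// +---+
// |  0|
// +---+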

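[[isSplitable]] AvroFileFormat is splitable, i.e. isSplitable is always true, so a single Avro file can be divided into multiple splits that are read in parallel.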
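
Being splitable matters because a large file can then be cut into byte ranges that are processed by separate tasks. The following is a simplified sketch of that idea in plain Scala, not Spark's own code: `FileSplit`, `splitFile` and the `maxSplitBytes` parameter are hypothetical names chosen for illustration.

```scala
object SplitSketch {
  // Hypothetical stand-in for a byte range of a single file.
  final case class FileSplit(path: String, start: Long, length: Long)

  // Cuts a file of `fileLength` bytes into ranges of at most `maxSplitBytes`.
  // A non-splitable file is always kept as a single range.
  def splitFile(
      path: String,
      fileLength: Long,
      maxSplitBytes: Long,
      isSplitable: Boolean): Seq[FileSplit] = {
    require(maxSplitBytes > 0, "maxSplitBytes must be positive")
    if (!isSplitable || fileLength <= maxSplitBytes) {
      Seq(FileSplit(path, 0L, fileLength))
    } else {
      (0L until fileLength by maxSplitBytes).map { offset =>
        FileSplit(path, offset, math.min(maxSplitBytes, fileLength - offset))
      }
    }
  }

  // A 250-byte file with a 100-byte limit gives three ranges.
  val splits: Seq[FileSplit] =
    splitFile("data.avro", fileLength = 250L, maxSplitBytes = 100L, isSplitable = true)

  // The same file treated as non-splitable stays in one piece.
  val single: Seq[FileSplit] =
    splitFile("data.avro", fileLength = 250L, maxSplitBytes = 100L, isSplitable = false)
}
```

With a splitable format the three ranges could be handled by three tasks, while the non-splitable variant forces one task to read the whole file.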

=== [[buildReader]] Building Partitioned Data Reader -- buildReader Method

[source, scala]

buildReader(
  spark: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]


buildReader is part of the FileFormat abstraction.
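
The return type in the signature shows that buildReader does not read anything itself. It returns a function that is applied to every file split later on. The sketch below mirrors only that shape in plain Scala. `Split`, `Row` and `buildReaderSketch` are hypothetical stand-ins for PartitionedFile, InternalRow and the real method, and the in-memory records are an assumption made to keep the example self-contained.

```scala
object BuildReaderSketch {
  // Simplified stand-ins for PartitionedFile and InternalRow (hypothetical types).
  final case class Split(path: String, records: Seq[Map[String, Any]])
  type Row = Seq[Any]

  // Returns a reader function. Nothing is read until it is applied to a split.
  def buildReaderSketch(requiredColumns: Seq[String]): Split => Iterator[Row] = {
    (split: Split) =>
      split.records.iterator.map { record =>
        // Keep only the required columns, in the requested order.
        requiredColumns.map(name => record.getOrElse(name, null))
      }
  }

  // Build the reader once, then apply it to a split.
  val reader: Split => Iterator[Row] = buildReaderSketch(Seq("id"))
  val rows: List[Row] =
    reader(Split("data.avro", Seq(Map("id" -> 0L, "name" -> "a")))).toList
}
```

Building the function once and applying it per split is what lets the same reader be shipped to many tasks.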

=== [[inferSchema]] Inferring Schema -- inferSchema Method

[source, scala]

inferSchema(
  spark: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]


inferSchema is part of the FileFormat abstraction.
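
Schema inference comes into play when a query loads Avro files without a user-specified schema. The following is a minimal sketch of that from the user's side, assuming the spark-avro module is on the classpath (as in the spark-shell command earlier). The temporary directory and the `inferPath` name are made up for the example.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.LongType

// Reuses the spark-shell session if there is one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("avro-infer-schema")
  .getOrCreate()

// Example scratch location; the target directory must not exist before the write.
val inferPath = Files.createTempDirectory("avro-demo").resolve("ids").toString

spark.range(3).write.format("avro").save(inferPath)

// No schema is given here, so it has to be inferred from the Avro files.
val inferred = spark.read.format("avro").load(inferPath).schema
inferred.printTreeString()
```

Passing an explicit schema to the reader (with the schema method of DataFrameReader) should make the inference step unnecessary.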

=== [[prepareWrite]] Preparing Write Job -- prepareWrite Method

[source, scala]

prepareWrite(
  spark: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory


prepareWrite is part of the FileFormat abstraction.
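
The options parameter carries the write options given to DataFrameWriter. As a hedged illustration, the sketch below passes the compression option of the Avro data source and reads the data back. It assumes the spark-avro module is on the classpath, and the temporary directory and the `outputPath` name are made up for the example.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Reuses the spark-shell session if there is one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("avro-write-options")
  .getOrCreate()

// Example scratch location; the target directory must not exist before the write.
val outputPath = Files.createTempDirectory("avro-demo").resolve("deflated").toString

spark.range(3)
  .write
  .format("avro")
  .option("compression", "deflate") // write option handed over to the data source
  .save(outputPath)

// Reading the files back confirms that all rows were written.
val readBack = spark.read.format("avro").load(outputPath)
```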