AvroIO

AvroIO is a utility to create PTransforms for reading and writing Avro files.

import org.apache.beam.sdk.io.AvroIO
val records = AvroIO.read(classOf[String]).from("*.avro")

scala> :type records
org.apache.beam.sdk.io.AvroIO.Read[String]

Reading In

<T> Read<T> read(
  Class<T> recordClass)
<T> ReadFiles<T> readFiles(
  Class<T> recordClass)
<T> ReadAll<T> readAll(
  Class<T> recordClass)
Read<GenericRecord> readGenericRecords(
  Schema schema)
ReadFiles<GenericRecord> readFilesGenericRecords(
  Schema schema)
ReadAll<GenericRecord> readAllGenericRecords(
  Schema schema)
Read<GenericRecord> readGenericRecords(
  String schema)
ReadFiles<GenericRecord> readFilesGenericRecords(
  String schema)
<T> Parse<T> parseGenericRecords(
  SerializableFunction<GenericRecord, T> parseFn)
<T> ParseFiles<T> parseFilesGenericRecords(
  SerializableFunction<GenericRecord, T> parseFn)

read creates a source PTransform that produces a PCollection of records (of type T or GenericRecord, i.e. PTransform<PBegin, PCollection<T>>).

Writing Out

<T> Write<T> write(
  Class<T> recordClass)
Write<GenericRecord> writeGenericRecords(
  Schema schema)
// other writes

write creates a sink PTransform (PTransform<PCollection<T>, PDone>).