RowEncoder

RowEncoder is part of the Encoder framework and is used as the encoder of DataFrames (Datasets of Rows).

Note

The DataFrame type is a type alias for Dataset[Row], which expects an Encoder[Row] to be available in scope. RowEncoder creates that encoder.
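As an illustration of the note above, the following sketch creates a Dataset[Row] (that is, a DataFrame) by passing the encoder explicitly. It assumes a Spark 3.x version in which RowEncoder(schema) is available; the local SparkSession, the application name and the sample rows are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

// Hypothetical local session, only for this sketch
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("row-encoder-note")
  .getOrCreate()

val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

// createDataset takes the Encoder[Row] as its second (implicit) parameter list;
// it is passed explicitly here to make the dependency visible
val rows = Seq(Row(1L, "one"), Row(2L, "two"))
val df: DataFrame = spark.createDataset(rows)(RowEncoder(schema))

df.printSchema()
```

The type annotation compiles because DataFrame and Dataset[Row] are the same type.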

Creating ExpressionEncoder of Rows

apply(
  schema: StructType): ExpressionEncoder[Row]

apply creates a BoundReference to a nullable field of ObjectType (for the Row class).

apply creates a serializer (using serializerFor) with the BoundReference as the input object and the given schema as the input type.

apply creates a deserializer for a GetColumnByOrdinal expression with the given schema.

In the end, apply creates an ExpressionEncoder for Rows with the serializer and deserializer.
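The encoder that apply returns can be exercised with a round trip from an external Row to an InternalRow and back. This is a minimal sketch that assumes Spark 3.1 or later (3.x), where ExpressionEncoder offers createSerializer and createDeserializer; the schema and the sample row are illustrative only.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

val encoder = RowEncoder(schema)

// External Row => InternalRow (evaluates the serializer expressions)
val toInternal = encoder.createSerializer()
val internalRow = toInternal(Row(1L, "one"))

// The deserializer has to be resolved and bound to the schema before use
val fromInternal = encoder.resolveAndBind().createDeserializer()
val row = fromInternal(internalRow)

println(row)
```

No SparkSession is needed for this, since the encoder works on Catalyst expressions directly.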

serializerFor

serializerFor(
  inputObject: Expression,
  inputType: DataType): Expression

serializerFor is a recursive method that decomposes non-primitive (complex) DataTypes (e.g. ArrayType, MapType and StructType) to primitives.

For a StructType, serializerFor creates a CreateNamedStruct with serializer expressions for the (inner) fields.

For native types, serializerFor returns the given inputObject expression (that ends recursion).
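serializerFor is not called directly in the sketch below (it is assumed to be private to RowEncoder). Instead, the recursion over a nested StructType is observed indirectly by collecting the CreateNamedStruct nodes in the serializer expression of the encoder. The nested address schema is a made-up example, and objSerializer is assumed to be available as in Spark 3.x.

```scala
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.expressions.CreateNamedStruct
import org.apache.spark.sql.types._

// Hypothetical nested schema: a struct field inside the top-level struct
val address = StructType(StructField("city", StringType) :: Nil)
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("address", address) :: Nil)

val encoder = RowEncoder(schema)

// One CreateNamedStruct is expected per StructType (top-level and nested)
val structs = encoder.objSerializer.collect { case s: CreateNamedStruct => s }
structs.foreach(s => println(s.dataType.simpleString))
```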

serializerFor handles the following DataTypes in a custom way (in the given order):

PythonUserDefinedType
UserDefinedType
TimestampType
DateType
DayTimeIntervalType
YearMonthIntervalType
DecimalType
StringType
ArrayType
MapType
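To get a feel for the custom handling of some of the types listed above, the following sketch lists the static conversion calls that end up in the serializer for a StringType and a TimestampType field. It assumes Spark 3.x, where these conversions appear as StaticInvoke expressions; the exact classes and function names printed may differ between versions and configurations, so none are asserted here.

```scala
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
import org.apache.spark.sql.types._

// Illustrative schema with two of the custom-handled types
val schema = StructType(
  StructField("name", StringType) ::
  StructField("created", TimestampType) :: Nil)

val encoder = RowEncoder(schema)

// Collect (class name, function name) of every static conversion call
val conversions = encoder.objSerializer.collect {
  case s: StaticInvoke => s.staticObject.getSimpleName -> s.functionName
}
conversions.foreach(println)
```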

Demo

import org.apache.spark.sql.types._
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

import org.apache.spark.sql.catalyst.encoders.RowEncoder
val encoder = RowEncoder(schema)

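// Assumes Spark 3.x, where ExpressionEncoder.flat is no longer available
assert(encoder.isSerializedAsStruct, "RowEncoder always serializes a Row as a struct")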