Skip to content

ExpressionEncoder

ExpressionEncoder[T] is the only built-in Encoder.

Important

ExpressionEncoder is the only supported Encoder which is enforced when Dataset is created (even though Dataset data structure accepts a bare Encoder[T]).

Creating Instance

ExpressionEncoder takes the following to be created:

ExpressionEncoder is created when:

Serializer

serializer: Seq[NamedExpression]

ExpressionEncoder creates the serializer (to be NamedExpressions) when created.

Encoders Utility

Encoders utility contains the ExpressionEncoder for Scala and Java primitive types, e.g. boolean, long, String, java.sql.Date, java.sql.Timestamp, Array[Byte].

Demo

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
val stringEncoder = ExpressionEncoder[String]
scala> val row = stringEncoder.toRow("hello world")
row: org.apache.spark.sql.catalyst.InternalRow = [0,100000000b,6f77206f6c6c6568,646c72]
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
scala> val unsafeRow = row match { case ur: UnsafeRow => ur }
unsafeRow: org.apache.spark.sql.catalyst.expressions.UnsafeRow = [0,100000000b,6f77206f6c6c6568,646c72]

Creating ExpressionEncoder

apply[T : TypeTag](): ExpressionEncoder[T]

apply creates an ExpressionEncoder with the following:

Creating ExpressionEncoder for Scala Tuples

tuple[T](
  e: ExpressionEncoder[T]): ExpressionEncoder[Tuple1[T]]
tuple[T1, T2](
  e1: ExpressionEncoder[T1],
  e2: ExpressionEncoder[T2]): ExpressionEncoder[(T1, T2)]
tuple[T1, T2, T3](
  e1: ExpressionEncoder[T1],
  e2: ExpressionEncoder[T2],
  e3: ExpressionEncoder[T3]): ExpressionEncoder[(T1, T2, T3)]
tuple[T1, T2, T3, T4](
  e1: ExpressionEncoder[T1],
  e2: ExpressionEncoder[T2],
  e3: ExpressionEncoder[T3],
  e4: ExpressionEncoder[T4]): ExpressionEncoder[(T1, T2, T3, T4)]
tuple[T1, T2, T3, T4, T5](
  e1: ExpressionEncoder[T1],
  e2: ExpressionEncoder[T2],
  e3: ExpressionEncoder[T3],
  e4: ExpressionEncoder[T4],
  e5: ExpressionEncoder[T5]): ExpressionEncoder[(T1, T2, T3, T4, T5)]
tuple(
  encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_]

tuple...FIXME

tuple is used when:

Creating ExpressionEncoder for Java Bean

javaBean[T](
  beanClass: Class[T]): ExpressionEncoder[T]

javaBean...FIXME

javaBean is used when:

  • Encoders utility is used to bean

resolveAndBind

resolveAndBind(
  attrs: Seq[Attribute] = schema.toAttributes,
  analyzer: Analyzer = SimpleAnalyzer): ExpressionEncoder[T]

resolveAndBind creates a deserializer for a LocalRelation with the given Attributes (to create a dummy query plan).

resolveAndBind...FIXME


resolveAndBind is used when:

Demo

case class Person(id: Long, name: String)
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Person].schema
import org.apache.spark.sql.catalyst.encoders.{RowEncoder, ExpressionEncoder}
import org.apache.spark.sql.Row
val encoder: ExpressionEncoder[Row] = RowEncoder.apply(schema).resolveAndBind()
val deserializer = encoder.deserializer
import org.apache.spark.sql.catalyst.InternalRow
val input = InternalRow(1, "Jacek")
scala> deserializer.eval(input)
java.lang.UnsupportedOperationException: Only code-generated evaluation is supported
  at org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow.eval(objects.scala:1105)
  ... 54 elided
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext
val code = deserializer.genCode(ctx).code