RDDConversions Helper Object
RDDConversions is a Scala object that is used to <
=== [[productToRowRdd]] productToRowRdd Method
[source, scala]
productToRowRdd[A <: Product](data: RDD[A], outputTypes: Seq[DataType]): RDD[InternalRow]
productToRowRdd...FIXME
NOTE: productToRowRdd is used when...FIXME
=== [[rowToRowRdd]] Converting Scala Objects In Rows to Values Of Catalyst Types -- rowToRowRdd Method
[source, scala]
rowToRowRdd(data: RDD[Row], outputTypes: Seq[DataType]): RDD[InternalRow]
rowToRowRdd maps over the partitions of the input RDD[Row] (using the RDD.mapPartitions operator), which creates a MapPartitionsRDD with a "map" function.
TIP: Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.
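As an illustration of the tip, the following is a minimal sketch that assumes Spark is on the classpath and runs in local mode. It does not call rowToRowRdd itself: the names spark, input, mapped and lineage are hypothetical, and a plain mapPartitions call stands in for the conversion to show how the extra MapPartitionsRDD appears in the lineage.

[source, scala]
----
import org.apache.spark.sql.SparkSession

// Local session for experimentation only (assumption: Spark is available).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("lineage-demo")
  .getOrCreate()

// A small input RDD; parallelize creates a ParallelCollectionRDD.
val input = spark.sparkContext.parallelize(Seq(1, 2, 3))

// mapPartitions wraps the parent RDD in a MapPartitionsRDD.
val mapped = input.mapPartitions(iter => iter.map(_ + 1))

// The lineage lists the new RDD followed by its parent.
val lineage = mapped.toDebugString
println(lineage)
----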
The "map" function takes a Scala Iterator of Row objects and does the following:
- Creates a GenericInternalRow (sized to the number of columns, i.e. the length of the input Seq[DataType])
- Creates a converter function for every DataType in Seq[DataType]
- For every Row object in the partition (iterator), applies the converter function per position and adds the result value to the GenericInternalRow
- In the end, returns a GenericInternalRow for every row
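The steps above can be modelled in plain Scala without any Spark dependency. This is only a sketch under stated assumptions, not the Spark implementation: Array[Any] stands in for both Row and GenericInternalRow, type names are plain strings instead of DataType, and converterFor and convertPartition are hypothetical helpers.

[source, scala]
----
import java.time.LocalDate

// Hypothetical stand-in for a per-type converter factory:
// "date" values become days since the epoch, everything else passes through.
def converterFor(typeName: String): Any => Any = typeName match {
  case "date" => {
    case d: LocalDate => d.toEpochDay.toInt
    case other        => other
  }
  case _ => (v: Any) => v
}

// Hypothetical model of the per-partition "map" function.
def convertPartition(
    rows: Iterator[Array[Any]],
    outputTypes: Seq[String]): Iterator[Array[Any]] = {
  val numColumns = outputTypes.length
  // Step 1: one output buffer sized to the number of columns.
  val mutableRow = new Array[Any](numColumns)
  // Step 2: one converter per output type.
  val converters = outputTypes.map(converterFor)
  // Steps 3 and 4: convert position by position, then hand out the buffer.
  rows.map { row =>
    var i = 0
    while (i < numColumns) {
      mutableRow(i) = converters(i)(row(i))
      i += 1
    }
    mutableRow
  }
}
----

In this sketch the buffer is created once and reused for every row, so a caller that keeps results must copy each one before advancing the iterator, for example with `convertPartition(rows, types).map(_.toList).toList`.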
rowToRowRdd is used when the DataSourceStrategy execution planning strategy is executed (and requested to toCatalystRDD).