PandasGroupUtils
PandasGroupUtils utility is used by the following physical operators when executed:
- FlatMapCoGroupsInPandasExec
- FlatMapGroupsInPandasExec
executePython
executePython[T](
data: Iterator[T],
output: Seq[Attribute],
runner: BasePythonRunner[T, ColumnarBatch]): Iterator[InternalRow]
executePython requests the given BasePythonRunner to compute the (partition) data (with the current task's TaskContext and the partition ID).
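The hand-off described above can be sketched in plain Scala without any Spark dependency. Everything in this block is a hypothetical stand-in: TaskInfo plays the role of TaskContext, Runner plays the role of BasePythonRunner, and a batch is modeled as a Seq of rows. The final flattening of batches into rows is an assumption suggested only by the signature (ColumnarBatch in, InternalRow out), not something the text above confirms.

```scala
object ExecutePythonSketch {
  // Hypothetical stand-in for TaskContext: only the partition ID is modeled.
  final case class TaskInfo(partitionId: Int)

  // Hypothetical stand-in for BasePythonRunner[IN, OUT].
  trait Runner[IN, OUT] {
    def compute(data: Iterator[IN], partitionId: Int, context: TaskInfo): Iterator[OUT]
  }

  // Hands the partition data to the runner together with the partition ID
  // and the task context, then unpacks every returned batch into rows
  // (the unpacking step is an assumption of this sketch).
  def execute[T, R](
      data: Iterator[T],
      runner: Runner[T, Seq[R]],
      context: TaskInfo): Iterator[R] =
    runner.compute(data, context.partitionId, context).flatMap(_.iterator)
}
```

Both the input and the result are iterators, so rows are only pulled through the runner when the caller consumes the output.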
executePython ...FIXME
executePython is used when:
- FlatMapCoGroupsInPandasExec and FlatMapGroupsInPandasExec physical operators are executed
groupAndProject
groupAndProject(
input: Iterator[InternalRow],
groupingAttributes: Seq[Attribute],
inputSchema: Seq[Attribute],
dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])]
groupAndProject creates a GroupedIterator for the input iterator (of InternalRows), the groupingAttributes and the inputSchema.
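To make the shape of the result concrete, here is a minimal, Spark-free sketch of grouping an iterator of rows by key and returning (key, rows) pairs. It rests on several assumptions: Row is a hypothetical stand-in for InternalRow, columns are addressed by ordinal instead of by Attribute, the input is assumed to arrive already sorted (or clustered) by the grouping columns, and dedupOrdinals reflects one possible reading of dedupSchema as the set of columns kept in each grouped row. None of this is taken from the Spark source.

```scala
object GroupAndProjectSketch {
  // Hypothetical stand-in for InternalRow: a row is a vector of values.
  type Row = Vector[Any]

  // Groups consecutive rows that share a key. Assumes the input is already
  // sorted (or clustered) by that key; unsorted input yields repeated keys.
  def groupSorted(
      input: Iterator[Row],
      key: Row => Row): Iterator[(Row, Iterator[Row])] = {
    val rows = input.buffered
    new Iterator[(Row, Iterator[Row])] {
      def hasNext: Boolean = rows.hasNext
      def next(): (Row, Iterator[Row]) = {
        val k = key(rows.head)
        // Each group is materialized eagerly to keep the sketch simple.
        val group = Vector.newBuilder[Row]
        while (rows.hasNext && key(rows.head) == k) group += rows.next()
        (k, group.result().iterator)
      }
    }
  }

  // Groups rows by the grouping columns, then keeps only the columns listed
  // in dedupOrdinals in every grouped row (an assumed reading of dedupSchema).
  def groupAndProject(
      input: Iterator[Row],
      groupingOrdinals: Seq[Int],
      dedupOrdinals: Seq[Int]): Iterator[(Row, Iterator[Row])] = {
    val key: Row => Row = row => groupingOrdinals.map(row).toVector
    val project: Row => Row = row => dedupOrdinals.map(row).toVector
    groupSorted(input, key).map { case (k, group) => (k, group.map(project)) }
  }
}
```

Grouping only consecutive rows avoids building a hash map over the whole partition, which is why the sketch requires sorted input.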
groupAndProject ...FIXME
groupAndProject is used when:
- FlatMapCoGroupsInPandasExec and FlatMapGroupsInPandasExec physical operators are executed