PandasConversionMixin

PandasConversionMixin is a Python mixin of DataFrame that converts a DataFrame to a pandas DataFrame (pandas.DataFrame).

toPandas

toPandas(self)

toPandas can only be used with DataFrame.

With Arrow optimization enabled, toPandas first converts the DataFrame schema to an Arrow schema (to_arrow_schema).

pyarrow

Arrow optimization uses the pyarrow module.

toPandas renames the columns to the col_[index] format and collects the data as Arrow RecordBatches (_collect_as_arrow, with split_batches based on the arrowPySparkSelfDestructEnabled configuration property).

toPandas creates a pyarrow.Table (from the RecordBatches) and converts the table to a pandas-compatible NumPy array or DataFrame. toPandas renames the columns back to the initial column names.

Note

Column order is assumed: the initial column names are assigned back by position.
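
The rename to col_[index] and back can be sketched in plain Python, without Spark or pyarrow. This is only an illustration of the positional round trip, not the PySpark implementation: the helper names (temp_column_names, restore_column_names) and the sample column list are hypothetical, and the duplicated column name is an assumed motivation for using placeholder names.

```python
from typing import Dict, List, Sequence


def temp_column_names(columns: Sequence[str]) -> List[str]:
    """Build positional placeholder names: col_0, col_1, ..."""
    return ["col_{}".format(i) for i in range(len(columns))]


def restore_column_names(
    renamed: Sequence[str], original: Sequence[str]
) -> Dict[str, str]:
    """Map each placeholder to the original name at the same position."""
    if len(renamed) != len(original):
        raise ValueError("column counts differ")
    return dict(zip(renamed, original))


# Hypothetical column list with a duplicated name.
original_columns = ["id", "value", "id"]

# Placeholders are unique even though the original names are not.
placeholders = temp_column_names(original_columns)

# Restoring relies purely on position, so the column order must not change
# between the rename and the restore.
mapping = restore_column_names(placeholders, original_columns)
restored = [mapping[name] for name in placeholders]
```

Because the mapping is positional, any reordering of columns between the two steps would attach the wrong names, which is why column order has to be assumed.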

With Arrow optimization disabled, toPandas collects the records (DataFrame.collect) and creates a pandas.DataFrame (with some type munging).