InternalFrame¶
InternalFrame
is the underlying managed Spark DataFrame of pyspark.pandas.DataFrame.
Creating Instance¶
InternalFrame
takes the following to be created:
- Spark DataFrame
-
index_spark_columns
(optional) -
index_names
(optional) -
index_fields
(optional) -
column_labels
(optional) -
data_spark_columns
(optional) -
data_fields
(optional) -
column_label_names
(optional)
Spark DataFrame¶
InternalFrame
is given a Spark DataFrame when created.
Managed Spark DataFrame¶
_sdf
is the underlying managed Spark DataFrame.
_sdf
is the Spark DataFrame with attach_default_index and __natural_order__ columns selected.
Default Index Column Name¶
InternalFrame
uses the following as the name of the default index column:
__index_level_0__
Index Column Pattern¶
InternalFrame
defines a regular pattern to match the index columns.
__index_level_[0-9]+__
It is invalid to name columns in the Spark DataFrame to match the index column pattern. Index columns must not be in the columns of the Spark DataFrame.
to_internal_spark_frame¶
@lazy_property
def to_internal_spark_frame(
self) -> SparkDataFrame
to_internal_spark_frame
returns the spark_frame with the index_spark_columns followed by the data_spark_columns.
spark_frame¶
from pyspark.sql import DataFrame as SparkDataFrame
@property
def spark_frame(
self) -> SparkDataFrame
spark_frame
returns the underlying managed Spark DataFrame.
Demo¶
from pyspark import pandas as ps
psdf = ps.DataFrame({
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'D': [13, 14, 15, 16],
'E': [17, 18, 19, 20]}, columns = ['A', 'B', 'C', 'D', 'E'])
psdf._internal
# <pyspark.pandas.internal.InternalFrame object at 0x7f7ff024f820>
psdf._internal.spark_frame
# DataFrame[__index_level_0__: bigint, A: bigint, B: bigint, C: bigint, D: bigint, E: bigint, __natural_order__: bigint]
psdf._internal.spark_frame.show()
# +-----------------+---+---+---+---+---+-----------------+
# |__index_level_0__| A| B| C| D| E|__natural_order__|
# +-----------------+---+---+---+---+---+-----------------+
# | 0| 1| 5| 9| 13| 17| 17179869184|
# | 1| 2| 6| 10| 14| 18| 42949672960|
# | 2| 3| 7| 11| 15| 19| 68719476736|
# | 3| 4| 8| 12| 16| 20| 94489280512|
# +-----------------+---+---+---+---+---+-----------------+
psdf._internal.to_internal_spark_frame.show()
# +-----------------+---+---+---+---+---+
# |__index_level_0__| A| B| C| D| E|
# +-----------------+---+---+---+---+---+
# | 0| 1| 5| 9| 13| 17|
# | 1| 2| 6| 10| 14| 18|
# | 2| 3| 7| 11| 15| 19|
# | 3| 4| 8| 12| 16| 20|
# +-----------------+---+---+---+---+---+