rdd.py¶
The rdd module (in the pyspark package) defines RDD.
from pyspark.rdd import *
__all__¶
import *
The import statement uses the following convention: if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.
Learn more in 6.4.1. Importing * From a Package.
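The same `__all__` convention also governs which names a plain module exposes to a star import. The following is a minimal sketch of that behaviour, not code from PySpark: `demo_mod`, `public_fn` and `helper_fn` are hypothetical names, and the module is built in memory only to keep the example self-contained.

```python
import sys
import types

# Source of a throwaway module (hypothetical name: demo_mod).
source = '''
__all__ = ["public_fn"]

def public_fn():
    return "exported"

def helper_fn():
    return "not exported"
'''

# Create the module in memory and register it so that it can be imported.
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run the star import in a fresh namespace to see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)

# Ignore dunder entries such as __builtins__ that exec adds on its own.
imported_names = sorted(name for name in namespace if not name.startswith("__"))
print(imported_names)
```

Only the name listed in `__all__` reaches the importing namespace; `helper_fn` stays reachable through `demo_mod.helper_fn` but is not pulled in by `import *`.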
_prepare_for_python_RDD¶
_prepare_for_python_RDD(
sc: "SparkContext",
command: Any) -> Tuple[bytes, Any, Any, Any]
_prepare_for_python_RDD creates a CloudPickleSerializer and uses its dumps method to serialize the given command, which produces pickled_command.
If the size of pickled_command is above the broadcast threshold, _prepare_for_python_RDD creates a broadcast variable for pickled_command and then serializes that broadcast variable (dumps) using the same CloudPickleSerializer. The result replaces the original pickled_command.
In the end, _prepare_for_python_RDD returns the following:
- pickled_command
- broadcast_vars
- environment
- _python_includes
_prepare_for_python_RDD is used when:
- pyspark.rdd is requested to _wrap_function
- pyspark.sql.udf is requested to _wrap_function