rdd.py¶
The rdd module (in the pyspark package) defines RDD.
from pyspark.rdd import *
__all__¶
import *
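The import statement uses the following convention: if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered. The same applies to a plain module such as rdd.py: its __all__ lists the public names that from pyspark.rdd import * brings in.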
Learn more in 6.4.1. Importing * From a Package.
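The effect of __all__ on a star import can be seen with a throwaway in-memory module. This is only a minimal sketch: demo_mod, public_fn and other_fn are hypothetical names made up for the illustration and have nothing to do with pyspark.rdd.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name: demo_mod).
demo_mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn():\n"
    "    return 'public'\n"
    "def other_fn():\n"
    "    return 'other'\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

# Run a star import in a fresh namespace and see what came in.
namespace = {}
exec("from demo_mod import *", namespace)

# Ignore dunder entries (e.g. __builtins__) that exec adds on its own.
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only the name listed in __all__ is imported; other_fn stays reachable as demo_mod.other_fn but is not pulled in by the star import.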
_prepare_for_python_RDD¶
_prepare_for_python_RDD(
    sc: "SparkContext",
    command: Any) -> Tuple[bytes, Any, Any, Any]
_prepare_for_python_RDD creates a CloudPickleSerializer and uses it to serialize (dumps) the given command. The result is the pickled_command.
If the size of the pickled_command is above the broadcast threshold, _prepare_for_python_RDD creates a broadcast variable for the pickled_command. The broadcast variable is in turn serialized (dumps) using the same CloudPickleSerializer, and the result replaces the pickled_command.
In the end, _prepare_for_python_RDD returns the following:
- pickled_command
- broadcast_vars
- environment
- _python_includes
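The flow described above (serialize, swap in a broadcast handle when the payload is too large, return four values) can be sketched without Spark. Everything here is an assumption made for illustration: the standard pickle module stands in for CloudPickleSerializer, FakeContext stands in for SparkContext, the threshold value is arbitrary, and a plain dict plays the role of a broadcast variable. It is not PySpark's actual code.

```python
import pickle
from typing import Any, Dict, List, Tuple

# Hypothetical threshold in bytes; the real one comes from Spark itself.
BROADCAST_THRESHOLD = 1024


class FakeContext:
    """Hypothetical stand-in for SparkContext with only what the sketch needs."""

    def __init__(self) -> None:
        self.environment: Dict[str, str] = {}
        self.python_includes: List[str] = []
        self.broadcasts: Dict[int, bytes] = {}        # broadcast id -> payload
        self.broadcast_handles: List[Dict[str, int]] = []

    def broadcast(self, payload: bytes) -> Dict[str, int]:
        # Store the payload once and hand back a small handle that refers to it.
        bid = len(self.broadcasts)
        self.broadcasts[bid] = payload
        handle = {"broadcast_id": bid}
        self.broadcast_handles.append(handle)
        return handle


def prepare_command(ctx: FakeContext, command: Any) -> Tuple[bytes, Any, Any, Any]:
    # Step 1: serialize the command.
    pickled_command = pickle.dumps(command)
    # Step 2: if it is too large, broadcast it and serialize the handle instead.
    if len(pickled_command) > BROADCAST_THRESHOLD:
        handle = ctx.broadcast(pickled_command)
        pickled_command = pickle.dumps(handle)
    # Step 3: return the four values listed above.
    broadcast_vars = list(ctx.broadcast_handles)
    return pickled_command, broadcast_vars, ctx.environment, ctx.python_includes


ctx = FakeContext()
small = prepare_command(ctx, ("small function", None))
large = prepare_command(ctx, ("large function", "x" * 5000))
print(len(small[0]), len(large[0]))
```

With the large command, the first returned element is the serialized handle rather than the serialized command, so the bytes shipped with every task stay small while the full payload travels once through the broadcast.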
_prepare_for_python_RDD is used when:
- pyspark.rdd is requested to _wrap_function
- pyspark.sql.udf is requested to _wrap_function