
The rdd module (in the pyspark package) defines the RDD class.

from pyspark.rdd import *


import *

The import statement uses the following convention: if a package's __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.

Learn more in 6.4.1. Importing * From a Package.
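The __all__ convention above can be demonstrated with a throwaway package built on disk at runtime (the package and module names below are hypothetical, invented for this sketch):

```python
import os
import sys
import tempfile

# Build a minimal package with two modules, only one of which is
# listed in __all__ in the package's __init__.py.
pkg_dir = tempfile.mkdtemp()
pkg = os.path.join(pkg_dir, "demo_pkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("__all__ = ['alpha']\n")
with open(os.path.join(pkg, "alpha.py"), "w") as f:
    f.write("value = 1\n")
with open(os.path.join(pkg, "beta.py"), "w") as f:
    f.write("value = 2\n")

sys.path.insert(0, pkg_dir)
ns = {}
exec("from demo_pkg import *", ns)

# Only the module named in __all__ is bound; beta is skipped.
print(sorted(n for n in ns if not n.startswith("__")))  # → ['alpha']
```

pyspark uses the same mechanism so that from pyspark.rdd import * exposes only the names the module declares public.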


_prepare_for_python_RDD(
  sc: "SparkContext",
  command: Any) -> Tuple[bytes, Any, Any, Any]

_prepare_for_python_RDD creates a CloudPickleSerializer to dump the given command (producing a pickled_command).

If the size of pickled_command is above the broadcast threshold, _prepare_for_python_RDD creates a broadcast variable for pickled_command, which is in turn dumped using the CloudPickleSerializer (overriding pickled_command).

In the end, _prepare_for_python_RDD returns the following:

- pickled_command
- the broadcast variables (broadcast_vars)
- sc.environment
- sc._python_includes
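The pickling flow above can be sketched without a live SparkContext. This is a minimal illustration, not pyspark's implementation: pickle stands in for CloudPickleSerializer, the threshold constant is hypothetical (the real one comes from the JVM via PythonUtils.getBroadcastThreshold), and sc_broadcast is a stand-in callable for sc.broadcast:

```python
import pickle

# Hypothetical stand-in for the JVM-provided broadcast threshold.
BROADCAST_THRESHOLD = 1 << 20

def prepare_command(sc_broadcast, command):
    """Sketch of _prepare_for_python_RDD's pickling step.

    sc_broadcast: callable standing in for sc.broadcast.
    """
    # CloudPickleSerializer().dumps(command) in pyspark; plain pickle here.
    pickled_command = pickle.dumps(command)
    if len(pickled_command) > BROADCAST_THRESHOLD:
        # Large commands are shipped as a broadcast variable; the pickled
        # broadcast handle replaces the original payload.
        broadcast = sc_broadcast(pickled_command)
        pickled_command = pickle.dumps(broadcast)
    return pickled_command

# Small commands stay inline: the result unpickles back to the command itself.
small = prepare_command(lambda payload: payload, ("func", "serializer"))
print(pickle.loads(small))
```

The point of the threshold is to keep small closures inline in the task while routing large ones through Spark's broadcast machinery, which is shared across tasks.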

_prepare_for_python_RDD is used when:

- _wrap_function (in pyspark.rdd) is executed
- _wrap_function (in pyspark.sql.udf) is executed