
rdd.py

The rdd module (in the pyspark package) defines RDD.

from pyspark.rdd import *

__all__

import *

The import statement uses the following convention: if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.

Learn more in 6.4.1. Importing * From a Package in The Python Tutorial.
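The effect of `__all__` on a star-import can be seen with a minimal sketch. The module name `demo_mod`, the `helper` function, and the empty `RDD` class here are hypothetical stand-ins built in memory so the snippet runs on its own; this is not the pyspark.rdd source.

```python
import sys
import types

# Build a hypothetical in-memory module (demo_mod is an illustrative name,
# not part of PySpark). It defines two public-looking names but lists only
# one of them in __all__.
demo_mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['RDD']\n"
    "class RDD: pass\n"
    "def helper(): pass\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

# Run a star-import into a fresh namespace and collect the names that arrive,
# ignoring dunder entries such as __builtins__ that exec adds on its own.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only `RDD` reaches the importing namespace. Without the `__all__` line, `helper` would be imported as well, since its name does not start with an underscore.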

_prepare_for_python_RDD

_prepare_for_python_RDD(
  sc: "SparkContext",
  command: Any) -> Tuple[bytes, Any, Any, Any]

_prepare_for_python_RDD creates a CloudPickleSerializer and uses it to serialize (dumps) the given command, which produces a pickled_command.

If the size of the pickled_command is above the broadcast threshold, _prepare_for_python_RDD creates a broadcast variable for the pickled_command and serializes (dumps) that broadcast variable with the same CloudPickleSerializer. The result replaces the pickled_command.
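The serialize-then-broadcast-if-large pattern described above can be sketched without Spark. This is an illustration under stated assumptions, not the PySpark implementation: the standard-library `pickle` stands in for CloudPickleSerializer, and `prepare_command`, `fake_broadcast`, `store`, and the 1024-byte threshold are all hypothetical names and values chosen for the example.

```python
import pickle
from typing import Any, Callable, Dict


def prepare_command(
    command: Any,
    broadcast: Callable[[bytes], Any],
    threshold: int,
) -> bytes:
    """Pickle the command; if the payload is too large, pickle a handle instead.

    Hypothetical helper: `pickle` stands in for CloudPickleSerializer and
    `broadcast` stands in for creating a broadcast variable.
    """
    pickled_command = pickle.dumps(command)
    if len(pickled_command) > threshold:
        # Ship the large payload out of band and keep only a small handle.
        handle = broadcast(pickled_command)
        pickled_command = pickle.dumps(handle)
    return pickled_command


# A toy "broadcast" that stores payloads in a dict and returns an integer key.
store: Dict[int, bytes] = {}


def fake_broadcast(payload: bytes) -> int:
    key = len(store)
    store[key] = payload
    return key


small = prepare_command(("f", 1), fake_broadcast, threshold=1024)
large = prepare_command("x" * 5000, fake_broadcast, threshold=1024)
```

The small command is pickled directly, while the large one is replaced by a pickled handle that points at the stored payload, so the bytes returned stay small in both cases.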

In the end, _prepare_for_python_RDD returns the following:


_prepare_for_python_RDD is used when: