SparkContext

SparkContext Initialization

Creating Instance

SparkContext takes the following to be created:

  • Master URL (default: None)
  • Application Name (default: None)
  • Spark Home (default: None)
  • Py Files (default: None)
  • Environment (default: None)
  • Batch Size (default: 0)
  • Serializer (default: PickleSerializer)
  • SparkConf (default: None)
  • Gateway (default: None)
  • Corresponding SparkContext on JVM (default: None)
  • Profiler class (default: BasicProfiler)

While being created, SparkContext calls _ensure_initialized (with the gateway and the conf) and then _do_init.
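The two-step creation can be sketched in plain Python. This is a simplified stand-in, not PySpark's actual code: MiniContext, FakeGateway and the calls list are hypothetical names introduced only to make the call order visible, and the parameter list is shortened.

```python
# Simplified stand-in for the construction flow; NOT PySpark's actual code.
class FakeGateway:
    """Hypothetical placeholder for a py4j JavaGateway."""

    def __init__(self):
        self.jvm = object()  # stands in for the JVMView


class MiniContext:
    """Hypothetical class that mimics the order of calls during creation."""

    _gateway = None  # class-level, shared by all instances
    _jvm = None

    def __init__(self, master=None, appName=None, gateway=None, conf=None):
        self.calls = []  # records which steps ran, for illustration only
        # Step 1: class-level setup with the gateway and the conf.
        MiniContext._ensure_initialized(self, gateway=gateway, conf=conf)
        # Step 2: per-instance initialization.
        self._do_init(master, appName, conf)

    @classmethod
    def _ensure_initialized(cls, instance=None, gateway=None, conf=None):
        if cls._gateway is None:
            cls._gateway = gateway if gateway is not None else FakeGateway()
            cls._jvm = cls._gateway.jvm
        if instance is not None:
            instance.calls.append("_ensure_initialized")

    def _do_init(self, master, appName, conf):
        self.master = master
        self.appName = appName
        self.conf = conf
        self.calls.append("_do_init")


ctx = MiniContext(master="local[*]", appName="demo")
print(ctx.calls)
```

The class-level step runs first so that the shared gateway exists before any per-instance state is set up.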

Demo

from pyspark import SparkContext
sc = SparkContext(master="local[*]", appName="demo")  # master and app name chosen for illustration

JavaGateway

SparkContext defines a _gateway property for the JavaGateway that is either given or launched in _ensure_initialized.

JVMView

SparkContext defines a _jvm property for a JVMView (py4j) that gives access to the Java Virtual Machine of the JavaGateway.

_ensure_initialized

_ensure_initialized(
  cls, instance=None, gateway=None, conf=None)

_ensure_initialized is a @classmethod.

_ensure_initialized uses the given gateway or, if none is given, launches one with launch_gateway.
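The "given or launched" choice can be illustrated with a small sketch. It assumes the gateway is kept at class level and set up only once. GatewayHolder is a hypothetical class, and the launch_gateway defined here is a local stub that only records calls; it is not the function from pyspark.java_gateway.

```python
# Sketch only: this launch_gateway is a local stub, NOT pyspark.java_gateway.launch_gateway.
launched = []


def launch_gateway(conf=None):
    """Hypothetical stub: record the launch and return a fake gateway object."""
    gateway = {"source": "launched", "conf": conf}
    launched.append(gateway)
    return gateway


class GatewayHolder:
    """Hypothetical class that keeps one gateway at class level."""

    _gateway = None  # shared by every instance

    @classmethod
    def _ensure_initialized(cls, instance=None, gateway=None, conf=None):
        # Set the gateway up only once per class.
        if cls._gateway is None:
            # Prefer the caller's gateway; otherwise launch a new one.
            cls._gateway = gateway if gateway is not None else launch_gateway(conf)
        return cls._gateway


given = {"source": "given"}
used = GatewayHolder._ensure_initialized(gateway=given)
print(used is given, len(launched))
```

In this sketch a gateway passed by the caller is used as is, and the stub launcher runs only when no gateway has been supplied or stored yet.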

_ensure_initialized...FIXME

_ensure_initialized is used when:

_do_init

_do_init(
  self, master, appName, sparkHome,
  pyFiles, environment, batchSize, serializer,
  conf, jsc, profiler_cls)

_do_init...FIXME