
== JDBCRDD

JDBCRDD is an RDD of InternalRows that represents a structured query over a table in a database accessed via JDBC.

NOTE: JDBCRDD represents a `SELECT requiredColumns FROM table` query.

JDBCRDD is created exclusively when the JDBCRDD object is requested to <<scanTable, scanTable>> (when JDBCRelation is requested to build a scan).

[[internal-registries]]
.JDBCRDD's Internal Properties (e.g. Registries, Counters and Flags)
[cols="1,2",options="header",width="100%"]
|===
| Name | Description

| columnList
| [[columnList]] Column names

Used when...FIXME

| filterWhereClause
| [[filterWhereClause]] <<filters, Filter predicates>> as a SQL WHERE clause

Used when...FIXME
|===
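To make the two properties concrete, here is a minimal, Spark-free Scala sketch of how a column list and a WHERE clause could be assembled into the `SELECT requiredColumns FROM table` query that JDBCRDD represents. `QueryTextSketch` and its methods are hypothetical names, and the filters are assumed to be already compiled to SQL strings (e.g. by something like compileFilter). Falling back to the constant `1` for an empty column list mirrors what Spark's JDBCRDD does for columnList, so the statement stays valid when no columns are needed (e.g. for a count).

```scala
// Hypothetical, Spark-free sketch of the query text JDBCRDD represents:
//   SELECT requiredColumns FROM table [WHERE ...]
object QueryTextSketch {

  // Comma-separated (already quoted) column names; "1" when no columns
  // are required, so the statement is still valid SQL.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.mkString(",")

  // Each compiled predicate is parenthesized and the predicates are AND-ed.
  // Returns an empty string when there is nothing to filter on.
  def whereClause(compiledFilters: Seq[String]): String =
    if (compiledFilters.isEmpty) ""
    else compiledFilters.map(p => s"($p)").mkString("WHERE ", " AND ", "")

  // Full statement; trim drops the trailing space left by an empty WHERE clause.
  def selectText(columns: Seq[String], table: String, compiledFilters: Seq[String]): String =
    s"SELECT ${columnList(columns)} FROM $table ${whereClause(compiledFilters)}".trim
}
```

For example, `QueryTextSketch.selectText(Seq("id", "name"), "people", Seq("age > 21"))` yields `SELECT id,name FROM people WHERE (age > 21)`.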

=== [[compute]] Computing Partition (in TaskContext) -- compute Method

[source, scala]
----
compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]
----

NOTE: compute is part of Spark Core's RDD Contract to compute a partition (in a TaskContext).

compute...FIXME

=== [[resolveTable]] resolveTable Method

[source, scala]
----
resolveTable(options: JDBCOptions): StructType
----

resolveTable...FIXME

NOTE: resolveTable is used exclusively when JDBCRelation is requested for the schema.

=== [[scanTable]] Creating RDD for Distributed Data Scan -- scanTable Object Method

[source, scala]
----
scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]
----

scanTable takes the url option (of the input JDBCOptions).

scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.
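The dialect lookup and quoting step can be pictured with the following sketch. `Dialect`, `Dialects`, `MySqlLikeDialect`, `DefaultDialect` and `QuoteColumnsSketch` are hypothetical, simplified stand-ins for Spark's JdbcDialect and JdbcDialects (the real registry does more), and the quoting rules are assumptions for illustration: double quotes by default, backticks for MySQL URLs.

```scala
// Hypothetical, simplified stand-in for a JDBC dialect.
trait Dialect {
  def canHandle(url: String): Boolean
  // ANSI SQL style: wrap the identifier in double quotes.
  def quoteIdentifier(colName: String): String = "\"" + colName + "\""
}

// Assumed MySQL-like dialect that quotes identifiers with backticks.
object MySqlLikeDialect extends Dialect {
  def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Fallback dialect that keeps the default (double-quote) quoting.
object DefaultDialect extends Dialect {
  def canHandle(url: String): Boolean = true
}

// Hypothetical registry: the first dialect that handles the url wins.
object Dialects {
  private val registered: Seq[Dialect] = Seq(MySqlLikeDialect)

  def get(url: String): Dialect =
    registered.find(_.canHandle(url)).getOrElse(DefaultDialect)
}

object QuoteColumnsSketch {
  // Find the dialect per url and quote every required column with it.
  def quoteColumns(url: String, requiredColumns: Array[String]): Array[String] = {
    val dialect = Dialects.get(url)
    requiredColumns.map(dialect.quoteIdentifier)
  }
}
```

Quoting matters because column names may be reserved words or contain characters that are not valid in bare SQL identifiers.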

scanTable uses the JdbcUtils object to createConnectionFactory and <<pruneSchema, prunes>> the input schema to include the input requiredColumns only.

In the end, scanTable creates a new JDBCRDD.

NOTE: scanTable is used exclusively when JDBCRelation is requested to build a scan.

=== Creating Instance

JDBCRDD takes the following to be created:

* [[sc]] SparkContext
* [[getConnection]] Function to create a Connection (`() => Connection`)
* [[schema]] Schema (StructType)
* [[columns]] Array of column names
* [[filters]] Array of Filter predicates
* [[partitions]] Array of Spark Core's Partitions
* [[url]] Connection URL
* [[options]] JDBCOptions

=== [[getPartitions]] getPartitions Method

[source, scala]
----
getPartitions: Array[Partition]
----

NOTE: getPartitions is part of Spark Core's RDD Contract to...FIXME

getPartitions simply returns the <<partitions, partitions>> this JDBCRDD was created with.

=== [[pruneSchema]] pruneSchema Internal Method

[source, scala]
----
pruneSchema(schema: StructType, columns: Array[String]): StructType
----

pruneSchema...FIXME

NOTE: pruneSchema is used when...FIXME
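Going by how scanTable uses it (to keep only the input requiredColumns of the input schema), pruneSchema can be sketched as below. `Field` and `Schema` are hypothetical stand-ins for Spark's StructField and StructType, and the sketch assumes every requested column exists in the schema (a missing one throws NoSuchElementException).

```scala
// Hypothetical stand-ins for Spark's StructField / StructType.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

object PruneSchemaSketch {
  // Keep only the requested columns, in the order they are requested.
  // Looking up a column that is not in the schema throws NoSuchElementException.
  def pruneSchema(schema: Schema, columns: Array[String]): Schema = {
    val byName: Map[String, Field] = schema.fields.map(f => f.name -> f).toMap
    Schema(columns.toSeq.map(byName))
  }
}
```

The resulting schema follows the order of the requested columns rather than the order of the table's columns, so it lines up with the column list of the generated SELECT statement.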

=== [[compileFilter]] Converting Filter Predicate to SQL Expression -- compileFilter Object Method

[source, scala]
----
compileFilter(f: Filter, dialect: JdbcDialect): Option[String]
----

compileFilter...FIXME

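[NOTE]
====
compileFilter is used when:

* JDBCRelation is requested for the unhandled filter predicates (unhandledFilters)

* JDBCRDD is created (to build the <<filterWhereClause, filterWhereClause>>)
====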
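compileFilter returns an `Option[String]`, i.e. a predicate may or may not have a SQL equivalent. The following sketch illustrates that contract on a hypothetical, much smaller filter hierarchy (not Spark's `org.apache.spark.sql.sources.Filter` classes), with hard-coded ANSI double-quote identifier quoting in place of a JdbcDialect. `OpaquePredicate` is a made-up case standing for any predicate with no SQL form; a compound predicate compiles only when all of its parts do.

```scala
// Hypothetical, simplified model of data source filter predicates.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter
final case class IsNull(attribute: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter
final case class Or(left: Filter, right: Filter) extends Filter
final case class Not(child: Filter) extends Filter
// Made-up predicate with no SQL equivalent.
final case class OpaquePredicate(description: String) extends Filter

object CompileFilterSketch {
  // Assumed ANSI-style quoting; a real dialect would decide this.
  private def quote(col: String): String = "\"" + col + "\""

  // Strings become single-quoted SQL literals with embedded quotes doubled.
  private def literal(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Some(sql) when the predicate can be expressed in SQL, None otherwise.
  def compileFilter(f: Filter): Option[String] = f match {
    case EqualTo(a, v)     => Some(s"${quote(a)} = ${literal(v)}")
    case GreaterThan(a, v) => Some(s"${quote(a)} > ${literal(v)}")
    case IsNull(a)         => Some(s"${quote(a)} IS NULL")
    case And(l, r) =>
      for (ls <- compileFilter(l); rs <- compileFilter(r)) yield s"($ls) AND ($rs)"
    case Or(l, r) =>
      for (ls <- compileFilter(l); rs <- compileFilter(r)) yield s"($ls) OR ($rs)"
    case Not(c)             => compileFilter(c).map(cs => s"(NOT ($cs))")
    case OpaquePredicate(_) => None
  }
}
```

A `None` result is what lets a caller such as JDBCRelation treat the predicate as unhandled, so Spark evaluates it itself after the rows are fetched.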