== JDBCRDD

JDBCRDD is an RDD of InternalRows that represents a structured query over a table in a database accessed via JDBC.

NOTE: JDBCRDD represents a `SELECT requiredColumns FROM table` query.
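The shape of that query can be sketched in plain Scala. This is a minimal illustration, not Spark's code: `SelectQuerySketch` and `buildSelect` are hypothetical names, and the optional WHERE clause is an assumption standing in for pushed-down filters.

```scala
object SelectQuerySketch {
  // Builds "SELECT <columns> FROM <table>" with an optional WHERE clause.
  // Hypothetical helper for illustration only (not part of Spark).
  def buildSelect(
      table: String,
      requiredColumns: Seq[String],
      where: Option[String] = None): String = {
    val columnList = requiredColumns.mkString(",")
    // Skip the WHERE keyword entirely when there is nothing to filter on.
    val whereClause = where.filter(_.nonEmpty).map(w => s" WHERE $w").getOrElse("")
    s"SELECT $columnList FROM $table$whereClause"
  }
}
```

For example, `SelectQuerySketch.buildSelect("people", Seq("id", "name"))` gives `SELECT id,name FROM people`.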

JDBCRDD is created exclusively when JDBCRDD is requested to create an RDD for a distributed data scan (`scanTable`), which happens when JDBCRelation is requested to build a scan.

[[internal-registries]]
.JDBCRDD's Internal Properties (e.g. Registries, Counters and Flags)
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| columnList
| [[columnList]] Column names

Used when...FIXME

| filterWhereClause
| [[filterWhereClause]] Filter predicates as a SQL WHERE clause

Used when...FIXME
|===
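A minimal sketch of how these two properties could be derived from the column names and the already-compiled filter predicates. The object name `RegistriesSketch` is hypothetical, and the details (falling back to the literal `1` for an empty column list, parenthesizing each predicate and joining with `AND`) are assumptions for illustration rather than a statement of what JDBCRDD does.

```scala
object RegistriesSketch {
  // Comma-separated column names. Falls back to the literal 1 so that the
  // generated SELECT stays valid when no columns are required (assumption).
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.mkString(",")

  // Joins already-compiled SQL predicates into the body of a WHERE clause.
  // Each predicate is parenthesized so operator precedence cannot leak across them.
  def filterWhereClause(compiledFilters: Seq[String]): String =
    compiledFilters.map(p => s"($p)").mkString(" AND ")
}
```

With no filters the clause is an empty string, so a caller would have to skip the `WHERE` keyword in that case.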

=== [[compute]] Computing Partition (in TaskContext) -- compute Method

[source, scala]
----
compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]
----

NOTE: compute is part of Spark Core's RDD Contract to compute a partition (in a TaskContext).

compute...FIXME

=== [[resolveTable]] resolveTable Method

[source, scala]
----
resolveTable(options: JDBCOptions): StructType
----

resolveTable...FIXME

NOTE: resolveTable is used exclusively when JDBCRelation is requested for the schema.

=== [[scanTable]] Creating RDD for Distributed Data Scan -- scanTable Object Method

[source, scala]
----
scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]
----

scanTable takes the url option (from the input JDBCOptions).

scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.
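The dialect lookup and identifier quoting can be sketched as follows. `MiniDialect`, `MySqlLike`, `Default`, `dialectFor` and `quoteColumns` are hypothetical stand-ins written for this example, not Spark's `JdbcDialect` classes; the sketch only assumes that a dialect is chosen by inspecting the URL and that the default quoting uses double quotes.

```scala
object DialectSketch {
  // Simplified stand-in for a JDBC dialect (hypothetical, not Spark's class).
  trait MiniDialect {
    def canHandle(url: String): Boolean
    // Default quoting wraps the identifier in double quotes.
    def quoteIdentifier(name: String): String = "\"" + name + "\""
  }

  // A dialect that claims MySQL-style URLs and quotes with backticks.
  object MySqlLike extends MiniDialect {
    def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
    override def quoteIdentifier(name: String): String = s"`$name`"
  }

  // Fallback used when no registered dialect claims the URL.
  object Default extends MiniDialect {
    def canHandle(url: String): Boolean = true
  }

  private val registered: List[MiniDialect] = List(MySqlLike)

  // Picks the first registered dialect that can handle the URL.
  def dialectFor(url: String): MiniDialect =
    registered.find(_.canHandle(url)).getOrElse(Default)

  // Quotes every required column with the dialect chosen for the URL.
  def quoteColumns(url: String, requiredColumns: Seq[String]): Seq[String] = {
    val dialect = dialectFor(url)
    requiredColumns.map(dialect.quoteIdentifier)
  }
}
```

Quoting per dialect matters because identifier quoting differs between databases, so the same column name has to be rendered differently depending on the target.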

scanTable uses the JdbcUtils object to createConnectionFactory and prunes columns from the input schema (`pruneSchema`) to include the input requiredColumns only.
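The pruning step can be sketched with plain case classes. `Field` and `Schema` are hypothetical stand-ins for Spark's `StructField` and `StructType`, and the behavior on an unknown column (an exception) is an assumption of this sketch.

```scala
object PruneSchemaSketch {
  // Hypothetical stand-ins for Spark's StructField and StructType.
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Seq[Field])

  // Keeps only the requested columns, in the order they were requested.
  // Throws NoSuchElementException if a requested column is not in the schema.
  def pruneSchema(schema: Schema, columns: Seq[String]): Schema = {
    val byName = schema.fields.map(f => f.name -> f).toMap
    Schema(columns.map(byName))
  }
}
```

Note that the result follows the order of the requested columns, not the order of the original schema, which keeps it aligned with the column list of the generated query.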

In the end, scanTable creates a new JDBCRDD.

NOTE: scanTable is used exclusively when JDBCRelation is requested to build a scan.

=== Creating Instance

JDBCRDD takes the following to be created:

* [[sc]] SparkContext
* [[getConnection]] Function to create a Connection (`() => Connection`)
* [[schema]] Schema
* [[columns]] Array of column names
* [[filters]] Array of Filter predicates
* [[partitions]] Array of Spark Core's Partitions
* [[url]] Connection URL
* [[options]] JDBCOptions
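The `() => Connection` argument is a function rather than an open connection. A minimal sketch of such a factory, using only `java.sql` from the JDK, is shown below; `ConnectionFactorySketch` and `connectionFactory` are hypothetical names, and the reading that this lets each task open its own connection where it runs is an assumption, not something stated above.

```scala
import java.sql.{Connection, DriverManager}

object ConnectionFactorySketch {
  // Returns a function that opens a new JDBC connection each time it is called.
  // Nothing is opened until the returned function is invoked.
  def connectionFactory(url: String): () => Connection =
    () => DriverManager.getConnection(url)
}
```

Creating the factory never touches the database; only calling it does, and that call fails with a `SQLException` when no driver on the classpath accepts the URL.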

=== [[getPartitions]] getPartitions Method

[source, scala]
----
getPartitions: Array[Partition]
----

NOTE: getPartitions is part of Spark Core's RDD Contract to...FIXME

getPartitions simply returns the partitions (this JDBCRDD was created with).

=== [[pruneSchema]] pruneSchema Internal Method

[source, scala]
----
pruneSchema(schema: StructType, columns: Array[String]): StructType
----

pruneSchema...FIXME

NOTE: pruneSchema is used when...FIXME

=== [[compileFilter]] Converting Filter Predicate to SQL Expression -- compileFilter Object Method

[source, scala]
----
compileFilter(f: Filter, dialect: JdbcDialect): Option[String]
----

compileFilter...FIXME

[NOTE]
====
compileFilter is used when:

* JDBCRelation is requested to find unhandled Filter predicates (`unhandledFilters`)
* JDBCRDD is created
====
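The `Option[String]` return type suggests that only some predicates can be expressed in SQL. The following is a minimal sketch of that idea, assuming a reduced, hypothetical filter hierarchy (`EqualTo`, `GreaterThan`, `IsNull`, `And`, `Unsupported`) and a plain quoting function in place of the `JdbcDialect` argument; it is not Spark's implementation.

```scala
object CompileFilterSketch {
  // Hypothetical, reduced filter hierarchy used only in this example.
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter
  final case class IsNull(attribute: String) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter
  final case class Unsupported(description: String) extends Filter

  // Renders a literal for SQL; strings are single-quoted with quotes doubled.
  private def literal(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Returns Some(sql) when the predicate can be expressed in SQL, None otherwise.
  // `quote` stands in for the dialect's identifier quoting (assumption).
  def compileFilter(f: Filter, quote: String => String): Option[String] = f match {
    case EqualTo(attr, v)     => Some(s"${quote(attr)} = ${literal(v)}")
    case GreaterThan(attr, v) => Some(s"${quote(attr)} > ${literal(v)}")
    case IsNull(attr)         => Some(s"${quote(attr)} IS NULL")
    case And(l, r) =>
      // Both sides must compile, otherwise the conjunction is not convertible.
      for {
        ls <- compileFilter(l, quote)
        rs <- compileFilter(r, quote)
      } yield s"($ls) AND ($rs)"
    case Unsupported(_) => None
  }
}
```

A caller can then treat `None` as "this predicate has to be evaluated outside the database" and keep only the `Some` results for the WHERE clause.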
