= JDBCRDD

`JDBCRDD` is an RDD of `InternalRow`s that represents a structured query over a table in a database accessed via JDBC.
NOTE: `JDBCRDD` represents a `SELECT requiredColumns FROM table` query.
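The shape of that statement can be sketched as follows. `buildSelect` is a hypothetical helper, not Spark's API; the real `JDBCRDD` additionally folds in a `WHERE` clause built from the pushed-down filters and the partition's own predicate.

```scala
// Hypothetical helper (not Spark's API): sketches the SELECT statement a
// JDBCRDD represents. The real code also appends a WHERE clause and the
// partition predicate.
def buildSelect(requiredColumns: Seq[String], table: String): String = {
  // With no columns requested, selecting the literal 1 still yields one
  // row per matching table row (useful for counting).
  val columnList =
    if (requiredColumns.isEmpty) "1" else requiredColumns.mkString(",")
  s"SELECT $columnList FROM $table"
}
```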
`JDBCRDD` is created when `JDBCRDD` is requested to <<scanTable, scanTable>> (i.e. when `JDBCRelation` is requested to build a scan).
[[internal-registries]]
.JDBCRDD's Internal Properties (e.g. Registries, Counters and Flags)
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| columnList
| [[columnList]] Column names

Used when...FIXME

| filterWhereClause
| [[filterWhereClause]] `WHERE` clause

Used when...FIXME
|===
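How such a `WHERE` clause could be assembled from already-compiled filter predicates can be sketched as below; `filterWhereClause` here is an illustrative stand-alone function, not Spark's exact code.

```scala
// Hedged sketch (names are illustrative): a WHERE clause can be built by
// AND-ing the individually compiled filter predicates, each parenthesized
// so operator precedence cannot change the meaning.
def filterWhereClause(compiledPredicates: Seq[String]): String =
  if (compiledPredicates.isEmpty) ""
  else compiledPredicates.map(p => s"($p)").mkString("WHERE ", " AND ", "")
```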
=== [[compute]] Computing Partition (in TaskContext) -- compute Method
[source, scala]
----
compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]
----
NOTE: `compute` is part of Spark Core's RDD contract to compute a given partition (in a `TaskContext`).

`compute`...FIXME
=== [[resolveTable]] resolveTable Method
[source, scala]
----
resolveTable(options: JDBCOptions): StructType
----
`resolveTable`...FIXME
NOTE: `resolveTable` is used exclusively when `JDBCRelation` is requested for the schema.
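A common way for JDBC data sources in general to resolve a table's schema is to execute a query guaranteed to match no rows and inspect the `ResultSet` metadata. The sketch below only builds such a probe query; it is an illustration of the technique, not Spark's exact code path.

```scala
// Hedged sketch: a probe query that returns no rows but whose ResultSet
// metadata still describes the table's columns and types. A JDBC data
// source can derive a StructType-like schema from that metadata.
def schemaProbeQuery(table: String): String =
  s"SELECT * FROM $table WHERE 1=0"
```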
=== [[scanTable]] Creating RDD for Distributed Data Scan -- scanTable Object Method
[source, scala]
----
scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]
----
`scanTable` takes the url option.

`scanTable` finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input `requiredColumns`.
`scanTable` uses the `JdbcUtils` object to `createConnectionFactory` and <<pruneSchema, prunes>> the schema to include the input `requiredColumns` only.
In the end, `scanTable` creates a new `JDBCRDD`.
NOTE: `scanTable` is used exclusively when `JDBCRelation` is requested to build a scan.
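The identifier quoting that `scanTable` delegates to the dialect can be sketched as below. The helpers are illustrative (the real quoting comes from the `JdbcDialect` matched against the connection URL); this version mimics the common double-quote behaviour, escaping any embedded quotes.

```scala
// Hedged sketch of dialect-style identifier quoting (illustrative only).
// Double-quoting protects reserved words like "order" and escapes any
// embedded double quotes by doubling them.
def quoteIdentifier(colName: String): String =
  "\"" + colName.replace("\"", "\"\"") + "\""

def quoteColumns(requiredColumns: Array[String]): Array[String] =
  requiredColumns.map(quoteIdentifier)
```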
=== [[creating-instance]] Creating Instance

`JDBCRDD` takes the following to be created:
- [[sc]] `SparkContext`
- [[getConnection]] Function to create a `Connection` (`() => Connection`)
- [[schema]] Schema (`StructType`)
- [[columns]] Array of column names
- [[filters]] Array of `Filter` predicates
- [[partitions]] Array of Spark Core's `Partition`s
- [[url]] Connection URL
- [[options]] `JDBCOptions`
=== [[getPartitions]] getPartitions Method
[source, scala]
----
getPartitions: Array[Partition]
----
NOTE: `getPartitions` is part of Spark Core's RDD contract to return the set of partitions of this RDD.

`getPartitions` simply returns the <<partitions, partitions>> (that the `JDBCRDD` was created with).
=== [[pruneSchema]] pruneSchema Internal Method
[source, scala]
----
pruneSchema(schema: StructType, columns: Array[String]): StructType
----
`pruneSchema`...FIXME
NOTE: `pruneSchema` is used when...FIXME
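Given the signature, schema pruning can be sketched with simplified stand-ins for Spark's `StructType` and `StructField` (the case classes below are illustrative, not Spark's types): keep only the fields named in `columns`, in the requested order.

```scala
// Hedged sketch with simplified stand-ins for StructField / StructType.
case class Field(name: String, dataType: String)
case class Schema(fields: Seq[Field])

// Keep only the fields whose names appear in columns, preserving the
// requested column order (not the original schema order).
def pruneSchema(schema: Schema, columns: Array[String]): Schema = {
  val fieldByName = schema.fields.map(f => f.name -> f).toMap
  Schema(columns.toSeq.map(fieldByName))
}
```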
=== [[compileFilter]] Converting Filter Predicate to SQL Expression -- compileFilter Object Method
[source, scala]
----
compileFilter(f: Filter, dialect: JdbcDialect): Option[String]
----
`compileFilter`...FIXME
[NOTE]
====
`compileFilter` is used when:

* `JDBCRelation` is requested to...FIXME
====
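The general technique behind `compileFilter` can be sketched with simplified stand-ins for a few of Spark's `Filter` predicates (the types below are illustrative, not Spark's classes). The key design point matches the signature's `Option[String]`: a predicate that cannot be translated to SQL yields `None` and is simply not pushed down to the database.

```scala
// Hedged sketch with stand-ins for a small subset of the Filter hierarchy.
sealed trait Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class GreaterThan(attribute: String, value: Any) extends Filter
case class And(left: Filter, right: Filter) extends Filter

// Compile a predicate to a SQL fragment; And compiles only if both
// sides compile (otherwise the whole conjunction is not pushed down).
def compileFilter(f: Filter): Option[String] = f match {
  case EqualTo(attr, value)     => Some(s"$attr = $value")
  case GreaterThan(attr, value) => Some(s"$attr > $value")
  case And(left, right) =>
    for {
      l <- compileFilter(left)
      r <- compileFilter(right)
    } yield s"($l) AND ($r)"
}
```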