JDBCRDD

JDBCRDD is an RDD of InternalRows that represents a structured query over a table in a database accessed via JDBC.
NOTE: JDBCRDD represents a SELECT requiredColumns FROM table query.
JDBCRDD is created exclusively when JDBCRDD is requested to scanTable (when JDBCRelation is requested to build a scan).
[[internal-registries]]
.JDBCRDD's Internal Properties (e.g. Registries, Counters and Flags)
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| columnList
| [[columnList]] Column names

Used when...FIXME

| filterWhereClause
| [[filterWhereClause]] WHERE clause

Used when...FIXME
|===
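As a hedged, Spark-free sketch of how these two internal properties could be derived (the object and method names below are illustrative, not Spark's code), columnList joins the column names with commas (falling back to a constant when no columns are required) and filterWhereClause ANDs the filters that could be compiled to SQL:

```scala
// Illustrative sketch only -- plain Scala, no Spark dependency.
object InternalPropsSketch {
  // columnList: comma-separated column names, or "1" when none are required
  // (so the query still selects something, e.g. for a COUNT-style scan).
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.mkString(",")

  // filterWhereClause: each compiled filter parenthesized and joined with AND.
  def filterWhereClause(compiledFilters: Seq[String]): String =
    compiledFilters.map(f => s"($f)").mkString(" AND ")
}
```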
=== [[compute]] Computing Partition (in TaskContext) -- compute Method
[source, scala]
----
compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]
----
NOTE: compute is part of Spark Core's RDD Contract to compute a partition (in a TaskContext).
compute...FIXME
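One central step of compute is assembling the SQL text for a given partition: the RDD-wide filterWhereClause is merged with that partition's own WHERE predicate. The sketch below illustrates that merge with a hypothetical JDBCPartitionSketch stand-in (not Spark's partition type):

```scala
// Illustrative stand-in for the JDBC partition type: each partition may carry
// its own WHERE predicate (e.g. a range over the partitioning column).
final case class JDBCPartitionSketch(index: Int, whereClause: Option[String])

object ComputeSketch {
  // Merge the RDD-wide filter clause with the partition's own predicate.
  def whereClauseFor(filterWhereClause: String, part: JDBCPartitionSketch): String = {
    val clauses = Seq(filterWhereClause).filter(_.nonEmpty) ++ part.whereClause
    if (clauses.isEmpty) "" else clauses.mkString(" WHERE ", " AND ", "")
  }

  // The per-partition SELECT statement sent over JDBC.
  def sqlTextFor(
      columnList: String,
      table: String,
      filterWhereClause: String,
      part: JDBCPartitionSketch): String =
    s"SELECT $columnList FROM $table${whereClauseFor(filterWhereClause, part)}"
}
```

compute then uses the getConnection function to open a JDBC Connection, executes the statement, and exposes the ResultSet as the returned Iterator[InternalRow].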
=== [[resolveTable]] resolveTable Method
[source, scala]
----
resolveTable(options: JDBCOptions): StructType
----
resolveTable...FIXME
NOTE: resolveTable is used exclusively when JDBCRelation is requested for the schema.
=== [[scanTable]] Creating RDD for Distributed Data Scan -- scanTable Object Method
[source, scala]
----
scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]
----
scanTable takes the url option from the input JDBCOptions.
scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.
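Identifier quoting is dialect-specific. As an illustrative sketch (the trait and objects below are stand-ins, not Spark's classes): the default quoting wraps names in double quotes, while a dialect such as MySQL uses backticks.

```scala
// Illustrative stand-in for a JDBC dialect's identifier quoting.
trait DialectSketch {
  def quoteIdentifier(colName: String): String
}

// Default behavior: wrap in double quotes (SQL-standard delimited identifiers).
object DefaultDialectSketch extends DialectSketch {
  def quoteIdentifier(colName: String): String = s""""$colName""""
}

// MySQL-style: wrap in backticks.
object MySQLDialectSketch extends DialectSketch {
  def quoteIdentifier(colName: String): String = s"`$colName`"
}
```

Quoting every required column, e.g. `Array("id", "user name").map(DefaultDialectSketch.quoteIdentifier)`, yields identifiers that are safe to embed in the SELECT column list even when names contain spaces or reserved words.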
scanTable uses the JdbcUtils object to createConnectionFactory and prunes the input schema to include the input requiredColumns only.
In the end, scanTable creates a new JDBCRDD.
NOTE: scanTable is used exclusively when JDBCRelation is requested to build a scan.
=== [[creating-instance]] Creating Instance
JDBCRDD takes the following to be created:
- [[sc]] SparkContext
- [[getConnection]] Function to create a Connection (() => Connection)
- [[schema]] Schema
- [[columns]] Array of column names
- [[filters]] Array of Filter predicates
- [[partitions]] Array of Spark Core's Partitions
- [[url]] Connection URL
- [[options]] JDBCOptions
=== [[getPartitions]] getPartitions Method
[source, scala]
----
getPartitions: Array[Partition]
----
NOTE: getPartitions is part of Spark Core's RDD Contract to specify the partitions of an RDD.
getPartitions simply returns the partitions (that JDBCRDD was created with).
=== [[pruneSchema]] pruneSchema Internal Method
[source, scala]
----
pruneSchema(schema: StructType, columns: Array[String]): StructType
----
pruneSchema...FIXME
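Conceptually, pruning keeps only the required columns, in the order they were requested. A minimal sketch, with StructType stood in by a Seq of (name, dataType) pairs so it runs without Spark:

```scala
// Illustrative sketch: prune a schema down to the requested columns,
// preserving the requested order (columns missing from the schema are dropped).
object PruneSchemaSketch {
  def pruneSchema(
      schema: Seq[(String, String)],
      columns: Array[String]): Seq[(String, String)] = {
    val fieldMap = schema.toMap // column name -> data type
    columns.toSeq.flatMap(name => fieldMap.get(name).map(t => (name, t)))
  }
}
```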
NOTE: pruneSchema is used when...FIXME
=== [[compileFilter]] Converting Filter Predicate to SQL Expression -- compileFilter Object Method
[source, scala]
----
compileFilter(f: Filter, dialect: JdbcDialect): Option[String]
----
compileFilter...FIXME
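The shape of the conversion can be sketched with a small stand-in ADT (the real input is org.apache.spark.sql.sources.Filter; the cases and compileValue helper below are illustrative): each supported predicate becomes a SQL fragment, and None models a predicate that cannot be pushed down to the database.

```scala
// Illustrative stand-in for Data Source filter predicates.
sealed trait FilterSketch
final case class EqualTo(attribute: String, value: Any) extends FilterSketch
final case class GreaterThan(attribute: String, value: Any) extends FilterSketch
final case class IsNull(attribute: String) extends FilterSketch
case object Unsupported extends FilterSketch

object CompileFilterSketch {
  // Render a Scala value as a SQL literal (strings get single quotes).
  private def compileValue(value: Any): String = value match {
    case s: String => s"'$s'"
    case v => v.toString
  }

  // Some(sql) for predicates the database can evaluate, None otherwise.
  def compileFilter(f: FilterSketch): Option[String] = f match {
    case EqualTo(a, v) => Some(s"$a = ${compileValue(v)}")
    case GreaterThan(a, v) => Some(s"$a > ${compileValue(v)}")
    case IsNull(a) => Some(s"$a IS NULL")
    case _ => None
  }
}
```

Returning Option[String] lets the caller keep un-compilable filters out of the WHERE clause and evaluate them in Spark instead.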
[NOTE]
====
compileFilter is used when:

- JDBCRelation is requested to...FIXME
====