Skip to content

Demo: Using JDBC Data Source to Access PostgreSQL

This demo shows how to use JDBC Data Source to load data from PostgreSQL. These steps should be equally applicable to any relational database that allows access using a JDBC driver.

Start Postgres Instance

docker run --name postgres-demo -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d postgres
docker ps

Download JDBC Driver

Download the most current version of PostgreSQL JDBC Driver (e.g. PostgreSQL JDBC 4.2 Driver, 42.2.19).

Use an environment variable for the path of the jar file.

JDBC_DRIVER=/my/path/postgresql-42.2.11.jar

spark-shell

$SPARK_HOME/bin/spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10
Branch HEAD
Compiled by user ubuntu on 2021-02-22T01:33:19Z
Revision 1d550c4e90275ab418b9161925049239227f3dc9
Url https://github.com/apache/spark
Type --help for more information.
$SPARK_HOME/bin/spark-shell --driver-class-path $JDBC_DRIVER --jars $JDBC_DRIVER
val sampledata = spark.range(5)
sampledata.write
  .format("jdbc")
  .option("url", "jdbc:postgresql:postgres")
  .option("dbtable", "nums")
  .option("user", "postgres")
  .option("password", "mysecretpassword")
  .save
val nums = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:postgres")
  .option("dbtable", "nums")
  .option("user", "postgres")
  .option("password", "mysecretpassword")
  .load
nums.show

Clean Up

docker stop postgres-demo
docker stop postgres-demo

That's it. Congratulations!