
Day 8: dbt Fundamentals Course (part 2)

This is Part 2 of the notes from the dbt Fundamentals course.

Cloud Data Warehouses and ELT

Traditional Data Teams (ETL projects) consist of the following roles:

  1. Data Engineers, who maintain the data infrastructure and the ETL process that creates tables and views
  2. Data Analysts, who query those tables and views to drive business insights for stakeholders

Cloud Data Warehouses / Data Lakes brought ELT and a new role to the forefront of data projects:

  1. Analytics Engineer who transforms raw data (through layers) up to the BI layer
    1. Think of Databricks / Delta Lake's Medallion Architecture
    2. Think of bronze, silver and gold tables
  2. The Analytics Engineer is in charge of the T in ELT
    1. The Data Engineer can focus on the EL part of ELT (i.e. extracting data from sources and loading it into the data warehouse) and on maintaining the data infrastructure
    2. The Data Analyst works closely with the Analytics Engineer to deliver final (gold) tables that can be queried by BI tools (for faster business decisions)

ELT Influencers

  1. Modern cloud-based data warehouses with scalable storage and compute
  2. Many data pipeline/extraction tools
  3. Self-service business intelligence tools that make it easier for stakeholders to access and analyze data

Modern Data Team

Modern Data Team (ELT projects) consists of the following roles:

  1. Data Engineer
  2. Analytics Engineer
  3. Data Analyst

dbt Workflow

  1. Firstly, extract and load raw data into a data warehouse using EL tools (loaders)
  2. Data warehouse could be Snowflake, Redshift, BigQuery or Databricks
    • these were mentioned explicitly
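  3. dbt transformations create gold tables for BI tools, ML models and Operational Analytics
  4. Models are written as select statements only
    • No need to write DDL, DML or anything but selects by hand; dbt wraps each select in the statement that creates the table or view
  5. dbt builds a DAG of the models, which gives you data lineage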
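
To make the last two points concrete, here is a minimal sketch in Python of the idea behind them. This is not how dbt is implemented: sqlite3 stands in for the data warehouse, the model names (stg_orders, customer_totals) and the raw_orders table are made up, and the dependencies are listed by hand instead of being inferred the way dbt does. It only shows that a model author writes selects, that the tool adds the DDL around them, and that the dependency graph decides the build order.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical models: name -> (upstream models, SELECT statement).
# Dependencies are written out by hand here purely for illustration.
MODELS = {
    "stg_orders": (
        set(),
        "SELECT id AS order_id, customer_id, amount FROM raw_orders",
    ),
    "customer_totals": (
        {"stg_orders"},
        "SELECT customer_id, SUM(amount) AS total_amount "
        "FROM stg_orders GROUP BY customer_id",
    ),
}


def build(conn):
    """Materialize every model in dependency order and return that order."""
    graph = {name: deps for name, (deps, _) in MODELS.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        select_sql = MODELS[name][1]
        # The tool, not the model author, supplies the DDL wrapper.
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")
    return order


conn = sqlite3.connect(":memory:")
# Stand-in for the EL step: raw data is already loaded into the warehouse.
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 10, 7.5), (3, 20, 1.0)],
)

build_order = build(conn)
totals = conn.execute(
    "SELECT customer_id, total_amount FROM customer_totals ORDER BY customer_id"
).fetchall()
print(build_order)  # upstream model first, then the one that depends on it
print(totals)
```

The same graph that fixes the build order is what you would read as lineage: customer_totals comes from stg_orders, which comes from raw_orders.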

dbt Cloud

  1. Welcome to dbt Cloud!
  2. Data Warehouses listed explicitly while setting up the first Analytics project
    1. PostgreSQL
    2. Redshift
    3. Snowflake
    4. BigQuery
    5. Apache Spark
    6. Databricks

Summary

Module 2 finishes with such an exceptionally comprehensive Review that it'd be a shame to copy it here. Go and read it yourself!