Intermediate

dbt (Data Build Tool) Deep Dive

dbt models, tests, documentation, and best practices

⏱️ 45 min read πŸ“… Updated Jan 2025 πŸ‘€ By DataLearn Team

Mode Baca Pemula

Anggap dbt sebagai "framework transformasi SQL yang terstruktur". Fokus baca:

  1. Susunan model staging, intermediate, dan marts
  2. Test otomatis untuk kualitas data
  3. Cara deploy aman antara dev dan production

Kamus istilah: DE-GLOSSARY.md

Prasyarat Ringan

Istilah Penting (3 Lapis)

Istilah: Materialization

Definisi awam: Cara dbt menyimpan hasil model.

Definisi teknis: Konfigurasi output model sebagai view, table, incremental, atau ephemeral.

Contoh praktis: Model dashboard dipilih table agar query BI cepat dan stabil.

Istilah: Slim CI

Definisi awam: Menjalankan test hanya pada model yang berubah.

Definisi teknis: Strategi CI dbt berbasis state untuk mempercepat validasi perubahan.

Contoh praktis: PR kecil hanya mengetes model terkait, bukan seluruh ratusan model.

What is dbt?

dbt (data build tool) enables data analysts and engineers to transform data in their warehouses by simply writing SELECT statements. dbt handles the boilerplateβ€”turning these statements into tables and views.

🎯 dbt Philosophy

"The T in ELT doesn't stand for Toolβ€”it stands for Transform, and that should be done with code."

Why dbt?

Project Structure

my_dbt_project/ β”œβ”€β”€ models/ β”‚ β”œβ”€β”€ staging/ # Source-aligned models β”‚ β”‚ β”œβ”€β”€ _sources.yml β”‚ β”‚ β”œβ”€β”€ stg_orders.sql β”‚ β”‚ └── stg_customers.sql β”‚ β”œβ”€β”€ intermediate/ # Business logic β”‚ β”‚ β”œβ”€β”€ int_orders_enriched.sql β”‚ β”‚ └── int_customer_metrics.sql β”‚ └── marts/ # Business-facing β”‚ β”œβ”€β”€ _models.yml β”‚ β”œβ”€β”€ fct_orders.sql β”‚ └── dim_customers.sql β”œβ”€β”€ macros/ # Reusable SQL β”œβ”€β”€ tests/ # Custom tests β”œβ”€β”€ snapshots/ # SCD Type 2 β”œβ”€β”€ seeds/ # Static data └── dbt_project.yml

Model Configuration

Materializations

Type Use Case Behavior
view Light transformations Rebuilt on each run
table Heavy transformations Full rebuild
incremental Large tables, append-only Only new rows processed
ephemeral Intermediate CTEs Not persisted
-- Config block in model {{ config( materialized='incremental', unique_key='order_id', incremental_strategy='merge' ) }} SELECT * FROM {{ ref('stg_orders') }} {% if is_incremental() %} WHERE created_at > ( SELECT MAX(created_at) FROM {{ this }} ) {% endif %}

Testing in dbt

-- models/marts/schema.yml version: 2 models: - name: fct_orders description: "One record per order" columns: - name: order_id description: "Primary key" tests: - unique - not_null - name: customer_id tests: - not_null - relationships: to: ref('dim_customers') field: customer_id - name: amount tests: - not_null - dbt_utils.accepted_range: min_value: 0 - name: status tests: - accepted_values: values: ['pending', 'shipped', 'delivered', 'cancelled']

Custom Tests with Macros

-- macros/test_positive_values.sql {% macro test_positive_values(model, column_name) %} SELECT {{ column_name }} FROM {{ model }} WHERE {{ column_name }} <= 0 {% endmacro %}

Documentation and Lineage

πŸ“– Documentation Best Practices

Advanced dbt Features

Snapshots (SCD Type 2)

{% snapshot customers_snapshot %} {{ config( target_schema='snapshots', unique_key='customer_id', strategy='timestamp', updated_at='updated_at' ) }} SELECT * FROM {{ source('crm', 'customers') }} {% endsnapshot %}

Packages

-- packages.yml packages: - package: dbt-labs/dbt_utils version: 1.1.1 - package: dbt-labs/codegen version: 0.12.1 - git: "https://github.com/dbt-labs/dbt-audit-helper" revision: main

Decision Framework: dbt Implementation

Decision Point Pilih Opsi A Jika... Pilih Opsi B Jika...
table vs incremental Volume data kecil-menengah, rebuild penuh masih cepat Volume besar, SLA ketat, dan perubahan bersifat append/update terkontrol
staging granularity Source stabil dan transformasi sederhana Source beragam, butuh cleaning dan standardisasi ketat
generic tests vs custom tests Aturan kualitas umum (unique, not_null, relationships) Aturan bisnis spesifik domain (mis. margin tidak negatif)
single project vs mesh/monorepo Tim kecil dan domain masih satu Multi-domain team dengan ownership jelas per data product

Failure Modes & Anti-Patterns

Anti-Patterns pada dbt Project

Production Readiness Checklist

Checklist dbt sebelum Production

  1. Layering model jelas: staging, intermediate, marts.
  2. Semua model penting punya tests minimum (unique/not_null/relationships).
  3. CI menjalankan slim run + tests sebelum merge.
  4. Incremental model punya unique_key dan backfill strategy.
  5. Snapshots untuk entity historis penting sudah dikonfigurasi.
  6. Dokumentasi model + kolom terisi dan dipublish.
  7. Owner model per domain sudah jelas.
  8. Failure alert untuk dbt jobs sudah terhubung ke channel on-call.

✏️ Exercise: Build dbt Project

Buat dbt project untuk e-commerce:

  1. Setup sources.yml untuk raw data
  2. Create staging models (stg_orders, stg_customers)
  3. Build intermediate: int_orders_enriched
  4. Create marts: fct_orders, dim_customers
  5. Add tests: unique, not_null, relationships

🎯 Quick Quiz

1. Materialization apa untuk large append-only tables?

A. view
B. table
C. incremental
D. ephemeral

2. Fungsi {{ ref() }}?

A. Reference schema
B. Reference another model
C. Reference seed
D. Reference macro

3. Snapshots digunakan untuk?

A. Backup data
B. Track slowly changing dimensions
C. Speed up queries
D. Encrypt data

Kesimpulan

dbt mengubah cara tim data bekerjaβ€”dari spreadsheet dan manual processes ke software engineering practices. Dengan testing, documentation, dan version control, dbt brings rigor to analytics.

🎯 Key Takeaways

πŸ“š References & Resources

Primary Sources

Official Documentation

Articles & Guides