Intermediate
SQL Advanced for Data Engineers
Window functions, CTEs, query optimization, and execution plans
⏱️ 50 min read
📅 Updated Jan 2025
👤 By DataLearn Team
Mode Baca Pemula
Anggap SQL advanced sebagai "alat utama transformasi data". Fokus baca:
- Window function untuk ranking, running total, dan dedup
- CTE untuk query kompleks yang tetap mudah dibaca
- Cara membaca execution plan agar query lebih cepat
Kamus istilah: DE-GLOSSARY.md
Prasyarat Ringan
- Paham SELECT, JOIN, GROUP BY, dan HAVING dasar
- Pernah menulis query analitik sederhana
- Tahu query bisa lambat karena full scan atau join buruk
Istilah Penting (3 Lapis)
Istilah: Window Function
Definisi awam: Fungsi yang menghitung nilai sambil tetap melihat baris lain di kelompok yang sama.
Definisi teknis: Operasi analitik SQL dengan OVER(PARTITION BY ... ORDER BY ...) tanpa menghilangkan detail row.
Contoh praktis: Hitung ranking penjualan per kota tanpa kehilangan detail transaksi.
Istilah: Explain Plan
Definisi awam: Peta langkah mesin database saat menjalankan query.
Definisi teknis: Representasi eksekusi query (scan, join, sort) untuk diagnosis performa.
Contoh praktis: Dari plan terlihat tabel besar di-full scan, lalu diperbaiki dengan partisi/index.
Why Advanced SQL Matters
SQL adalah bahasa universal data. Data engineers yang menguasai SQL advanced dapat:
- Write efficient transformations directly in the warehouse
- Avoid pulling data to Python unnecessarily
- Optimize query performance
- Solve complex data problems elegantly
Common Table Expressions (CTEs)
CTEs make complex queries readable and reusable:
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(amount) as total_sales
FROM orders
GROUP BY 1
),
growth AS (
SELECT
month,
total_sales,
LAG(total_sales) OVER (ORDER BY month) as prev_month,
(total_sales - LAG(total_sales) OVER (ORDER BY month))
/ LAG(total_sales) OVER (ORDER BY month) * 100 as growth_pct
FROM monthly_sales
)
SELECT * FROM growth;
Window Functions
Window functions perform calculations across rows related to the current row:
Common Window Functions
| Function |
Description |
Example |
| ROW_NUMBER() |
Unique sequential number |
Ranking without ties |
| RANK() |
Rank with gaps for ties |
1, 2, 2, 4... |
| DENSE_RANK() |
Rank without gaps |
1, 2, 2, 3... |
| LEAD()/LAG() |
Access next/previous row |
Period-over-period calc |
| SUM() OVER |
Running totals |
Cumulative sales |
SELECT *
FROM (
SELECT
category,
product_name,
sales,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY sales DESC
) as rank
FROM product_sales
) ranked
WHERE rank <= 3;
Query Optimization
Reading Execution Plans
📊 Key Metrics
- Cost: Relative resource consumption
- Rows: Estimated vs actual
- Time: Execution duration
Optimization Techniques
| Technique |
When to Use |
| Indexes |
Frequently filtered/joined columns |
| Partitioning |
Large tables, time-series data |
| Column pruning |
SELECT only needed columns |
| Predicate pushdown |
Filter early, filter often |
SELECT * FROM orders
WHERE YEAR(order_date) = 2024;
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';
Recursive CTEs
For hierarchical data (org charts, bill of materials):
WITH RECURSIVE employee_hierarchy AS (
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, h.level + 1
FROM employees e
JOIN employee_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM employee_hierarchy;
Advanced Joins
Self-Join
SELECT
e.name as employee,
m.name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Lateral Join
SELECT c.category_name, top_products.*
FROM categories c
LEFT JOIN LATERAL (
SELECT product_name, sales
FROM products p
WHERE p.category_id = c.id
ORDER BY sales DESC
LIMIT 3
) top_products ON true;
Decision Framework: SQL Optimization Strategy
| Decision Point |
Pilih Opsi A Jika... |
Pilih Opsi B Jika... |
| Window functions vs Self-join |
Butuh ranking/running totals secara ekspresif |
Engine terbatas dan query plan self-join lebih efisien |
| CTE vs Temp table |
Transformasi logika bertahap dan readable |
Butuh materialisasi intermediate untuk performa |
| Denormalized mart vs Raw query on source |
Dashboard berulang dan SLA cepat |
Analisis ad-hoc sesekali dengan kebutuhan fleksibel |
Failure Modes & Anti-Patterns
Anti-Patterns SQL
- SELECT * in production models: scan data berlebih dan schema drift risk.
- No predicate pushdown awareness: query mahal tanpa filter awal.
- Over-nested CTE: query sulit debug dan optimizer tidak optimal.
- Unbounded joins: cardinality blow-up menyebabkan cost spike.
Production Readiness Checklist
Checklist SQL Production
- Query plan (`EXPLAIN`) ditinjau untuk model kritikal.
- Kolom yang dipilih explicit, bukan wildcard.
- Join keys dan grain table tervalidasi.
- Query punya tests untuk edge-case data.
- Cost/performance baseline didokumentasikan.
- SQL style guide dipatuhi lintas tim.
✏️ Exercise: Complex SQL Query
Tulis query untuk customer cohort analysis:
- Find first purchase month for each customer (cohort)
- Calculate retention by cohort over time
- Use window functions for period-over-period
🎯 Quick Quiz
1. Fungsi ROW_NUMBER() digunakan untuk?
A. Menjumlahkan nilai
B. Memberi nomor unik pada setiap row
C. Mengurutkan data
D. Menghapus duplicate
2. Kenapa hindari fungsi di WHERE clause?
A. Terlalu panjang
B. Tidak bisa pakai index
C. Error syntax
D. Lambat proses
3. Recursive CTE cocok untuk?
A. Simple aggregation
B. Hierarchical data
C. Date calculations
D. String manipulation
Kesimpulan
Advanced SQL adalah force multiplier untuk data engineers. Dengan window functions, CTEs, dan optimization techniques, kamu dapat solve complex problems efficiently di database level.
🎯 Key Takeaways
- Use CTEs for readability and recursion
- Master window functions for analytics
- Always check execution plans
- Filter early, use indexes wisely
- Push computation to the database
📚 References & Resources
Primary Sources
- SQL for Data Scientists - Renee M. P. Teate (Wiley, 2021)
- Learning SQL - Alan Beaulieu (O'Reilly, 2020)
3rd Edition: Master SQL Fundamentals
- SQL Antipatterns - Bill Karwin (Pragmatic, 2010)
Official Documentation
Articles & Guides