Intermediate
Data Serving & APIs
The final stage: Making data accessible to consumers
⏱️ 35 min read
📅 Updated Jan 2025
👤 By DataLearn Team
Beginner Reading Mode
Think of data serving as "how data gets delivered to its users". As you read, focus on:
- Who the consumers are and what latency they need
- Which serving pattern fits BI, APIs, or reverse ETL
- How to maintain data contracts so that breaking changes stay rare
Glossary: DE-GLOSSARY.md
Light Prerequisites
- Understand that pipeline output has to be usable by other teams and applications
- Know the concept of an API endpoint and basic queries
- Have seen how daily dashboard needs differ from real-time API needs
Key Terms (3 Layers)
Term: Data Contract
Plain definition: A promise about the data format between the producer and the consumer of the data.
Technical definition: An agreement on schema, semantics, SLA, and versioning that prevents breaking changes.
Practical example: The field customer_id must be a non-null string; any change has to ship in a new API version.
Term: Reverse ETL
Plain definition: Sending data from the warehouse back to operational tools.
Technical definition: Scheduled synchronization from analytical models to SaaS systems (CRM, ads, support tools).
Practical example: A "high churn risk" segment is sent to the CRM so the CS team can follow up.
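The customer_id rule from the Data Contract example can be checked mechanically. Below is a minimal sketch using only the Python standard library; the `CONTRACT` layout, the `validate_record` helper, and the optional `segment` field are hypothetical names chosen for illustration, not part of any particular contract tool.

```python
from typing import Any, Dict, List

# Hypothetical contract: field name -> (expected type, nullable?)
CONTRACT = {
    "customer_id": (str, False),  # required, string, non-null
    "segment": (str, True),       # illustrative optional field
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations (empty list means valid)."""
    errors = []
    for field, (expected_type, nullable) in CONTRACT.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if value is None:
            if not nullable:
                errors.append(f"{field}: must not be null")
            continue
        if not isinstance(value, expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors
```

A producer could run a check like this before publishing a batch, so a violation stops at the source instead of surfacing in a consumer's dashboard or API client.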
The Data Serving Stage
Data Serving is the final stage of the Data Engineering Lifecycle.
All the earlier work (generate, store, ingest, transform) is worth nothing
if consumers cannot access the data.
💡 Key Principle
"Data engineers don't just build pipelines - they build products that deliver data."
Data Consumers and Their Needs
| Consumer | Access Pattern | Latency Requirement |
| --- | --- | --- |
| Business Analysts | SQL queries, dashboards | Seconds to minutes |
| Data Scientists | Feature stores, notebooks | Minutes |
| Applications | APIs, low-latency queries | Milliseconds |
| External Partners | Secure APIs, data exports | Varies |
Data Serving Patterns
📊 Analytics Serving
Tools: Tableau, Looker, Metabase
Store: Data Warehouse
Pre-aggregated tables for fast dashboards
🤖 ML Feature Store
Tools: Feast, Tecton, SageMaker Feature Store
Store: Key-value, vector DB
Low-latency feature retrieval
🔌 Operational APIs
Tools: FastAPI, GraphQL
Store: PostgreSQL, Redis
Application-facing endpoints
🔄 Reverse ETL
Tools: Hightouch, Census
Dest: Salesforce, HubSpot
Sync warehouse to SaaS tools
API Design for Data
REST API Best Practices
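from typing import Optional

from fastapi import FastAPI, Query
from pydantic import BaseModel

app = FastAPI()


class SalesMetrics(BaseModel):
    """Shape of one row in the sales response."""
    date: str  # YYYY-MM-DD
    revenue: float
    orders: int


# Version the path (/api/v1/...) so a breaking change can ship as a new version.
@app.get("/api/v1/sales")
async def get_sales(
    start_date: str = Query(..., description="YYYY-MM-DD"),
    end_date: str = Query(..., description="YYYY-MM-DD"),
    region: Optional[str] = Query(None),  # optional filter
):
    # [...] is the original placeholder for the rows read from the serving store.
    return {"data": [...]}


# Cursor-based pagination: the client passes back next_cursor to get the next page.
@app.get("/api/v1/transactions")
async def get_transactions(
    cursor: Optional[str] = None,
    limit: int = Query(100, le=1000),  # default page size 100, capped at 1000
):
    return {
        "data": [...],
        "next_cursor": "abc123",  # placeholder token
    }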
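The `cursor` and `next_cursor` fields in `get_transactions` imply keyset-style pagination, but the snippet leaves the mechanics out. Here is one possible sketch, assuming the cursor is an opaque base64 token that wraps the last id returned. The in-memory `ROWS` list and the `encode_cursor`, `decode_cursor`, and `fetch_page` helpers are hypothetical stand-ins for a real table and query.

```python
import base64
import json
from typing import Any, Dict, List, Optional

# Hypothetical in-memory stand-in for a transactions table, sorted by id.
ROWS: List[Dict[str, Any]] = [{"id": i, "amount": i * 10.0} for i in range(1, 8)]


def encode_cursor(last_id: int) -> str:
    """Pack the last-seen id into an opaque, URL-safe token."""
    raw = json.dumps({"last_id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Recover the last-seen id from a token produced by encode_cursor."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return int(json.loads(raw)["last_id"])


def fetch_page(cursor: Optional[str] = None, limit: int = 100) -> Dict[str, Any]:
    """Return rows with id greater than the cursor, plus the next cursor."""
    last_id = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    matching = [r for r in ROWS if r["id"] > last_id][: limit + 1]
    page = matching[:limit]
    has_more = len(matching) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}
```

Compared with offset pagination, this approach keeps pages stable when new rows arrive between requests, and a `next_cursor` of `None` tells the client it has reached the end.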
Reverse ETL
Traditional ETL brings data into the warehouse. Reverse ETL pushes data out to operational tools.
🔄 Reverse ETL Use Cases
- Customer 360: Sync unified profile to Salesforce
- Marketing: Send churn prediction to Braze
- Support: Surface VIP customers in Zendesk
- Product: Feature flags based on user segments
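A reverse ETL sync usually avoids re-sending records that have not changed, since destination APIs tend to be rate limited. The sketch below shows one way to plan such a sync with the standard library only: fingerprint each warehouse row, compare against the fingerprints from the last sync, and batch what is left. The helper names (`row_fingerprint`, `plan_sync`, `batches`) and the `customer_id` key are assumptions for illustration; the actual push to a CRM is left out because every destination has its own API.

```python
import hashlib
import json
from typing import Any, Dict, Iterator, List


def row_fingerprint(row: Dict[str, Any]) -> str:
    """Stable hash of a row so unchanged records can be skipped."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def plan_sync(
    warehouse_rows: List[Dict[str, Any]],
    synced_state: Dict[str, str],
    key: str = "customer_id",
) -> List[Dict[str, Any]]:
    """Return only rows that are new or changed since the last sync."""
    changed = []
    for row in warehouse_rows:
        if synced_state.get(row[key]) != row_fingerprint(row):
            changed.append(row)
    return changed


def batches(rows: List[Any], size: int) -> Iterator[List[Any]]:
    """Split rows into fixed-size batches for a rate-limited destination."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

After a batch is accepted by the destination, the caller would store the new fingerprints in `synced_state` so the next run starts from the updated baseline.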
Performance Optimization
| Technique | Use Case | Implementation |
| --- | --- | --- |
| Caching | Repeated queries | Redis, materialized views |
| Pre-aggregation | Dashboard metrics | Rollup tables, cubes |
| Partitioning | Large tables | Date/region partitions |
| Indexing | Point lookups | B-tree, inverted indexes |
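To make pre-aggregation and indexing concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a warehouse. The `orders` and `daily_sales` tables, their columns, and the `daily_revenue` helper are made up for this example; a production setup would typically refresh the rollup on a schedule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('2025-01-01', 'APAC', 100.0),
        ('2025-01-01', 'APAC', 50.0),
        ('2025-01-01', 'EMEA', 70.0),
        ('2025-01-02', 'APAC', 30.0);

    -- Rollup table: one row per (date, region) instead of one per order.
    CREATE TABLE daily_sales AS
    SELECT order_date, region,
           SUM(amount) AS revenue,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY order_date, region;

    -- Index that supports point lookups by date and region.
    CREATE INDEX idx_daily_sales ON daily_sales (order_date, region);
    """
)


def daily_revenue(order_date: str, region: str) -> float:
    """Read a dashboard metric from the rollup, not the raw orders table."""
    row = conn.execute(
        "SELECT revenue FROM daily_sales WHERE order_date = ? AND region = ?",
        (order_date, region),
    ).fetchone()
    return row[0] if row else 0.0
```

The dashboard query now touches one small, indexed row per lookup, while the cost of scanning raw orders is paid once when the rollup is built.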
Data Products
Treat your data outputs as products with:
- Clear contracts: Schema guarantees, SLAs
- Documentation: What, why, how to use
- Versioning: API versioning, schema evolution
- Support: Owner, contact, troubleshooting guide
Decision Framework: Data Serving Patterns
| Decision Point | Choose Option A If... | Choose Option B If... |
| --- | --- | --- |
| BI Serving vs API Serving | The main consumers are analysts and business users | The main consumers are product or operational applications |
| Pre-compute vs On-demand query | Latency is tight and query patterns repeat | Queries are ad hoc and varied, and flexibility matters |
| Pull API vs Reverse ETL push | Downstream systems can query whenever they need to | Data must be actively synced to CRM, ads, or operational tools |
Failure Modes & Anti-Patterns
Anti-Patterns in the Serving Layer
- Unclear metric definitions: numbers differ between dashboards and APIs.
- No caching strategy: expensive repeated queries put pressure on the warehouse.
- Serving stale data without a flag: users cannot tell how fresh the data is.
- Missing contract tests: a schema change breaks clients.
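Two of the anti-patterns above, no caching strategy and stale data without a flag, can be addressed together. The following is a rough sketch of an in-process TTL cache that also reports how old each answer is. `TTLCache` and its response fields (`cached`, `age_seconds`) are hypothetical names; in practice the cache would more likely live in Redis or a materialized view.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache; a stand-in for Redis or a materialized view."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Dict[str, Any]:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            loaded_at, value = hit
            cached = True
        else:
            value = compute()  # the expensive warehouse query
            loaded_at = now
            self._store[key] = (loaded_at, value)
            cached = False
        # Expose freshness so consumers can tell how old the data is.
        return {"data": value, "cached": cached, "age_seconds": now - loaded_at}
```

Returning `age_seconds` alongside the data lets a dashboard or client decide whether the value is fresh enough, instead of silently trusting whatever the cache holds.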
Production Readiness Checklist
Data Serving Checklist
- Metric definitions and the semantic layer are agreed across teams.
- Latency and freshness SLAs are published to consumers.
- Contract/API schema tests run in CI.
- The caching and invalidation plan is documented.
- Access control is enforced for sensitive endpoints and datasets.
- Monitoring of usage, error rate, and cost per endpoint is active.
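The contract test item in the checklist can start very small. As one possible sketch, the function below compares the previously published schema with a proposed one and lists changes that would break existing clients. The `Schema` format (field name mapped to a type name) and the `breaking_changes` helper are assumptions for this example, not the format of any specific schema registry.

```python
from typing import Dict, List

Schema = Dict[str, str]  # field name -> type name (hypothetical format)


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes that would break existing clients of the old schema."""
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != old_type:
            problems.append(f"type changed: {field} {old_type} -> {new[field]}")
    # Added fields are treated as non-breaking for readers that ignore unknowns.
    return problems
```

A CI job could fail the build whenever this list is non-empty, which forces a breaking change to go out as a new API version rather than as a silent edit.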
✏️ Exercise: Design a Data Product
Design a Customer Lifetime Value API for the marketing team:
- Define the API contract (endpoints, request/response)
- Choose storage and serving technology
- Design for 100ms p95 latency
- Plan for 10K requests/minute
- Define SLA and error handling
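Before choosing technology, it can help to turn the two numeric targets into quantities you can measure. The helpers below are only a sketch for that arithmetic and do not answer the exercise: `requests_per_second` converts the per-minute load, and `p95` uses the nearest-rank method (one of several percentile definitions) on latency samples you would collect from a load test. Both function names are hypothetical.

```python
from typing import List


def requests_per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request rate into an average per-second rate."""
    return requests_per_minute / 60.0


def p95(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

For instance, 10K requests per minute averages to roughly 167 requests per second, though real traffic is bursty, so peak load is usually the more useful design input.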
🎯 Quick Quiz
1. How does Reverse ETL differ from traditional ETL?
A. It uses different tools
B. It moves data from the warehouse to operational tools
C. It is faster than ETL
D. It is only for real-time data
2. Which technique suits dashboards with repeated queries?
A. Full table scan
B. Caching dan pre-aggregation
C. Sequential reads
D. Dynamic SQL generation
3. What is the usual latency requirement for operational APIs?
A. Minutes
B. Seconds
C. Milliseconds
D. Hours
Conclusion
Data Serving is a stage that is often overlooked but critically important.
Data engineers need to understand the needs of their different consumers and
choose the right technologies and patterns to serve them.
🎯 Key Takeaways
- Different consumers need different access patterns
- Reverse ETL brings data back to operational tools
- API design follows REST principles with data considerations
- Treat data outputs as products with contracts and SLAs
📚 References & Resources
Primary Sources
- Fundamentals of Data Engineering - Joe Reis & Matt Housley (O'Reilly, 2022)
Chapter 9: Serving Data for Analytics, Machine Learning, and Reverse ETL
- Designing Data-Intensive Applications - Martin Kleppmann (O'Reilly, 2017)
Chapter 4: Encoding and Evolution (Data Contracts, API Design)
- Designing Web APIs - Brenda Jin, Saurabh Sahni, Amir Shevat (O'Reilly, 2018)
REST API Design Principles