Master the end-to-end data lifecycle: from generation and storage to ingestion, transformation, and serving. Based on "Fundamentals of Data Engineering" by Joe Reis & Matt Housley.
The path follows the Data Engineering Lifecycle framework from that book (O'Reilly, 2022).
The Data Engineering Lifecycle
Defining data engineering, roles and skills, and the data maturity model
The 5 stages: Generation, Storage, Ingestion, Transformation, Serving
9 principles of good architecture, monolith vs microservices, data mesh
Team capabilities, total cost of ownership (TCO), cloud vs on-prem, build vs buy decisions
OLTP databases, APIs, IoT, message queues, CDC
Data warehouse, data lake, lakehouse, partitioning, schema evolution
Batch vs streaming, CDC, ETL vs ELT, error handling
Normalization, dbt, materialized views, query optimization
DAG design, workflow patterns, monitoring, best practices
Analytics serving, ML serving, reverse ETL, data products
Kleppmann ("Designing Data-Intensive Applications"): replication, partitioning, ACID, consistency models
Densmore ("Data Pipelines Pocket Reference"): patterns, anti-patterns, testing, idempotency
CI/CD for data, data quality testing, lineage, data catalog
Encryption, access control, PII handling, compliance
AWS, Google Cloud, Azure for data engineering
Spark architecture, RDDs, DataFrames, Spark SQL, optimization
Kafka architecture, producers/consumers, stream processing
Advanced Python, pandas, boto3, working with APIs
Window functions, CTEs, query optimization, execution plans
dbt models, tests, documentation, best practices
Observability, alerting, SLA management, incident response
Emerging trends, AI/ML integration, data mesh, modern data stack
Compatibility policy, versioning, CI checks, and safe rollout strategy
Rerun-safe design, dedup strategies, and realistic consistency guarantees
Metric design, tiered SLOs, and actionable alerting
Storage anti-patterns, metadata overhead, and remediation workflow
DAG boundaries, anti-pattern cleanup, and operational guardrails
Slim CI, incremental model pitfalls, and enterprise dbt operations
Stack selection by latency, cost, team size, and compliance constraints
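Rerun-safe design and deduplication often come down to two moves: collapse duplicates within a batch, then upsert so that rerunning the same batch converges to the same final state. The sketch below illustrates this under stated assumptions: a hypothetical `users` table keyed on `id`, ISO-8601 `updated_at` strings used as the record version, and the standard library's `sqlite3` (SQLite 3.24 or newer) standing in for PostgreSQL or a warehouse. Both SQLite and PostgreSQL support `INSERT ... ON CONFLICT ... DO UPDATE` with the `excluded` row.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (id, email, updated_at as ISO-8601 text).
Record = Tuple[int, str, str]


def dedup_latest(records: Iterable[Record]) -> List[Record]:
    """Keep only the newest record per id; ISO-8601 strings sort chronologically."""
    latest: Dict[int, Record] = {}
    for rec in records:
        rec_id, _email, updated_at = rec
        if rec_id not in latest or updated_at > latest[rec_id][2]:
            latest[rec_id] = rec
    return sorted(latest.values())


def load_idempotent(conn: sqlite3.Connection, records: Iterable[Record]) -> None:
    """Upsert a batch so that rerunning it leaves the table unchanged."""
    rows = dedup_latest(records)
    # The WHERE guard stops an older record from overwriting a newer one.
    with conn:  # single transaction: the whole batch commits or rolls back
        conn.executemany(
            """
            INSERT INTO users (id, email, updated_at) VALUES (?, ?, ?)
            ON CONFLICT(id) DO UPDATE SET
                email = excluded.email,
                updated_at = excluded.updated_at
            WHERE excluded.updated_at >= users.updated_at
            """,
            rows,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)")
    batch = [
        (1, "a@old.example", "2024-01-01T00:00:00"),
        (1, "a@new.example", "2024-01-02T00:00:00"),  # duplicate key, newer version
        (2, "b@example.com", "2024-01-01T00:00:00"),
    ]
    load_idempotent(conn, batch)
    load_idempotent(conn, batch)  # rerun: same final state, no duplicate rows
    print(conn.execute("SELECT * FROM users ORDER BY id").fetchall())
    # → [(1, 'a@new.example', '2024-01-02T00:00:00'), (2, 'b@example.com', '2024-01-01T00:00:00')]
```

The guarantee here is deliberately modest: the load is idempotent per key and ignores stale versions, but it does not provide any ordering across keys or across concurrent writers.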
Build a complete pipeline: extract data from an API, transform it with Python and pandas, and load it into PostgreSQL.
Set up Kafka for streaming, process the stream with Spark Streaming, and visualize the results in Grafana.
Migrate from an on-premises database to a cloud data warehouse (BigQuery or Snowflake).
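The API-to-PostgreSQL project follows a classic extract, transform, load shape. What follows is a minimal sketch of that shape, not the project's reference solution, and it rests on explicit assumptions: the API is replaced by a hypothetical paginated JSON source (`FAKE_PAGES`, `fetch_page`), plain Python stands in for pandas to keep the example dependency-free, and the standard library's `sqlite3` stands in for PostgreSQL. The field names (`city`, `temp_c`, `ts`) are invented for illustration.

```python
import json
import sqlite3
from datetime import datetime
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical API responses: each page is {"results": [...], "next": <next page number or null>}.
FAKE_PAGES: Dict[int, str] = {
    1: json.dumps({"results": [{"id": 1, "city": " Jakarta ", "temp_c": "31.5",
                                "ts": "2024-05-01T08:00:00"}], "next": 2}),
    2: json.dumps({"results": [{"id": 2, "city": "Bandung", "temp_c": None,
                                "ts": "2024-05-01T08:00:00"}], "next": None}),
}

Row = Tuple[int, str, float, str]


def fetch_page(page: int) -> str:
    """Stand-in for an HTTP GET against the source API (hypothetical)."""
    return FAKE_PAGES[page]


def extract(fetch: Callable[[int], str]) -> Iterator[Dict]:
    """Follow the 'next' cursor until the API reports no more pages."""
    page: Optional[int] = 1
    while page is not None:
        body = json.loads(fetch(page))
        yield from body["results"]
        page = body["next"]


def transform(raw: Dict) -> Optional[Row]:
    """Cast types, trim strings, and drop records missing a measurement."""
    if raw.get("temp_c") is None:
        return None
    ts = datetime.fromisoformat(raw["ts"]).isoformat()  # raises on a malformed timestamp
    return int(raw["id"]), raw["city"].strip(), float(raw["temp_c"]), ts


def load(conn: sqlite3.Connection, rows: List[Row]) -> None:
    """Append rows in one transaction (plain INSERT, so a rerun duplicates them)."""
    with conn:
        conn.executemany(
            "INSERT INTO readings (id, city, temp_c, ts) VALUES (?, ?, ?, ?)", rows
        )


def run_pipeline(conn: sqlite3.Connection, fetch: Callable[[int], str]) -> int:
    """Run extract, transform, and load; return the number of rows loaded."""
    rows = [row for row in map(transform, extract(fetch)) if row is not None]
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, city TEXT, temp_c REAL, ts TEXT)")
    print(run_pipeline(conn, fetch_page))  # → 1 (the record without temp_c is dropped)
    print(conn.execute("SELECT * FROM readings").fetchall())
    # → [(1, 'Jakarta', 31.5, '2024-05-01T08:00:00')]
```

In the real project, `fetch_page` would issue an HTTP request and `load` would target PostgreSQL through a driver such as psycopg. Passing `fetch` in as a parameter keeps the extract step testable without network access. Because `load` is a plain append, rerunning the pipeline duplicates rows unless it is switched to an upsert.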
"Fundamentals of Data Engineering" by Joe Reis & Matt Housley (O'Reilly).