- How do you decide between normalized and denormalized models for analytics use cases?
- Walk through how you would model customers, orders, and events for reporting.
- What principles do you use when designing fact and dimension tables?
- How do you manage slowly changing dimensions in practice?
- What should be included in a data contract between producers and consumers?
- How do you handle schema evolution without breaking downstream workloads?
- When do you prefer batch processing versus streaming pipelines?
- Describe the key design choices for an event-driven ingestion pipeline.
- How do you design DAGs that are observable, idempotent, and easy to recover?
- What is your approach to retry logic and dead-letter handling?
- How do you diagnose and improve slow transformation jobs?
- What methods do you use to reduce warehouse compute cost while still meeting SLA targets?
- What layers of data quality checks do you typically implement?
- How do you detect and respond to silent data corruption?
- What signals should trigger alerts for a critical data pipeline?
- How do you separate actionable alerts from noise?
- Tell me about a data incident you handled and how you reduced recurrence risk.
- What should a strong data postmortem include?
- How do you evaluate tradeoffs between managed and self-hosted data infrastructure?
- What criteria matter most when selecting orchestration and transformation tools?
- How do you improve local development and testing workflows for data engineers?
- How do you ensure pipeline changes are validated safely before they are deployed to production?
- How do you implement access controls for sensitive datasets?
- What controls do you use for lineage, auditability, and compliance requirements?
- How do you gather requirements from analysts, product teams, and data scientists?
- How do you push back when a request conflicts with platform standards or quality goals?
- How do you prioritize foundational data platform work against urgent business asks?
- Describe a time you balanced short-term reporting needs with long-term architecture health.
- How do you communicate data reliability and freshness expectations to non-engineering teams?
- What artifacts do you produce so consumers can trust and understand datasets?
- What kind of data platform environment helps you deliver your best work?
- How do you define success in your first 90 days?
- What questions do you have about our data stack, governance model, or team structure?