Data Engineer

  • How do you decide between normalized and denormalized models for analytics use cases?
  • Walk through how you would model customers, orders, and events for reporting.
  • What principles do you use when designing fact and dimension tables?
  • How do you manage slowly changing dimensions in practice?
  • What should be included in a data contract between producers and consumers?
  • How do you handle schema evolution without breaking downstream workloads?
  • When do you prefer batch processing versus streaming pipelines?
  • Describe the key design choices for an event-driven ingestion pipeline.
  • How do you design DAGs that are observable, idempotent, and easy to recover?
  • What is your approach to retry logic and dead-letter handling?
  • How do you diagnose and improve slow transformation jobs?
  • What methods do you use to reduce warehouse compute cost while still meeting SLA targets?
  • What layers of data quality checks do you typically implement?
  • How do you detect and respond to silent data corruption?
  • What signals should trigger alerts for a critical data pipeline?
  • How do you separate actionable alerts from noise?
  • Tell me about a data incident you handled and how you reduced recurrence risk.
  • What should a strong data postmortem include?
  • How do you evaluate tradeoffs between managed and self-hosted data infrastructure?
  • What criteria matter most when selecting orchestration and transformation tools?
  • How do you improve local development and testing workflows for data engineers?
  • How do you ensure pipeline changes are safely validated before they are deployed to production?
  • How do you implement access controls for sensitive datasets?
  • What controls do you use for lineage, auditability, and compliance requirements?
  • How do you gather requirements from analysts, product teams, and data scientists?
  • How do you push back when a request conflicts with platform standards or quality goals?
  • How do you prioritize foundational data platform work against urgent business requests?
  • Describe a time you balanced short-term reporting needs with long-term architecture health.
  • How do you communicate data reliability and freshness expectations to non-engineering teams?
  • What artifacts do you produce so consumers can trust and understand datasets?
  • What kind of data platform environment helps you deliver your best work?
  • How do you define success in your first 90 days?
  • What questions do you have about our data stack, governance model, or team structure?