How to Manage a Data Engineering Team

A comprehensive guide for engineering managers leading data teams. Covers pipeline reliability, data quality, stakeholder management, and building a data-driven culture.

Last updated: 7 March 2026

Data engineering teams sit at the intersection of engineering and business intelligence, building the infrastructure that powers analytics, machine learning, and data-driven decision-making. Managing a data team requires balancing pipeline reliability with rapid iteration, while navigating the unique challenges of data quality, governance, and stakeholder expectations.

Building and Maintaining Reliable Data Pipelines

Data pipeline reliability is the foundation of everything your data team does. When pipelines fail or produce incorrect results, downstream analytics, dashboards, and ML models are all affected. Invest in monitoring, alerting, and automated recovery mechanisms that catch failures early and minimise the impact on downstream consumers.

Design pipelines for idempotency and recoverability. Every pipeline should be safe to re-run without creating duplicate data or corrupting existing results. This makes recovery from failures straightforward and reduces the operational burden on your team. Use checkpointing and incremental processing where possible to minimise reprocessing time.
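One common way to achieve idempotency is to make each load replace its partition atomically, so a re-run overwrites rather than appends. The sketch below uses SQLite purely for self-containment; the table and column names are illustrative, not from any particular stack:

```python
import sqlite3

def load_partition(conn, rows, partition_date):
    """Idempotent load: replace the partition's rows atomically,
    so re-running the job never duplicates data."""
    with conn:  # one transaction: the delete and insert commit together
        conn.execute(
            "DELETE FROM daily_revenue WHERE partition_date = ?",
            (partition_date,),
        )
        conn.executemany(
            "INSERT INTO daily_revenue (partition_date, region, amount) "
            "VALUES (?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_revenue (partition_date TEXT, region TEXT, amount REAL)"
)
rows = [("2024-01-01", "EMEA", 120.0), ("2024-01-01", "APAC", 80.0)]
load_partition(conn, rows, "2024-01-01")
load_partition(conn, rows, "2024-01-01")  # safe re-run: still 2 rows
count = conn.execute("SELECT COUNT(*) FROM daily_revenue").fetchone()[0]
```

The same delete-then-insert (or partition-overwrite) pattern applies in warehouse SQL or Spark; the key property is that the delete and the insert succeed or fail as one unit.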

Establish SLAs for your most critical data pipelines and communicate them to stakeholders. If the daily revenue dashboard needs to be updated by 9 AM, work backwards from that requirement to set pipeline completion targets, build buffer for retries, and create escalation procedures for when SLAs are at risk.
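Working backwards from a delivery deadline is simple arithmetic, but writing it down makes the scheduling decision explicit. The figures below (a 90-minute pipeline with budget for two full re-runs) are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical figures: a 90-minute pipeline, two retry attempts budgeted.
deadline = datetime(2024, 1, 1, 9, 0)      # dashboard must be fresh by 9 AM
expected_runtime = timedelta(minutes=90)
retry_buffer = 2 * expected_runtime        # room for two full re-runs

# Schedule the pipeline no later than this; escalate if it has not
# succeeded as the deadline approaches.
latest_start = deadline - expected_runtime - retry_buffer  # 4:30 AM
```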

  • Implement comprehensive monitoring and alerting for all production pipelines
  • Design every pipeline to be idempotent - safe to re-run without side effects
  • Define and communicate clear SLAs for critical data deliverables
  • Build automated recovery and retry mechanisms to reduce manual intervention
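The automated-recovery point above often starts as simple retry logic with exponential backoff, escalating to a human only when retries are exhausted. A minimal sketch (the flaky task is a stand-in for a real extract step):

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=1.0):
    """Retry a failing pipeline task with exponential backoff
    before escalating to a human."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface for on-call escalation
            time.sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}
def flaky_extract():
    # Simulates a source that fails twice with a transient error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient source outage")
    return "ok"

result = run_with_retries(flaky_extract, base_delay=0.01)
```

Orchestrators such as Airflow or Dagster provide this behaviour as task-level configuration; the point is to budget retries deliberately rather than paging someone on the first transient failure.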

Establishing Data Quality Standards

Data quality is the most persistent challenge in data engineering. Bad data leads to bad decisions, and once stakeholders lose trust in the data, rebuilding that trust is extraordinarily difficult. Implement data quality checks at every stage of your pipeline - ingestion, transformation, and delivery.

Define quality dimensions that matter for your organisation: completeness, accuracy, timeliness, consistency, and uniqueness. Build automated checks against these dimensions and alert when quality degrades. Tools like Great Expectations, dbt tests, or custom validation frameworks can automate much of this checking.
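A custom validation layer can start very small. The sketch below checks three of the dimensions named above (completeness, uniqueness, timeliness) over plain dictionaries; it is not the Great Expectations or dbt API, just an illustration of the same idea, with invented field names:

```python
from datetime import date

def check_quality(rows, key, required, max_date):
    """Minimal quality checks: completeness, uniqueness, timeliness.
    Returns a list of failure descriptions (empty means all checks pass)."""
    failures = []
    # Completeness: required fields must be non-null.
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                failures.append(f"row {i}: missing {field}")
    # Uniqueness: no duplicate primary keys.
    keys = [row[key] for row in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"duplicate values in {key}")
    # Timeliness: no event dated after the load window.
    if any(row["event_date"] > max_date for row in rows):
        failures.append("event_date beyond load window")
    return failures

rows = [
    {"id": 1, "amount": 10.0, "event_date": date(2024, 1, 1)},
    {"id": 1, "amount": None, "event_date": date(2024, 1, 2)},
]
failures = check_quality(rows, key="id", required=["amount"],
                         max_date=date(2024, 1, 1))
```

Wiring such checks into the pipeline at ingestion, transformation, and delivery, and alerting when the failure list is non-empty, is the automation step that keeps quality from degrading silently.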

Create a data quality incident process similar to your production incident process. When data quality issues are discovered, document the root cause, the impact on downstream consumers, and the remediation steps. Track data quality metrics over time and set improvement targets.

Managing Data Team Stakeholders

Data teams serve many stakeholders - analysts, data scientists, product managers, executives, and other engineering teams. Managing competing priorities and requests is one of the biggest challenges a data engineering manager faces. Without a clear intake process, the team becomes overwhelmed with ad-hoc requests.

Establish a formal request process with clear criteria for prioritisation. Distinguish between new data pipeline requests, modifications to existing pipelines, data quality issues, and ad-hoc data requests. Each category should have different SLAs and prioritisation criteria.
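Even a lightweight intake model benefits from encoding the categories and their SLAs explicitly, so triage is consistent rather than ad hoc. The category names and SLA figures below are hypothetical placeholders:

```python
from dataclasses import dataclass

# Hypothetical SLA targets per request category, in business days.
SLA_DAYS = {
    "new_pipeline": 20,      # new data pipeline requests
    "pipeline_change": 10,   # modifications to existing pipelines
    "quality_issue": 2,      # data quality issues: treated like incidents
    "adhoc_request": 5,      # one-off data pulls
}

@dataclass
class DataRequest:
    title: str
    category: str

    @property
    def sla_days(self) -> int:
        return SLA_DAYS[self.category]

req = DataRequest("Revenue dashboard shows duplicates", "quality_issue")
```

In practice this lives in a ticketing system rather than code, but making the categories and targets explicit somewhere is what turns a queue of requests into a manageable intake process.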

Invest in self-service capabilities to reduce the demand on your team. Well-documented data catalogues, self-service query tools, and clear data models enable analysts and data scientists to answer their own questions without requiring data engineering support for every request.

Building a Well-Rounded Data Team

Data engineering requires a broad skill set - SQL, distributed systems, cloud infrastructure, data modelling, and increasingly, knowledge of ML infrastructure. Hire for strong engineering fundamentals and the ability to learn, rather than expertise in a specific tool or framework that may change.

Cross-train your team across the data stack. Engineers who only know how to build batch pipelines should learn streaming. Engineers focused on ingestion should understand the downstream analytics use cases. This cross-training improves the team's resilience and gives individuals broader career development opportunities.

Stay current with the rapidly evolving data engineering landscape but resist the urge to adopt every new tool. Evaluate new technologies against your specific needs and the total cost of ownership, including migration effort, learning curve, and operational overhead.

Key Takeaways

  • Invest in pipeline monitoring, idempotency, and automated recovery to ensure reliability
  • Implement systematic data quality checks and treat data quality issues with the same severity as production incidents
  • Establish clear stakeholder intake processes and invest in self-service to manage competing priorities
  • Build a well-rounded team with strong engineering fundamentals and cross-training across the data stack

Frequently Asked Questions

How should I structure my data team - centralised or embedded?
Both models have trade-offs. A centralised data team provides consistency in tooling, standards, and data models, but can become a bottleneck for product teams. Embedded data engineers move faster for their product area but risk creating data silos and inconsistent practices. Many organisations adopt a hybrid model with a central platform team maintaining shared infrastructure and standards, while embedded engineers focus on domain-specific pipelines.
How do I measure the productivity of a data engineering team?
Focus on outcome metrics rather than output metrics. Pipeline reliability (uptime, SLA adherence), data quality scores, time-to-delivery for new data products, and the ratio of proactive improvement work to reactive firefighting are more meaningful than lines of code or number of pipelines built. Also measure stakeholder satisfaction through regular surveys.
When should we invest in a data platform versus building custom pipelines?
Invest in a data platform when you notice patterns of duplication across pipelines, when onboarding new data sources takes too long, or when operational overhead is consuming too much of the team's time. A well-designed platform provides abstractions that make common tasks simple while still allowing custom solutions for unique requirements.
