Federated Modeling: When and Why to Adopt
The Most Straightforward Business Case You'll Find for Data Modeling

7 min | Jan 2, 2025 | Data Lifecycle
Originally published on the Modern Data 101 Newsletter; the following is a revised edition.

Before diving in, a very Happy New Year 🍻🎉 to You from Modern Data 101! As always, we’re looking forward to concocting more amazing ideas with you this year and beyond! Entering 2025 like ➡️

[GIF: Happy New Year Celebration by Pudgy Penguins]
This piece is a community contribution from Francesco, an expert craftsman of efficient Data Architectures using various patterns. He embraces new patterns, such as Data Products, Data Mesh, and Fabric, to capitalise on data value more effectively. We highly appreciate his contribution and readiness to share his knowledge with MD101.

We actively collaborate with data experts to bring the best resources to a 9000+ strong community of data practitioners. If you have something to say on Modern Data practices & innovations, feel free to reach out!
🫴🏻 Share your ideas and work: community@moderndata101.com
*Note: Opinions expressed in contributions are not our own and are only curated by us for broader access and discussion. All submissions are vetted for quality & relevance. We keep it information-first and do not support any promotions, paid or otherwise!

TOC


Differences Between Federated and Centralized Modeling
Drivers for the Business Case: Qualitative & Quantitative
Analysis of Economic Advantages
  • Prerequisites
  • The Math
  • Baseline Scenario
  • Improved Scenarios
Considerations and Key Takeaways

The adoption of data mesh paradigms and federated data management is currently at the peak of the hype cycle. However, it is essential to recognize that these approaches not only represent an evolution in operational models but also come with significant operational costs. These costs can lead to unsustainable efforts in the initial stages and, eventually, to financial strain if not properly managed.

The goal of this article is to analyze the key drivers to consider when quantifying the most suitable operational model for a given context. The aim is to prioritize practicality and informed decision-making, even at the expense of strict methodological purity.


Primary Differences Between Federated and Centralized Modeling

The concept of a federated data modeling team encourages a collaborative approach to managing and utilizing data across various organizational sources. While this model mitigates some of the challenges traditionally encountered by centralized teams, it also introduces new responsibilities and tasks, as outlined in the table below.

It is worth highlighting what I see as the "elephant in the room": security and governance activities, which, based on my experience, are often not actively embraced by data modeling teams. Within a Domain-Driven Design (DDD) approach, however, these responsibilities can no longer be avoided or delegated elsewhere.


Drivers for the Business Case

As with any business case, we can classify the advantages into quantitative (both economic and non-economic) and qualitative categories.

Quantitative Advantages - Economic:

  • Effort for Data Modelers: Dedicated workload due to federated responsibilities.
  • Implementation Costs: Expenses related to setting up the data product.
  • Operational Costs: Ongoing expenses to maintain the platform.

Quantitative Advantages - Non-Economic:

  • Time to Market: The speed at which data products can be delivered.
  • % Reuse: The proportion of data assets reused across domains. Reuse should be considered both as a rationalization of project scopes and as an improved efficiency in handling extraction/access requests to the platform, which often otherwise results in shadow IT.

Qualitative Advantages

  • Business User Engagement: Higher involvement of business stakeholders in data initiatives.
  • Adaptability to Change: Improved flexibility to accommodate evolving requirements.
  • Enhanced Innovation: Greater capacity to develop novel solutions within domains.

Analysis of Economic Advantages

Prerequisites

In this exercise, we will focus solely on the quantification of economic advantages, with the addition of the “% Reuse” metric. The rationale is that reuse directly enables the calculation of avoided costs associated with duplicate data assets, which is critical for the success of the business case.
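To make that link concrete, here is a minimal sketch (Python, with purely hypothetical duck-dollar figures rather than the anonymized coefficients used later) of how a % Reuse figure turns into avoided cost: every asset that is reused instead of rebuilt spares both its implementation and its ongoing run cost over the analysis horizon.

```python
# Minimal sketch: converting a "% Reuse" figure into avoided cost.
# All monetary values are hypothetical duck-dollar placeholders.

def avoided_cost(reuse_rate: float, candidate_assets: int,
                 build_cost: float, monthly_run_cost: float,
                 horizon_months: int) -> float:
    """Cost avoided by reusing existing data assets instead of duplicating them."""
    reused_assets = reuse_rate * candidate_assets
    return reused_assets * (build_cost + monthly_run_cost * horizon_months)

# Example: 30% reuse across 100 candidate assets over a three-year horizon.
print(avoided_cost(0.30, 100, build_cost=5.0, monthly_run_cost=0.5, horizon_months=36))
```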

On the other hand, Time to Market, while important, is difficult to quantify beyond localized contexts and does not add significant value to the storytelling in this scenario. For this reason, it will not be included in this analysis, which will instead be conducted using realistic values specific to Northern Italy and cloud-native solutions with a single cloud provider.

Geographic changes or architectural shifts could, of course, alter the calculation coefficients, as we will see later. However, to maintain confidentiality, these values will be anonymized and represented using Monopoly money (or “duck-dollars” 🦆).

Some “Duck-dollars”, copyright Disney Italia

The Math

In this section, we’ll work through a few scenarios to assess effort, implementation, and operational costs. The goal is to highlight the key takeaways (the "so what") and provide practical insight using the following inputs:

Data Domain Development

  • Each data domain, at its maximum level of maturity, includes up to 80 data products.
  • A transition period of 12 months is required to bring each domain into full operational status.

Onboarding Cadence

  • A new data domain is onboarded every 6 months.

Data Modeler Allocation

  • Data modelers are professionals with mid-to-high experience levels and can oversee multiple domains that are closely related in terms of business context.

Enterprise Data Modeler Role

  • A centralized enterprise data modeler is responsible for managing the data marketplace and holds end-to-end accountability.
  • This role is typically more senior compared to domain data modelers.

Domain Coupling and Reuse

  • Depending on the organization’s business architecture, domains may be either more decoupled or coupled, leading to varying probabilities of data reuse.
  • In scenarios where reuse is required, development and operational costs are assumed to be halved (although, in reality, they would be nearly negligible). This conservative assumption ensures a worst-case scenario analysis.

These initial considerations must be complemented by an assessment of the required effort, as well as the implementation and operational costs. While the implementation and operational values will naturally be expressed in duck-dollars 🦆 (and thus remain implicit), I believe the allocation framework is sufficiently generic to be broadly applicable.
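Before running the scenarios, it helps to pin these assumptions down as explicit parameters. The sketch below is a hypothetical Python encoding (placeholder duck-dollar coefficients, not the anonymized real ones) that the scenario calculations further down can vary.

```python
# The planning assumptions above, encoded as parameters for the scenarios below.
# Duck-dollar figures are hypothetical placeholders, not the article's (anonymized) values.
from dataclasses import dataclass

@dataclass
class Assumptions:
    products_per_domain: int = 80    # data products per domain at full maturity
    ramp_up_months: int = 12         # transition period to full operational status
    onboarding_interval: int = 6     # a new data domain is onboarded every 6 months
    reuse_cost_factor: float = 0.5   # reused products assumed to cost half (conservative)
    build_cost: float = 5.0          # duck-dollars per data product built (placeholder)
    monthly_run_cost: float = 0.5    # duck-dollars per live product per month (placeholder)
    modeler_cost: float = 30.0       # duck-dollars per domain data modeler per month (placeholder)
    extraction_cost: float = 2.0     # duck-dollars per manual extraction avoided (placeholder)
```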


Baseline Scenario

The first scenario considers the following drivers based on my observations with clients of medium maturity:

  • Project Scope: -20%: This refers to the reduction of duplications and the semantic consolidation of the data.
  • Access Requests (per domain): 6/month: This refers to the number of requests for data access (and subsequent extraction) from the platform.
  • Improved Efficiency on Access Requests: +30%: This refers to the ability to achieve a “cache hit” on existing structures already present on the platform.

These drivers, plugged into an incredibly complex simulation engine (called MS Excel :) ), yield a detailed cumulative valuation, with a break-even point occurring between the second and fourth years, as represented below in proportion to the platform's total cost (TOTEX).
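For readers who prefer code to spreadsheets, the sketch below is a rough Python equivalent of that Excel model, reusing the hypothetical Assumptions from the previous section. Only the structure follows the article (cumulative costs versus avoided costs, month by month, with a break-even month); the coefficients remain placeholder duck-dollars, so the absolute figures are illustrative rather than a reproduction of the curve described here.

```python
def simulate(a: Assumptions, months: int = 48,
             scope_reduction: float = 0.20,   # baseline: -20% project scope
             access_requests: int = 6,        # baseline: requests per domain per month
             cache_hit_rate: float = 0.30):   # baseline: +30% efficiency on requests
    """Cumulative cost vs. avoided cost, plus the month where they cross (if ever)."""
    cum_cost = cum_benefit = 0.0
    breakeven_month = None
    for month in range(1, months + 1):
        active_domains = month // a.onboarding_interval + 1
        new_products = live_products = 0.0
        for d in range(active_domains):
            age = month - d * a.onboarding_interval          # months since the domain was onboarded
            maturity = min(age / a.ramp_up_months, 1.0)      # linear ramp-up, capped at 100%
            previous = min(max(age - 1, 0) / a.ramp_up_months, 1.0)
            new_products += (maturity - previous) * a.products_per_domain
            live_products += maturity * a.products_per_domain
        # costs: building new products, running live ones, one modeler per domain (simplified)
        cum_cost += (new_products * a.build_cost
                     + live_products * a.monthly_run_cost
                     + active_domains * a.modeler_cost)
        # avoided costs: rationalized scope (reused products still cost a fraction of
        # a full build) plus "cache hits" on data access/extraction requests
        cum_benefit += (new_products * scope_reduction
                        * a.build_cost * (1 - a.reuse_cost_factor)
                        + active_domains * access_requests
                        * cache_hit_rate * a.extraction_cost)
        if breakeven_month is None and cum_benefit >= cum_cost:
            breakeven_month = month
    return cum_cost, cum_benefit, breakeven_month
```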

Even this initial result, although quite conservative, could be significant on its own. However, it is important to emphasize that by analyzing the cost breakdown, we can observe how the investment decreases proportionally over time due to the diminishing incremental costs of the platform.

This brings us to the first key takeaway: federated modeling is not a sprint but rather a marathon.


Improved Scenarios

It is also important to emphasize that the numbers provided above represent a baseline that is easily achievable without excessive commitment. However, the model can be altered by leveraging different decision-making levers:

  • 6 access requests per month per data domain might seem low, but it actually signals very poor adoption. For a medium-sized company, it is reasonable to assume that each domain serves at least 100 users, each of whom could make at least one request per month (for example, end-of-month data extractions into Excel). The number of requests can therefore be raised to 100, applying a corrective (deteriorating) factor to the costs, which is not shown explicitly here.
  • A 30% "cache hit" rate might seem high, but it actually means that roughly two out of three requests find no match in the enterprise platform. That is a sign of poor data quality first and foremost, and of inadequate data modeling second. To remain cautious, we raise this number only to a sufficient level (60%), as plugged into the sketch below.
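As a quick usage note, these levers plug straight into the hypothetical simulate() sketch from the baseline section; only the driver arguments change, which makes scenario comparison trivial.

```python
# Comparing the baseline drivers with the more ambitious levers discussed above.
# Placeholder coefficients: only the relative behaviour across scenarios is meaningful.
a = Assumptions()
print("baseline:", simulate(a))                                           # 6 req/month, 30% hit rate
print("improved:", simulate(a, access_requests=100, cache_hit_rate=0.60))
```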

Somewhat counterintuitively, these levers are more likely to affect the scale of the return than the break-even point itself, which remains largely unchanged because it is tied to the availability of the data provided by users.

Before moving to the conclusions, however, it is important to make an underlying assumption explicit: the availability within the team (in-house or otherwise) of competent and, above all, dedicated data modelers. While the results are certainly promising, they presuppose the establishment of a potentially significant workforce. If this effort is not adequately sponsored, it may fail to attract the necessary talent.

To provide additional clarity, the following is a diagram of the distribution.


Considerations and Key Takeaways

While this is partly a stylized exercise, it remains significant because it models a series of critical phenomena:

  1. The federated data modeling service can pay for itself if placed under the right sponsorship and effort conditions, but not in the first year.
  2. Investment in both the data catalog and the marketplace is essential (these costs were included in the model).
  3. The primary lever for economic return is the synergy of user access requests, not individual project initiatives (which were assumed to be effective by definition).
  4. The initial year’s investment may seem daunting, but it pays off if pursued with method and rigor.

Success requires not only technological elements (catalog, marketplace) but, most importantly, strong sponsorship, ideally formalized in a process. This sponsorship must see modelers as active participants rather than passive "blessing" figures for initiatives.

For the sake of brevity, I had to condense certain points. Let me know if there’s a need to elaborate or delve deeper, perhaps in a follow-up article.



MD101 Support ☎️

If you have any queries about the piece, feel free to connect with the author(s). Or feel free to connect with the MD101 team directly at community@moderndata101.com 🧡

Author Connect 🖋️

Find me on LinkedIn 🤝🏻
