New Release
Learn More
Your submission has been received!
Thank you for submitting!
Thank you for submitting!
Download your PDF
Oops! Something went wrong while submitting the form.
Table of Contents
Get weekly insights on modern data delivered to your inbox, straight from our hand-picked curations!
the following is a revised edition.
Just as Blake saw the universe in a grain of sand, great scientists have often looked for clues to vast and complex problems in the small, familiar phenomena of daily life. Newton drew insights about the heavens by juggling wooden balls. Einstein contemplated a boat paddling upriver. Von Neumann pondered the game of poker. ~ Sylvia Nasar
Just like that, the age-old observation that simplicity just works better applies up and down from our plane of observation, which is usually the daily life. Does it also work better for our work, aka the mammoth data industry and its many tracks and offerings?
Way back in the 1960s, Kolmogorov changed the way we understood ‘simplicity’ and ‘complexity’. They defined in concrete equations how exactly simplicity could be consciously created even for apparently complex solutions or outcomes unless the defined outcome is random.
Unabashedly, we’ll start with a geeky example.
Take a string S (a sequence of symbols), and then consider the length (in symbols) of the shortest computer program capable of generating S. For example, the loop “for i=1 to 1000: print '1'” prints a string of 1000 characters, but the program itself contains only 26; this string is highly compressible.
By contrast a “typical” random string of 1000 characters (e.g. “3938382…”) can’t be compressed in this way, although it can be expressed by a program that is itself about 1000 characters long (e.g. “Print '9486390348473969683…'”, which has 1008 characters).
More generally, a string that contains regularities or patterns—of any form that can be expressed in the computer language—can be faithfully reproduced by a short program that takes advantages of these patterns, while a relatively complex or “random” string cannot be similarly compressed (or de-complexified). ~ Extract from “The simplicity principle in perception and cognition”, published in NIH
In essence, the principle of simplicity calls for repeatable patterns.
What is another name for a repeatable pattern? Productisation.
What is productisation also? An approach/tool that helps to relatively delegate, dissipate, or redirect good amount of effort.
The loop “for i=1 to 1000: print '1'
” is a product that helps the user to relatively delegate, dissipate, or redirect good amount of effort → the pattern is the product that the user codes in just 26 characters and also has the flexibility to tweak it only once to produce much larger and, therefore, “apparently” more complex strings or outcomes
A very interesting point to note so we can join the dots later → A more random string, on the other hand, cannot be condensed down to a set pattern, therefore carrying the same, in fact, higher complexity over to the production side (1000+8 characters).
Also note how the solution for the random string is ”fixed”, while the one for the product is embedded with variables that move along with the user’s requirements (fed into the “product” through xyz interface).
Let’s go back to our Projects vs. Products analysis.
The Product Approach brings down seemingly complex solutions and data stacks into:
A Product is anything that makes life a little easier for the user. And is naturally dictated by the user’s needs. In a single sentence, a product is for the user and by the user.
In the context of the above sections, a product is also something that identifies patterns and enables repeatability to make solutions simpler. Minimising randomness and fixedness to make the solutions more effectively aligned to users.
But how to build for the user and involve them in the production process?
In our case, the user is any business persona simply looking to make better decisions with data. For example, a sales representative wanting to quickly decide on a Monday morning which three customers he wants to target for upsells during the week.
Involving the user to productise data for high-quality, governed, and purpose-driven insights →
This is the model-first approach to building data products, which amply takes in the user’s feedback through logical data modeling or prototyping data products, allowing users to even experience the value of the prototypes before the actual build effort kickstarts. Like an architect finalises the blueprint before laying eyes on a brick.
This approach also makes plenty of space to leverage repeatability through the prototype:
In short, the Product Mantra for Data rolls out like this:
When it becomes simple for users to integrate themselves into the production process, it also becomes simple for data developers to create a product that consistently aligns with the business. This roll-out is orchestrated through a self-serve platform on top of your existing infrastructure.
The Data Mesh Paradigm defines a self-serve platform along the following parameters:
…to make analytical data product development accessible to generalist developers, to the existing profile of developers that domains have, the self-serve platform needs to provide a new category of tools and interfaces in addition to simplifying provisioning.
A self-serve data platform must create tooling that supports a domain data product developer’s workflow of creating, maintaining and running data products with less specialized knowledge that existing technologies assume; self-serve infrastructure must include capabilities to lower the current cost and specialization needed to build data products.
The role of a Self-Serve Platform in rolling out simplicity in data across different planes of usage in the data stack:
We don’t want to do a typical build vs. buy analysis here as the strategy of building vs. buying data solutions is completely dependent on organisational choices and core competencies. Instead, we’ll share how you could either build or buy based on your strategic pick.
Building a self-serve platform requires a dedicated platform engineering team with, give or take, at least a couple of years of development effort. But that could be the right choice for organizations that have high core competencies in platform engineering or have established something similar that needs enhancement.
The data mesh paradigm neatly lists the capabilities of a self-serve platform. To develop the same, you could refer to datadeveloperplatform.org, a community-driven initiative to standardise the capabilities, blueprint, and overall platform fundamentals of self-serve platforms. You can learn more about it here.
For buying a self-serve platform solution, it could be tricky to navigate the distinction between a repackaged stack of tools and the true self service capability. We’ll refrain from listing down vendor options to keep this piece truly unbiased, but feel free to email us community@tmdc.io to discuss potential options. We’ll be happy to share more on what we are building as well as other great options that fit the frame.
Still around? Look out for a special edition release this Friday featuring some of your favourite data leaders and practitioners! Follow MD101 on LinkedIn or Twitter to be the first ones here, or subscribe for free 🔔
1. Building your Sausage Machine for Data Products 🌭: Less Tech, More Strategy
2. 🎯 End-to-End Guide to Building Data Products