Loving, But Also Hating, But Also Loving the Medallion Architecture
13 March 2024

Why Love Medallion?
One of the fundamental problems I have with the data world is the lack of purpose or outcome associated with 90% of the input. You can head to many businesses, all across the world, and find data professionals busily being busy, but often achieving very little. This helps explain why many businesses admit that they don't actually see any value delivered from their data operations.
It’s a frustrating reality, but an all-too-real one. There are many reasons why. Some data professionals simply don’t think commercially and spend a lot of time in the details without seeing the bigger picture. Sometimes, it’s because a good deal of the data aren’t actually available, for one reason or another. However, in some cases, it’s simply because there’s too much existing complexity across an organisation’s data estate.
This is where the medallion architecture shines. Rather than having a database here, a pile of JSON stored there, a data warehouse for one team over here and a SharePoint site there, you can centralise your entire data world into one estate and manage it from there. The simplicity of it is the beauty of it. Got data from a central ERP system? Great. We’ll export that and get it into our bronze area. More data from an HR system? Excellent: new folder, same place, same governance. A CRM? An external system? Web sources? The list goes on. It creates an incredible level of simplicity for visualising the what and the where of an organisation’s data estate.
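To make the “new folder, same place” idea concrete, here is a minimal Python sketch of a bronze landing-path convention. The `/lake` root, the `source/entity/load_date=` layout and the `bronze_path` helper are all hypothetical illustrations, not part of any Medallion specification; the point is only that every new source system slots into the same predictable structure.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical lake root; adjust to your own storage account or mount point.
LAKE_ROOT = PurePosixPath("/lake")


def bronze_path(source_system: str, entity: str, load_date: date) -> PurePosixPath:
    """Build a landing path for a raw extract: one folder per source system,
    one sub-folder per entity, partitioned by load date."""
    return (
        LAKE_ROOT
        / "bronze"
        / source_system.lower()
        / entity.lower()
        / f"load_date={load_date.isoformat()}"
    )


if __name__ == "__main__":
    # ERP, HR and CRM extracts all land under the same bronze root.
    for system, entity in [("ERP", "invoices"), ("HR", "employees"), ("CRM", "accounts")]:
        print(bronze_path(system, entity, date(2024, 3, 13)))
```

Adding a new source is then just a new `source_system` value; nothing about the layout or its governance has to change.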
So… Why Hate Medallion?
Uuuuurrrgghhh. It’s just not as simple as that.
Well, it sort of is - but it’s a question of governance and decision-making. There’s no perfect solution; if there were, we’d be there already. Medallion takes huge strides in creating an overarching principle, but falls down in some areas.
Firstly, there’s the naming convention. Bronze, Silver and Gold. Fetching, isn’t it? Well, it’s quite a convenient little piece of labelling, and the cynic in me thinks that it’d be a doddle to sell to the C-suite. Nobody fails to understand the principle of bronze, silver and gold, but without context, it doesn’t actually mean anything. If a new business user or data professional joins without experience of this architecture, then it’s a case of translating back to what the layers actually are: raw, clean and curated. So why not just use those terms? They’re clear to business people too, so switching wouldn’t change anything - other than naming the ‘sections’ more appropriately. That brings me on to the second bugbear.
Why in the world did somebody think three was the correct number of ‘sections’? I wholly appreciate the reality of selling it - but three is just too constraining. At the very least, some context and imaginative governance are required. If we think about the potential, abstract ‘stages’ of a data transformation/transportation process, it becomes quite clear:
- Collect raw data and persist it somewhere
- Clean the data
- Model the data
- Curate and aggregate the data
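The four stages above can be sketched as four small Python functions over a toy set of order records. Everything here (the records, the field names, the dimension/fact split in `model`) is a hypothetical illustration rather than a prescribed Medallion implementation. The interesting part is the last few lines: stages 1, 2 and 4 map neatly onto bronze, silver and gold, but the output of `model` has no obvious home.

```python
from collections import defaultdict

# Hypothetical raw extract: untyped strings, stray whitespace and a duplicate row.
RAW_ROWS = [
    {"order_id": "1", "customer": " Acme ", "region": "north", "amount": "100.0"},
    {"order_id": "2", "customer": "Globex", "region": "South", "amount": "250.5"},
    {"order_id": "2", "customer": "Globex", "region": "South", "amount": "250.5"},
    {"order_id": "3", "customer": "Acme", "region": "North", "amount": "49.5"},
]


def collect(rows):
    """Stage 1: persist the raw data untouched (clearly bronze)."""
    return [dict(r) for r in rows]


def clean(rows):
    """Stage 2: trim, standardise case, cast types and de-duplicate on order_id."""
    seen, out = set(), []
    for r in rows:
        if r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip(),
            "region": r["region"].strip().title(),
            "amount": float(r["amount"]),
        })
    return out


def model(rows):
    """Stage 3: split into a customer dimension and an orders fact table."""
    customers = sorted({r["customer"] for r in rows})
    customer_key = {name: i for i, name in enumerate(customers, start=1)}
    dim_customer = [{"customer_key": k, "name": n} for n, k in customer_key.items()]
    fact_orders = [
        {"order_id": r["order_id"], "customer_key": customer_key[r["customer"]],
         "region": r["region"], "amount": r["amount"]}
        for r in rows
    ]
    return dim_customer, fact_orders


def curate(fact_orders):
    """Stage 4: pre-aggregate revenue by region for reporting (clearly gold)."""
    totals = defaultdict(float)
    for r in fact_orders:
        totals[r["region"]] += r["amount"]
    return dict(totals)


bronze = collect(RAW_ROWS)
cleaned = clean(bronze)                      # silver...
dim_customer, fact_orders = model(cleaned)   # ...silver or gold? Four stages, three layers.
gold = curate(fact_orders)
print(gold)
```

Running it prints `{'North': 149.5, 'South': 250.5}`; the pipeline itself is unremarkable, which is exactly why the question of where `dim_customer` and `fact_orders` should live is a governance decision rather than a technical one.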
So, if we think about bronze, silver and gold, I know my bronze area is unspoiled raw data - and I know I want curated, pre-aggregated datasets in the gold, curated layer. But what then? We still have to clean and model the data somewhere. Do we put the modelled data into gold? Many resources online now say so, but leaving silver as simply a storage area for cleaned data feels wasteful. Alternatively, we could model in the silver area, but to many it is unintuitive for end users to build reports from silver… it doesn’t feel right. Databricks’ own documentation, accessed 2024-03-20, says that while silver should be ‘matched, merged, conformed and cleansed (“just-enough”)’, it should also hold more third-normal-form-like data models. Clear as mud.
From Databricks:
[Screenshot: Databricks documentation suggests that silver be ‘just enough’, yet advocates the existence of data models.]
All of this sits on top of the fact that, actually, Medallion is nothing new at all. We’ve been doing ELT with staging areas for years. It’s really just branding on top of what a well-considered data estate should always have looked like. All we’re doing is dressing mutton up as lamb and pretending it’s somehow different. So why exactly do I still love it?
So… Why Also Love Medallion?
Well, because in spite of all its foibles and peccadillos, it has genuinely simplified everybody’s thinking. In the short time since medallion-based lakehouses became the architecture du jour, I’ve seen data operations, time and time again, close the gap in understanding between themselves and the business really effectively. It’s easy for a business-side individual to understand a process of steps, stages, milestones and gates - and those concepts map directly onto a multi-hop architecture, which Medallion very much is.
“We’ve now managed to pull the data into the bronze area and we’re working on getting it through to silver.” is a statement that a business-side colleague could absolutely understand. It is certainly far clearer than any equivalent statement we’d have made before this approach.
So where does that leave us with the issues above? Well, quite simply, the onus is still on the data professionals to arrive at the right level of governance. Perhaps you decide, as a data function, to clean and model in the silver layer - leaving gold only for pre-aggregated data? That’s fine, as long as it’s clearly defined. Perhaps you decide to add a fourth “Platinum” layer for heavily aggregated data products, leaving silver for cleaned source data and gold for data models? At the end of the day, only the results matter - contrary to what 95% of the preachy messages on LinkedIn say. As long as the data are serving the business as efficiently as possible, the rest will take care of itself.
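One lightweight way to make “clearly defined” concrete is to write the chosen convention down as data and check it. The sketch below encodes the two options just described as hypothetical Python dictionaries. The stage names come from the list earlier in this post; the dictionary structure and the `validate` helper are purely illustrative, not a standard. It flags any stage without a home, any layer that is declared but never used, and any mapping that runs backwards through the layers.

```python
# Stage names taken from the four stages listed earlier; everything else is hypothetical.
STAGES = ("collect", "clean", "model", "curate")

OPTION_A = {  # clean and model in silver, gold only for pre-aggregated data
    "layers": ("bronze", "silver", "gold"),
    "stage_to_layer": {"collect": "bronze", "clean": "silver",
                       "model": "silver", "curate": "gold"},
}

OPTION_B = {  # a fourth "platinum" layer for heavily aggregated data products
    "layers": ("bronze", "silver", "gold", "platinum"),
    "stage_to_layer": {"collect": "bronze", "clean": "silver",
                       "model": "gold", "curate": "platinum"},
}


def validate(convention):
    """Return a list of problems; an empty list means the convention is well defined."""
    problems = []
    layers = convention["layers"]
    mapping = convention["stage_to_layer"]
    for stage in STAGES:
        if stage not in mapping:
            problems.append(f"stage '{stage}' has no layer")
        elif mapping[stage] not in layers:
            problems.append(f"stage '{stage}' maps to undeclared layer '{mapping[stage]}'")
    # Every declared layer should hold the output of at least one stage.
    used = {mapping[s] for s in STAGES if s in mapping}
    for layer in layers:
        if layer not in used:
            problems.append(f"layer '{layer}' is declared but unused")
    # A later stage should never land in an earlier layer.
    order = [layers.index(mapping[s]) for s in STAGES if mapping.get(s) in layers]
    if order != sorted(order):
        problems.append("stages are not assigned to layers in pipeline order")
    return problems


for name, convention in (("A", OPTION_A), ("B", OPTION_B)):
    print(name, validate(convention) or "ok")
```

Both options validate cleanly; the value is less in the check itself than in having one agreed, written-down answer to “what goes in silver?”.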
As usual, whatever the architecture, it comes back to great governance, great quality and great leadership to ensure everybody is bought in. If you’ve got that, you’ll rarely go far wrong.
