Issue #21 – Demystifying the Buzzy Data Platform Terminology
An eleven-minute lesson to clarify this ambiguous data term
Read time: 11 minutes
A few months ago, Benn Stancil gave a great talk asking, “What even is a data platform?”
And I’m still wrestling with that question as I write this newsletter issue.
So instead of attributing the data platform to one tangible thing, let’s get philosophical and answer some key questions about the data platform:
What should a data platform do?
Why does it exist to do those things?
What are the tangible foundational elements within it?
What else is there?
Close your eyes and create an image of a Data Platform.
Now open them. What was involved? Was it complex? Was it built on AWS, GCP or Microsoft? Were there many connectors and technologies linking the data sources to databases or warehouses through ETL pipelines and, finally, dashboards and automated reporting at the end?
That is great if there were, but this sub-chapter needs to start with a fundamental truth: a data platform doesn’t need to be complex!
For a small, non-technological company, a data platform might be built in Excel or contain a simple database with basic reporting tools like Power BI or Tableau.
The point is, don’t let peer pressure cause you to shell out hundreds of thousands of dollars on a top-notch, complex data platform without knowing why you are building it or what you will use it for!
1. What Should a Data Platform Do?
If you google the above question, you will get something like this:
“A data platform is a central repository and processing house for all of an organization’s data. A data platform handles the collection, cleansing, transformation, and application of data to generate business insights.”
Before we go into this type of terminology and these functional features, let’s get a bit philosophical.
A data platform is a generic, catch-all term encompassing the many technologies that underpin making data accessible to business users, leading to better decision-making and insights.
It is, therefore, not a thing, but a subjective concept.
Let’s reference René Descartes here: “I think, therefore I am”.
In the Data Platform world, it is more like: “I make sense of data, therefore I am.”
And that, in five words, is what a data platform should do: it should make sense of data.
How does it do this?
Now, we draw from many internet articles that explain the necessity of a data platform. But let’s do it in a way a five-year-old can understand rather than with the technical jargon SaaS and marketing companies tend to use. A data platform:
Ingests Data – It brings data in from internal and external sources
Integrates Data – Helps align and clean the data into one storage platform
Stores the Data – Structures the data in a storage layer that is easy to use
Processes & Transforms – Enables changes to the data to fulfil business needs
Manages the Data – Ensures the quality of data assets
Secures Data – Guarantees data is safely and securely managed, accessed and used
Serves Curated Data – Provides customised data sets for specified use cases
Provides Insights – Facilitates better decisions through data insights
That’s it. Those eight sentences describe exactly what a Data Platform should do to help a company or user make sense of data.
You can stop reading here if you like (mic dropped), but I do think it’s also essential to understand the benefits this creates and the underlying tools within a Platform.
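To make the list above a little more concrete, here is a minimal sketch of those steps as a toy pipeline in Python, using only the standard library. Everything in it is an assumption for illustration: the sample CSV, the `orders` table, and the function names are made up, and security and BI tooling are left out entirely. A real platform would use dedicated tools for each step.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (names and values are made up).
RAW_CSV = """order_id,customer,amount
1, alice ,120.50
2,BOB,80
3,alice,not_a_number
4,Carol,42.25
"""

def ingest(raw_text):
    """Ingest: read rows from a source (CSV text stands in for a file or API)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def integrate(rows):
    """Integrate: align formats so records from different sources look alike."""
    return [
        {
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().lower(),
            "amount": r["amount"].strip(),
        }
        for r in rows
    ]

def manage_quality(rows):
    """Manage: keep rows whose amount parses as a number, set the rest aside."""
    good, rejected = [], []
    for r in rows:
        try:
            good.append({**r, "amount": float(r["amount"])})
        except ValueError:
            rejected.append(r)
    return good, rejected

def store(conn, rows):
    """Store: persist the cleaned rows in a structured table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :amount)", rows
    )

def transform_and_serve(conn):
    """Process and serve: aggregate into a curated, per-customer data set."""
    cur = conn.execute(
        "SELECT customer, ROUND(SUM(amount), 2) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()

def run_pipeline(raw_text):
    """Run every step in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    good, rejected = manage_quality(integrate(ingest(raw_text)))
    store(conn, good)
    curated = transform_and_serve(conn)
    conn.close()
    return curated, rejected

curated, rejected = run_pipeline(RAW_CSV)
print(curated)
print(len(rejected), "row(s) rejected")
```

The curated per-customer totals are what a dashboard or report would read from to provide insights. The rejected rows are kept rather than silently dropped, so somebody can go back and fix them at the source.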
2. Why Does It Exist to Do Those Things?
Before answering this question, let’s return to our philosophical ‘raison d’être’: A Data Platform exists to make sense of data.
But why should the Data Platform be the one to make sense of data? Can’t Joe in accounting do that with an Excel workbook?
With the explosion of data and the complexity that has ensued, Joe has become overwhelmed by Excel’s 1,048,576-row limit and needs some help. Enter the Data Platform!
The existence of and requirement for a Data Platform is rooted in its ability to address several critical business needs aligned with its overall purpose of making sense of data:
Empower Business Users – This is the name of the game. Data is an enabler for business stakeholders to make better decisions. The Data Platform unlocks this potential by being that unified place for making sense of data
Break Down Silos – Data should be objective and serve as the logical foundation for decision-making. However, as many people have experienced, most organisations have siloed data across departments that say different things. A Data Platform should consolidate these silos and provide a more holistic view of operations. Easier said than done…
Ensure Consistency – Speaking of silos, data consistency is also a huge problem. Organising a company’s data on the platform makes building a ‘single source of truth’ repository easier. Centralised governance, standardised processes, and data management tooling all help ensure that data adheres to the same quality and security standards to create that elusive trust in the data
Support Growth – A scalable data platform aims to accommodate the growth and increased complexity of an organisation’s data, allowing it to continue deriving necessary insights
Compliance Adherence – With the boom of data availability and usage comes increasing regulations around privacy and security. A data platform makes it easier to stay compliant than if Joe were randomly dealing with personal customer data in offline Excel spreadsheets shared via email…
There are probably more benefits and raisons d’être for the Data Platform to exist, but I believe these five are the most crucial. Consider these when evaluating what kind of platform you need, how you build it, and how you use it!
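As a small illustration of the silo and consistency problem described above, here is a hedged sketch in Python. It assumes two departments each hold their own revenue figure per customer. The department names, customers, and numbers are all hypothetical, and `reconcile` is a made-up helper, not part of any library.

```python
# Hypothetical monthly revenue per customer as reported by two departments.
sales_view = {"acme": 1200.0, "globex": 560.0, "initech": 300.0}
finance_view = {"acme": 1200.0, "globex": 515.0, "umbrella": 90.0}

def reconcile(a, b, tolerance=0.01):
    """Compare two departmental views and list where they disagree."""
    issues = {}
    for key in sorted(set(a) | set(b)):
        if key not in a:
            issues[key] = "missing in first view"
        elif key not in b:
            issues[key] = "missing in second view"
        elif abs(a[key] - b[key]) > tolerance:
            issues[key] = f"values differ: {a[key]} vs {b[key]}"
    return issues

issues = reconcile(sales_view, finance_view)
for customer, problem in issues.items():
    print(customer, "->", problem)
```

Only one of the four customers matches across both views. A platform with a single governed source for this figure removes the need for this kind of after-the-fact reconciliation.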
3. What Are the Tangible Foundational Elements Within It?
Okay, now for the section everybody was waiting for: What tooling exists in a data platform? What should my data stack be?
You can also refer to my last two articles on building a data technology strategy and mapping the data technology landscape to understand this better.
In this article, I will give a brief overview of the six core technology categories that I think a data platform relies on to achieve its objectives:
Ingestion Platform/Tooling – Tooling that connects to operational technology and other systems to facilitate ingesting varying types of data into the platform. This part of the data platform must be built to consider the kind of ingestion process (batch or real-time) and how sources are integrated to maintain a certain standard of quality and usability. Consider this the extraction part of the ETL process (although some ingestion tooling includes transformation and processing components)
Data Storage Solutions – After the data is ingested, it is stored within the data platform. This infrastructure stores data (either raw or processed), providing a point of accessibility for further transformation or access by end users (after it is processed and curated to their needs). Technology types here include relational databases, cloud storage, data lakes, and data warehouses (although warehouses also fit into the following technology area)
Processing & Transformation – Nowadays, data processing and transformation overlap with storage, making the data platform concept more ambiguous. Whether you operate with a data lake, warehouse, or lakehouse architecture, the processing and transformation stage is necessary. It can happen before data is ingested into the storage layer through ETL pipelines or after, feeding from a lake to tools and access layers. This part of the process cleans the data, normalises it, enriches it with additional information, and/or applies business logic to it. Overall, raw data is converted into a structured format that’s ready for analysis
Data Access Layer – This technology component is often an add-on to storage solutions. I’ve described this before as query engines or access management within storage solutions or processing platforms, but it might also include data stores or marts that provide curated data for analytical solutions or operational reporting. Or it can even include reverse ETL, bringing curated data back into the operational data tooling or systems. Here is also where security and access management are factored in, ensuring the right people have access to the right data within the data platform
Analytics and BI Tools – These tools facilitate data analysis, science and visualization, turning curated data into actionable insights. Business stakeholders might think of these tools as the data platform as they are often the front end they interact with. These technologies also allow users to pull data from the storage layer, furthering the confusion. Their role within the data platform is to turn data into insights. Without analysis, the data platform is quite useless and just an expensive repository of numbers and letters
DataOps & Data Management Tools – This is the emerging technology category within the data platform. It includes data observability, catalogues and lineage, MDM, quality and security tooling. When discussing making sense of data, DataOps and management ensure that the data lifecycle is not one colossal ‘garbage in, garbage out’ exercise. Companies are realising this, and interoperability of these tools with storage and processing platforms and technologies is becoming a standard component of the platform to ensure high-quality data
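The processing and DataOps categories above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the `Order` record, the `COUNTRY_TO_REGION` lookup, the 1,000 threshold for a "key account", and all sample records are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical reference data used for enrichment.
COUNTRY_TO_REGION = {"DE": "EMEA", "FR": "EMEA", "US": "AMER"}

@dataclass
class Order:
    customer: str
    country: str
    amount: float
    region: str = "UNKNOWN"
    tier: str = ""

def clean(raw):
    """Clean: drop records with a missing customer or an invalid amount."""
    return [
        r for r in raw
        if r.get("customer")
        and isinstance(r.get("amount"), (int, float))
        and r["amount"] >= 0
    ]

def normalise(rows):
    """Normalise: bring names and country codes into one consistent format."""
    return [
        Order(r["customer"].strip().title(), r["country"].strip().upper(), float(r["amount"]))
        for r in rows
    ]

def enrich(orders):
    """Enrich: add a region from reference data."""
    for o in orders:
        o.region = COUNTRY_TO_REGION.get(o.country, "UNKNOWN")
    return orders

def apply_business_logic(orders, threshold=1000.0):
    """Business logic: label large orders (the threshold is an assumption)."""
    for o in orders:
        o.tier = "key account" if o.amount >= threshold else "standard"
    return orders

def quality_report(orders):
    """DataOps-style check: count records that enrichment could not place."""
    unknown = sum(1 for o in orders if o.region == "UNKNOWN")
    return {"rows": len(orders), "unknown_region": unknown}

raw_records = [
    {"customer": " acme corp ", "country": "de", "amount": 1500},
    {"customer": "globex", "country": "us", "amount": 250.0},
    {"customer": "", "country": "fr", "amount": 99},
    {"customer": "initech", "country": "br", "amount": 40},
    {"customer": "umbrella", "country": "fr", "amount": -5},
]

orders = apply_business_logic(enrich(normalise(clean(raw_records))))
report = quality_report(orders)
print(orders)
print(report)
```

The quality report is the part that guards against ‘garbage in, garbage out’: it does not fix anything by itself, but it makes the gap (here, a country with no region mapping) visible before an analyst builds a dashboard on top of it.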
While I’ve defined these six as the core technology categories within the Data Platform for now, that doesn’t mean the list will be the same in five (or even one or two) years. Holistic approaches to data, like Microsoft Fabric or the whole Databricks experience, might evolve the thinking so that data OS tech fits into this definition. I also plan to write another article on these holistic platform approaches because, while marketing paints them as a silver bullet, there is a lot of confusion about what the existing tech actually does.
4. What Else Is There?
This question is quite ambiguous. I mean, haven’t we covered all the things we need to know about data platforms (especially now that we know the tooling stack to have)?
Well, no, because I’ve seen many well-built data platforms fail to achieve their goals. Five other things make the difference:
The Right People – Technology is useless without the right people to enable it (we aren’t there with AI yet, trust me). So, the biggest thing to consider when building the Data Platform is whether you have the right people to make it work. Within each of the eight components we listed in the first section, what roles help deliver those elements? For example, you need data architects and engineers to build and manage the infrastructure and data pipelines, and data analysts and scientists to derive insights via the analytical tooling. And remember, don’t just hire one or two people thinking they can manage the whole platform; that is a recipe for disaster!
Operating Model & Ways of Working – Speaking of being useless without the right people, you also need the right processes to unlock the full potential of your Platform. The first step is defining those roles and responsibilities for the people you hire to create clarity and enhance their effectiveness. Secondly, define how you work. Enabling cross-functionality is crucial to setting up the platform to enable business stakeholders across different departments. One way to do this is to gather their perspectives on the build strategy (e.g., what tools they would use, how they want to access data, etc.). It’s also worth investing in training and methodologies (e.g., agile) to facilitate this cross-functional work.
Understanding the Business Stakeholder’s Needs – Everybody discusses creating value with your data or platform. This step is how you do it. By defining the business use cases for data, the purpose of the data platform becomes clearer. Supplement this through requirement-gathering sessions focused on the functional and non-functional requirements for use case delivery. Engaging with business stakeholders to gather these needs and expectations will ensure the platform makes sense of data, just as it was built to do!
Data-Enabled Culture – Unfortunately, I’ve seen a lot of Data Platforms get ignored in favour of offline Excel files. Fostering a data culture to evolve past Excel is a difficult thing to do. Don’t just enact a data literacy training course and call it a day. Instead, understand how people want to interact with and use data, and build the Data Platform around those needs. Then, create incentives (e.g., bonuses, awards, recognition) that are aligned with using data more effectively. Finally, ensure leadership champions these things while sharing the small wins teams have accomplished. As Shachar Meir mentions on Joe Reis’s podcast, investing in this type of data culture is really what matters to make a Data Platform work
Adjacent Tooling – There is tooling I didn’t mention in the third section that will also help enable your data platform. Examples include collaboration tools like Jira or Confluence, orchestration tools like Orchestra or Airflow to facilitate interoperability, and advanced analytical tools that provide an additional layer of insights. Each of these can be added to the data platform and can exist in a company’s customised stack
So there we have it: a philosophical definition of the buzzy and nebulous “Data Platform” term, turned tangible.
The point I want to leave you on is this:
Overall, a data platform is not a tool or technology or even a tech stack. It is an approach that aims to make sense of data. By doing that, it enables further insights and helps deliver the business goals. If your data platform isn’t doing that well, then the problem isn’t your technology; it is your approach to using it.
Next week, we will take the next step into data platform approaches to understand how you should approach building your company’s “Data Platform.” It is worth noting that I fully recognise most companies don’t start with a greenfield platform build, so we will ensure that any data platform approach considers the need to work with what you have!
Thanks for the read! Comment below and share the newsletter or issue if you think it is relevant! Feel free to also follow me on LinkedIn (very active) or Medium (getting more active). See you amazing folks next week!
I think consistency is a vital piece here and something that worries me about Data Mesh and its enticing goal of decentralized data. To some, this means completely autonomous units managing the data that they care about. In reality, there is a lot of data that is common across domains, and complete autonomy will take us back to silos and poor interoperability. I think a well-designed data platform can support a Data Mesh program and provide that consistency across domains. And good data architecture and data modeling can make this happen!
Solid and concise, awesome work Dylan! By the way, I love the 4th point (especially the one about the right people)