The Modern Data Platform Design: a tool-agnostic approach

René-Jean Corneille
Published in The Startup
7 min read · Aug 7, 2022


The big data ecosystem is still too big! There, I said it. I still remember seeing this post years ago as a young, up-and-coming data scientist, and it resonated with me. When I tried to keep up with all the advancements in data infrastructure, I felt overwhelmed by the breadth of tools and resources to learn about. As I gained experience in the domain, I soon realised that when designing, building, or maintaining data platforms, tools matter less than use cases. Use cases are more stable than tools. How do you make use cases generic enough to be useful for design? By developing abstraction layers.

I have been thinking about this for a while. At first my layers were too specific, which meant that they were not always reusable. I refined them over time with the lessons learned from building data platforms.

I ultimately identified the following:

  • ingestion
  • storage
  • transform
  • application
  • execution
  • governance
Data platform layers
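
To make the idea of tool-agnostic layers a little more concrete, here is a minimal Python sketch of what the first three layers could look like as interfaces. Everything in it is an assumption made for illustration: the names `Ingestion`, `Storage`, `Transform`, `ListSource`, `InMemoryStorage`, `DropNulls` and `run_pipeline` are hypothetical and do not come from any specific tool or from a reference design. The point is only that the pipeline depends on the layer contracts, not on the tools behind them.

```python
from typing import Dict, Iterable, List, Protocol

# A record is just a flat mapping here (an assumption made for brevity).
Record = Dict[str, object]


class Ingestion(Protocol):
    """Hypothetical contract for the ingestion layer."""

    def read(self) -> Iterable[Record]: ...


class Storage(Protocol):
    """Hypothetical contract for the storage layer."""

    def write(self, table: str, records: Iterable[Record]) -> None: ...

    def read(self, table: str) -> List[Record]: ...


class Transform(Protocol):
    """Hypothetical contract for the transform layer."""

    def apply(self, records: Iterable[Record]) -> Iterable[Record]: ...


class ListSource:
    """Toy ingestion: serves records from an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def read(self) -> Iterable[Record]:
        return iter(self._rows)


class InMemoryStorage:
    """Toy storage: keeps tables in a dictionary."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[Record]] = {}

    def write(self, table: str, records: Iterable[Record]) -> None:
        self._tables.setdefault(table, []).extend(records)

    def read(self, table: str) -> List[Record]:
        return list(self._tables.get(table, []))


class DropNulls:
    """Toy transform: drops any record that contains a None value."""

    def apply(self, records: Iterable[Record]) -> Iterable[Record]:
        return [r for r in records if all(v is not None for v in r.values())]


def run_pipeline(
    source: Ingestion, transform: Transform, storage: Storage, table: str
) -> None:
    """Wire the layers together; only the contracts are referenced here."""
    storage.write(table, transform.apply(source.read()))
```

Swapping `ListSource` for a class that reads from an API, or `InMemoryStorage` for one that writes to a warehouse, would leave `run_pipeline` untouched as long as the new class exposes the same methods. That is the sense in which the layers, rather than the tools, carry the design.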

Ingestion

The first step towards using data effectively to help a business make decisions is gathering all the data from first, second and third parties into one place. The intelligence in…

René-Jean Corneille
The Startup

Director of ML. I write about data science, MLOps, Python and sometimes C++