Issue #38 – Underpinning Foundations for ML & AI
What you need to know about ML & AI before you go and invest a ton of money in building it
Read time: 8 minutes
Serious question: is AI magic?
The obvious answer is no…
…but the types of things we’ve done with it (not to mention electricity, computers, the internet) are magic-esque.
While the ‘art of the possible’ mindset helps drive innovation and invention, the AI = magic perception has obvious downsides.
Like electricity, computers and the internet, ML & AI are grounded in math and science. Whatever we build today and tomorrow rests on backend foundations that come together to create the magical outputs we see in front of us.
Within those foundations, we must understand what is and isn’t Machine Learning or AI, and how and when to approach each. Because (contrary to what you hear on LinkedIn or from executives), AI is not the answer to everything…
So this week in The Data Ecosystem, we dig into what underpins Machine Learning and AI, hopefully making the subject (and how to approach it) a bit clearer in the complex world of meaningless data terms.
Machine Learning’s Academic Genealogy
In the last article, we defined both ML and AI.
Simply put, Machine Learning is the process of computers using algorithms to make better decisions or predictions by learning from data. Artificial Intelligence is an outcome of ML where computers can think and mimic human intelligence through algorithms.
Despite those definitions, for most people machine learning and AI still seem to pull answers out of thin air.
Realistically, both of these domains are grounded in statistics and math. The average person doesn’t need to understand algebraic methods or calculus to use machine learning/AI, but understanding how these academic disciplines influence them is fundamental if you want to implement them within your organisation.
So, at a high level, here is how statistics and math underpin ML/ AI:
Statistics and Probability – The most essential element of all things data analytics and science is stats. This is about understanding patterns in the data, what they mean, and how to derive insights/ implications from them. Most people have a basic grasp of statistics (like the mean, median and mode), which is foundational for data literacy and analytics reporting. When talking about ML/ AI, we can go one layer deeper. Stats are integral for:
Predictions & Probability – ML models are based on estimating how likely things are to happen in the future, or on creating prescriptive outcomes that are optimal for a pre-defined variable. Both are grounded in probability and optimisation
Model Training – Sampling and validating data requires statistical techniques to ensure training data represents real-world situations
Model Performance – Testing accuracy (or hypothesis testing) is about using statistics to measure the model’s efficacy. Consider confusion matrices, R-squared, Pearson’s correlation, cross-validation, etc.
Noise – Identifying and handling outliers in the data is done through statistical measures, helping control the algorithmic output and determine meaningful patterns
Mathematics – While stats provide 60% of what you need, other mathematical disciplines (like calculus or linear algebra) unlock algorithms and methods that identify additional patterns and relationships in the data. These give us the tools to work with data in more complex ways, enabling things like regression, classification, and decision trees. I’ve identified four different ways math plays into ML/ AI:
Linear Algebra – How data is represented (usually in matrices, vectors, or tensors) so that computers can process it efficiently. To do large calculations, linear algebra helps organise the data into structured formats that algorithms can understand
Calculus – Essential for optimisation (finding a function’s minimum or maximum value). Calculus feeds into algorithm development by describing, through derivatives, how changes in inputs affect outputs
Optimisation Theory – A field closely tied to calculus that focuses on making optimal decisions under constraints. Applying this theory helps create algorithms that find the best solution among millions of possibilities, which is essential for training ML models effectively
Mathematical Logic – This is not mentioned as much, but mathematical logic is a domain that focuses on expressing real-life scenarios in mathematical form. It gives computers a basis for translating business rules into computational steps, laying the foundations for reasoning. You won’t ever experience it directly, but this study of logic underpins computer science and traces back to the study of logic by philosophers
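As a rough illustration of the Model Performance and Noise points, here is a minimal Python sketch (standard library only) of a confusion matrix, Pearson’s correlation, R-squared, and a simple outlier check. The helper names and sample numbers are hypothetical, and the outlier rule (Tukey’s fences, which flag values more than 1.5 interquartile ranges outside the middle 50% of the data) is just one common convention; z-scores or domain-specific thresholds are equally valid choices.

```python
# Minimal sketches of common model-evaluation statistics (standard library only).
# Function names and sample data are hypothetical, for illustration.
from statistics import mean, quantiles


def confusion_matrix(actual, predicted):
    """Count true/false positives/negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def pearson_r(x, y):
    """Pearson's correlation: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical labels vs. predictions from a binary classifier.
    print(confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))               # 1.0
    print(r_squared([3, 5, 7], [2.5, 5, 7.5]))                  # 0.9375
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))           # [95]
```

Each function is a few lines of arithmetic, which is the point: these "model performance" checks are ordinary statistics applied to a model’s outputs.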
These academic domains differ from the four Core Components mentioned in the previous article (data, algorithms, computing power/ infrastructure, and human expertise). Nonetheless, all four core components rest on statistics and math in some form; they are just more digestible for the average person.
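To see how the Linear Algebra, Calculus and Optimisation Theory items above fit together, here is a minimal Python sketch (standard library only) that fits a straight line using gradient descent, one common optimisation technique among many. The data sits in a matrix, the gradient comes from differentiating the mean squared error, and the loop is the optimisation. The function names, learning rate, step count and data are illustrative assumptions, not tuned or recommended values.

```python
# A minimal sketch of linear algebra + calculus + optimisation fitting a model:
# predictions are X times weights plus a bias. Names and settings are illustrative.


def predict(X, weights, bias):
    """Linear algebra: a matrix-vector product plus a bias, one prediction per row of X."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, weights)) + bias for row in X]


def mse_gradient(X, y, weights, bias):
    """Calculus: partial derivatives of the mean squared error w.r.t. each weight and the bias."""
    n = len(X)
    errors = [p - t for p, t in zip(predict(X, weights, bias), y)]
    grad_w = [2 / n * sum(e * row[j] for e, row in zip(errors, X))
              for j in range(len(weights))]
    grad_b = 2 / n * sum(errors)
    return grad_w, grad_b


def gradient_descent(X, y, learning_rate=0.05, steps=2000):
    """Optimisation: repeatedly step against the gradient to reduce the error."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(steps):
        grad_w, grad_b = mse_gradient(X, y, weights, bias)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


if __name__ == "__main__":
    # Hypothetical data generated from y = 2x + 1, one feature per row.
    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [3.0, 5.0, 7.0, 9.0]
    weights, bias = gradient_descent(X, y)
    print(weights, bias)  # should approach [2.0] and 1.0
```

Libraries such as NumPy and scikit-learn do this work far more efficiently; the loop is written out by hand here only so each mathematical step stays visible.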

With the academic foundations out of the way, let’s look at a simplified process to do ML/ AI (and the backend technologies/ tools that support this process).
Analytics vs. Machine Learning: What's the Difference?
Before we get into the specific types of, and approaches to, ML and AI (the next two articles), we need to draw the line between analytics and machine learning. How are they different?
Analytics simply means analysing data: exploring datasets, identifying patterns and turning those findings into insights that improve decision-making.
In short, analytics is about drawing out implications from data.
As I’ve written about before, there are four levels of analytics: Descriptive, Diagnostic, Predictive and Prescriptive. (Some people add a fifth, Cognitive, but because it is defined by unstructured data, it doesn’t fit the progressive evolution of the other four.)
The domain of analytics therefore covers a lot: basically every type of decision you could want to make. So how does it differ from Machine Learning & AI?
Well, ML & AI are more technologically evolved methods of doing analytics. Instead of depending on humans, ML/ AI automates decision-making, using algorithms to find the patterns and determine insights from the data.
Now, the use of ML & AI can span all four levels of analytics, not just predictive or prescriptive (an excellent point made by Jon Cooke in one of my posts). For example, clustering algorithms can find groupings in historical data to show in a dashboard (descriptive analytics), or a simple regression can quantify the relationships that help explain why something happened (diagnostic analytics).
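To illustrate the clustering example, here is a minimal k-means sketch in Python (standard library only). k-means is one common clustering algorithm among many; the customer data, the choice of two clusters, and the fixed starting centroids are hypothetical, chosen so the result is reproducible.

```python
# A minimal k-means sketch (standard library only). k-means is one of many
# clustering algorithms; the data and starting centroids below are hypothetical.
from math import dist


def kmeans(points, initial_centroids, max_iters=100):
    """Group points around k centroids, alternating assign and update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical customers as (monthly visits, average basket size) pairs.
    customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    centroids, segments = kmeans(customers, [customers[0], customers[3]])
    for centre, members in zip(centroids, segments):
        print(centre, members)
```

With two well-separated groups like these, the loop settles after two passes into a low-activity segment and a high-activity segment, the kind of grouping a descriptive dashboard might display. On real data, the choice of k and the starting points matter a lot, which is why libraries such as scikit-learn default to k-means++ initialisation rather than fixed starting points.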
That said, human cognitive ability still plays a massive part in ML/ AI, as people need to prepare the data and train the models, ensure the logic/ rules are set appropriately, and confirm the outputs are directionally correct.
Therefore, the best approach is to combine both:
(1) use analytics to understand your problem and data,
(2) then use ML to automate and scale solutions.
When to do ML vs. when to do AI?
Let’s recall the differences between ML and AI from above and in my past article. To jog your memory, Machine Learning focuses on machines learning how to improve decision making over time with more data and results, whereas AI is the ability of computers to mimic human intelligence.
But what does that mean in practice?
If you are a business or data leader and want to implement ML or AI, here are five categories to think about when considering implementation:
Scope & Focus: What do you need to do? What business goals/ strategy is this helping support? How broad is your request?
ML typically solves specific, well-defined problems like predicting churn or helping segment different customer groups
AI provides a tool to tackle broader challenges like answering customer service questions via a chatbot or searching documents to draw out insights in response to employee queries
Use Cases: What is the business need you are solving for? How do existing tools/ products help solve those problems? What foundations are you building on?
ML use cases are usually targeted analytical tasks like fraud detection, product recommendations, or price optimisation, where patterns can be learned from historical data. You could also consider how existing analytical tools/ data products might evolve to deliver these use cases
Use cases for AI are more complex and multi-faceted. They will likely span multiple needs that one overall tool/ approach can address, involving scenarios like natural language understanding, document comprehension, or multi-step problem solving that requires human-like reasoning
Data Requirements: What data do you have access to? How much preparation is needed? What are the quality requirements?
Most ML methods and use cases need to be built on large amounts of high-quality, structured data specific to the problem. Less structured data can be used for certain use cases, but the algorithm and approach must take this into consideration
AI tools can work with diverse data types and sources, often requiring less structured data but needing broader context. Nonetheless, higher-quality, well-prepared data enriched with specific domain knowledge helps enhance outputs
Development Timeline: How long will implementation take? What are the key milestones? When can you expect to see results?
ML projects often follow a more predictable timeline: data preparation, model development, testing, and deployment. A typical project might take 3-6 months for initial deployment
AI initiatives tend to have longer, more variable timelines due to their complexity. This will require iterative development, extensive testing, and continuous refinement. Initial deployment might take 6-12 months with ongoing improvements
Required Expertise: What skills do you need on your team? How specialized should they be? What level of business understanding is required?
ML needs data scientists and engineers with specific technical skills (e.g. Python). A background in statistics and model development is also crucial. This goes beyond most data analysts’ skillsets, but analysts can often build ML models too (though maybe not bring them into production)
Building AI requires a much broader set of foundational capabilities, from ML/ AI engineers and systems architects to domain experts and AI scientists/ researchers. And that doesn’t include the project managers or other data roles needed to ensure the output works across teams and within the existing data foundation
As you can see, there is more than meets the eye when you are trying to determine whether you should do ML or AI!
In the next article, we will start diving into the types of Machine Learning models that exist (with AI tools/ solutions the article after). This is where the rubber hits the road and where you can actually start thinking about what can be built and how to build it! Until then, have a great week!
Thanks for the read! Comment below and share the newsletter/ issue if you think it is relevant! Feel free to also follow me on LinkedIn (very active) or Medium (increasingly active). And if you are interested in consulting, please do reach out. See you amazing folks next week!