Artificial intelligence is the goal: build systems that do things humans associate with intelligence. Machine learning is the dominant modern approach: instead of programming the rules, you let the machine infer them from data. Deep learning is one family of machine-learning models that use neural networks with many layers — the family that has driven nearly every breakthrough since 2012.
The three flavours of machine learning
Supervised learning: you have labelled examples (loan applications labelled 'defaulted' or 'paid'), and the model learns to predict the label for new cases. This covers most analyst work: credit scoring, churn prediction, fraud detection.
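As a rough illustration of learning from labelled examples, the sketch below classifies a new loan application by majority vote among its nearest labelled neighbours. The feature names (debt-to-income ratio, missed payments), the numbers, and the `predict` helper are all invented for this example; a real credit model would use far more data, scaled features, and a properly validated algorithm.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (debt_to_income, missed_payments) -> outcome.
# The values are made up purely for illustration.
TRAINING = [
    ((0.10, 0), "paid"),
    ((0.15, 1), "paid"),
    ((0.20, 0), "paid"),
    ((0.55, 3), "defaulted"),
    ((0.60, 4), "defaulted"),
    ((0.70, 2), "defaulted"),
]

def predict(features, training=TRAINING, k=3):
    """Predict a label by majority vote among the k closest labelled examples."""
    by_distance = sorted(training, key=lambda ex: math.dist(ex[0], features))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(predict((0.12, 0)))  # an applicant resembling the 'paid' examples
print(predict((0.65, 3)))  # an applicant resembling the 'defaulted' examples
```

The key point is that nobody wrote a rule such as "more than two missed payments means default"; the prediction comes entirely from the labelled history.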
Unsupervised learning: no labels. The model finds structure in the data — clusters of similar customers, anomalies in transaction patterns. Useful for exploration.
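To make "finding structure without labels" concrete, here is a minimal sketch of k-means clustering on a single made-up feature (monthly spend). The data, the `kmeans_1d` function, and its deterministic starting centres are assumptions chosen to keep the example short; production clustering would normally use several features and a library implementation.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centres = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

# Hypothetical monthly spend per customer; no labels are supplied.
monthly_spend = [120, 150, 130, 900, 950, 1000, 140, 980]
centres, clusters = kmeans_1d(monthly_spend, k=2)
print(centres)
print(clusters)
```

The algorithm is never told that there are "low spenders" and "high spenders"; the two groups fall out of the data, and it is up to the analyst to interpret them.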
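Reinforcement learning: an agent takes actions in an environment, receives rewards, and learns a policy (a rule for choosing actions) that maximises reward over time. It is the technology behind AlphaGo and robot control and, applied after pre-training as reinforcement learning from human feedback (RLHF), behind making LLMs helpful.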
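The action-reward loop can be sketched with the simplest possible case: an epsilon-greedy agent choosing between two actions whose payoff probabilities it does not know. The action names, reward probabilities, and `run_bandit` function are hypothetical, and this single-state setup is far simpler than what AlphaGo or RLHF use; it is only meant to show learning from reward rather than from labels.

```python
import random

def run_bandit(reward_prob, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    actions = list(reward_prob)
    estimates = {a: 0.0 for a in actions}  # running average reward per action
    counts = {a: 0 for a in actions}       # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)              # explore at random
        else:
            action = max(actions, key=estimates.get)  # exploit current best guess
        # The environment pays 1 with the action's hidden probability, else 0.
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit({"route_a": 0.2, "route_b": 0.8})
best = max(estimates, key=estimates.get)
print(best, counts)
```

The resulting policy is simply "pick the action with the highest estimated reward"; the agent discovers which action that is through trial and error.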
Why now? The three forces that compounded
- Compute: GPUs originally designed for video games turned out to be ideal for the matrix multiplications neural networks need. NVIDIA's market cap reflects that.
- Data: the internet generated trillions of tokens of human text. The pre-training datasets for modern LLMs include large fractions of the publicly accessible web.
- Architecture: the 2017 transformer paper ("Attention Is All You Need") introduced a model that processes every position in a sequence in parallel, replacing recurrent networks, which had to read tokens one at a time. Every modern LLM (GPT, Claude, Gemini, Llama) is a transformer.
What an LLM actually does
A large language model is, mechanistically, a function: take a sequence of tokens, produce a probability distribution over the next token. Sampling from that distribution iteratively produces text. That's it. Everything impressive that LLMs do — answering questions, writing code, summarising documents — emerges from being very, very good at this one task at scale.
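The "predict a distribution, sample, repeat" loop can be sketched with a toy bigram model. This is a deliberate simplification: it conditions only on the previous token and learns from simple counts, whereas an LLM conditions on the whole preceding sequence using a neural network. The corpus and the function names (`train_bigram`, `next_token_distribution`, `generate`) are invented for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which: a toy stand-in for a trained model."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def next_token_distribution(follows, token):
    """Return P(next token | current token) as a dict of probabilities."""
    counts = follows[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def generate(follows, start, length, seed=0):
    """Sample one token at a time, feeding each sample back in as context."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        dist = next_token_distribution(follows, out[-1])
        if not dist:  # no known continuation for this token
            break
        tokens, probs = zip(*dist.items())
        out.append(rng.choices(tokens, weights=probs)[0])
    return out

corpus = "the loan was paid the loan was late the client paid".split()
follows = train_bigram(corpus)
dist = next_token_distribution(follows, "the")
print(dist)  # probabilities for what follows 'the' in the toy corpus
generated = generate(follows, "the", 5)
print(" ".join(generated))
```

Swapping the count table for a transformer with billions of parameters changes the quality of the distribution enormously, but the generation loop stays the same.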
The honest definition
A modern LLM is a next-token predictor trained on a large fraction of the publicly accessible web, then fine-tuned with reinforcement learning to make its outputs helpful, harmless, and honest. That's the whole machine; the rest is engineering around it.
What it isn't
It isn't a database of facts. It isn't a reasoning engine in the formal-logic sense. It doesn't 'understand' in the human sense, despite often producing outputs that read as if it does. These distinctions matter when you deploy it for analyst work — they tell you where to expect it to fail.
Exercise
For each of the following analyst problems, identify which flavour of machine learning is most appropriate (supervised, unsupervised, or reinforcement), and briefly justify the choice:
(1) Predicting whether a SACCO loan applicant will default within 12 months.
(2) Grouping 200,000 M-Pesa users into customer segments to target with different products.
(3) Training an algorithmic trading agent that learns to execute large orders with minimal market impact.
(4) Using a labelled set of 5,000 Kenyan court rulings to build a classifier that flags new cases as likely-appealable.