Lessons
Every lesson, in one place.
72 lessons across 7 courses. Search a topic, filter by course, jump straight to the one you want.
Prompt Engineering
12 lessons
- L1
What a prompt actually is
A prompt is a program — written in English (or any language). Treat it that way.
8 min
- L2
Anatomy of a great prompt
The five slots that turn a vague prompt into a reliable one.
12 min
- L3
Constraints: the secret weapon
Telling the model what NOT to do is often more powerful than telling it what to do.
10 min
- L4
Few-shot prompting
Show, don't tell. The fastest way to lock in a format.
10 min
- L5
Chain-of-thought & reasoning
"Think step by step" — and when it actually helps.
11 min
- L6
Structured output (JSON & schemas)
How to make the model produce JSON that actually parses.
10 min
- L7
Personas, roles & tone
Roles do real work — when they're specific. "You are a helpful assistant" does almost nothing.
8 min
- L8
Temperature, top-p, and sampling
The three knobs that control how "random" the output is.
9 min
- L9
Prompt injection & safety
The vulnerability every LLM app has — and how to actually defend against it.
11 min
- L10
Evaluating prompts (the part nobody does)
You're not done when it works once. You're done when it works on a held-out test set.
12 min
- L11
Production patterns: caching, fallback, retries
The infrastructure tricks that turn a prompt demo into a real product.
11 min
- L12
Capstone: ship a prompt system
Pull it all together. Design a prompt + eval set + production wrapper for a real task.
18 min
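The sampling knobs from lesson 8 fit in a few lines. Below is a minimal sketch of temperature scaling and top-p (nucleus) truncation over a toy next-token distribution; the logit values are invented for illustration, not from any real model.

```javascript
// Minimal sketch of temperature + top-p sampling over a toy next-token
// distribution. The logit values are made up for illustration.

function softmax(logits, temperature = 1.0) {
  // Divide logits by temperature before normalizing: <1 sharpens, >1 flattens.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function topP(probs, p = 0.9) {
  // Keep the smallest set of tokens whose cumulative probability reaches p,
  // then renormalize over that set.
  const sorted = probs
    .map((prob, i) => ({ i, prob }))
    .sort((a, b) => b.prob - a.prob);
  const kept = [];
  let cum = 0;
  for (const t of sorted) {
    kept.push(t);
    cum += t.prob;
    if (cum >= p) break;
  }
  const total = kept.reduce((a, t) => a + t.prob, 0);
  return kept.map((t) => ({ i: t.i, prob: t.prob / total }));
}

const logits = [2.0, 1.0, 0.5, -1.0]; // toy scores for four candidate tokens
const cold = softmax(logits, 0.5);    // sharper: mass concentrates on the top token
const hot = softmax(logits, 2.0);     // flatter: output gets more "random"
const nucleus = topP(softmax(logits), 0.9); // drops the unlikely tail
```

Note the order: temperature reshapes the distribution first, then top-p decides how much of the reshaped tail to cut off.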
LLMs & Transformers
10 lessons
- L1
Tokens — what models actually see
Models do not read characters or words. They read tokens. This one reframe explains a lot of weird behavior.
9 min
- L2
Embeddings — words as coordinates
Once a token is an integer, it becomes a vector in a high-dimensional space. The geometry of that space is where meaning lives.
10 min
- L3
Attention — the trick that made LLMs work
For every token at every layer, the model looks back at every other token and decides what to focus on. That's attention.
12 min
- L4
Inside a transformer block
Attention is one piece. The transformer block stacks it with norms, residuals, and a feed-forward layer.
10 min
- L5
Positional encoding — why order matters
Self-attention is order-blind. We have to inject "where am I in the sequence" by hand.
8 min
- L6
Sampling — how the next token gets picked
The model outputs a distribution. Picking from it is a separate (and tunable) step.
10 min
- L7
Reading a model card
Pick the right model — and stop guessing — by reading the card like an engineer.
10 min
- L8
Context windows, KV cache & long context
Why a 1M-token context is impressive — and expensive — and slower than you think.
11 min
- L9
Reasoning models & test-time compute
Why "thinking out loud" before answering makes the model smarter — and when it doesn't.
10 min
- L10
Capstone: pick the right model for the job
Combine everything: parameters, context, latency, cost, license, and your eval. Decide.
12 min
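The attention mechanic from lesson 3 is small enough to sketch directly: score each key against the query with a dot product, softmax the scores, mix the values. A minimal single-query version, with all vectors invented for illustration:

```javascript
// Minimal sketch of scaled dot-product attention for one query over three
// toy key/value vectors. All numbers are invented for illustration.

const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);

function attend(query, keys, values) {
  const d = query.length;
  // 1. Score each key against the query, scaled by sqrt of the dimension.
  const scores = keys.map((k) => dot(query, k) / Math.sqrt(d));
  // 2. Softmax the scores into attention weights.
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const weights = exps.map((e) => e / sum);
  // 3. Mix the value vectors by those weights.
  return values[0].map((_, j) =>
    values.reduce((acc, v, i) => acc + weights[i] * v[j], 0)
  );
}

const query = [1, 0];
const keys = [[1, 0], [0, 1], [-1, 0]]; // the first key aligns with the query
const values = [[10, 0], [0, 10], [5, 5]];
const out = attend(query, keys, values);
// The output leans toward values[0], because keys[0] matches the query best.
```

A real transformer does this for every token at every layer, in parallel, with learned projections producing the queries, keys, and values.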
RAG & Vector Databases
10 lessons
- L1
What RAG actually is — and when not to use it
RAG is a retrieval system that feeds an LLM. That's it. The hard parts are everything except the LLM.
9 min
- L2
Cosine similarity in 5 lines of code
Retrieval is just "find the closest vectors". The math is one dot product and two norms.
9 min
- L3
Chunking — the most important boring decision
Bad chunking is the #1 cause of bad RAG. There's no universally right strategy — but there are clear wrong ones.
11 min
- L4
Vector databases — what they actually do
A vector DB is a specialized index for "find the k nearest vectors" at scale. Pick one once you actually need scale.
10 min
- L5
Build a tiny end-to-end RAG
Put it together: chunks, vectors, retrieval, prompt assembly. All in 50 lines of JavaScript.
13 min
- L6
Hybrid search & rerankers
Pure vector search misses keyword-precise queries. Pure keyword search misses paraphrases. Use both.
10 min
- L7
The five most common RAG failure modes
Diagnosing a broken RAG is half the job. Here's the field guide.
10 min
- L8
Evaluating a RAG pipeline
Measure retrieval and generation separately. Aggregate metrics hide everything.
12 min
- L9
RAG in production: cost, latency, freshness
A working RAG demo is 10% of the work. The rest is keeping it healthy.
10 min
- L10
Capstone — design RAG for support tickets
A realistic system design exercise. Pick chunking, retrieval, eval, and ops choices.
15 min
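Lesson 2's claim checks out: cosine similarity really is one dot product and two norms, and retrieval is ranking by it. A minimal sketch, with toy embeddings and invented document ids standing in for vectors from a real embedding model:

```javascript
// Cosine similarity: one dot product and two norms.
// Vectors here are toy embeddings; real ones come from an embedding model.

const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
const norm = (a) => Math.sqrt(dot(a, a));
const cosine = (a, b) => dot(a, b) / (norm(a) * norm(b));

// Retrieval = rank stored vectors by similarity to the query vector.
// The document ids are hypothetical, purely for illustration.
const query = [0.9, 0.1, 0.0];
const docs = [
  { id: "refund-policy", vec: [0.8, 0.2, 0.1] },
  { id: "shipping-times", vec: [0.1, 0.9, 0.3] },
];
const ranked = docs
  .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
  .sort((a, b) => b.score - a.score);
```

A vector database exists to do that ranking over millions of vectors without comparing against every one; at toy scale, the loop above is the whole system.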
Fine-tuning & Adaptation
10 lessons
- L1
Should you fine-tune?
Fine-tuning is rarely the answer. This lesson is a decision tree for when it actually is.
10 min
- L2
LoRA, QLoRA, and PEFT
You don't fine-tune the whole model. You train a tiny adapter and freeze everything else.
11 min
- L3
Building a training dataset
Bad data destroys good models. Good data is half the work — and where most of your time should go.
12 min
- L4
A QLoRA training run, end-to-end
The practical recipe. From dataset → fine-tuned adapter → merged inference.
13 min
- L5
Hyperparameters that actually matter
Most hyperparameters don't matter much. A few do — a lot.
9 min
- L6
Evaluating a fine-tuned model
Train loss going down means *something* is happening. Whether it's the right thing is a separate question.
11 min
- L7
RLHF, DPO, and "alignment" — briefly
Why "instruct" models exist, and why you probably shouldn't do RLHF yourself.
10 min
- L8
Catastrophic forgetting
The fine-tune learns the new task — and forgets things it used to do well. Here's how to avoid it.
9 min
- L9
Hosting your fine-tune
Once you have an adapter, where does it run? Three honest options.
9 min
- L10
Capstone: fine-tune for a specific task
Pull it together. Make the call: prompt? RAG? Fine-tune? Then design the run.
14 min
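The adapter idea from lesson 2 can be sketched in a few lines: freeze the pretrained weight matrix W, train a small low-rank pair (A, B), and compute Wx + (alpha / r) · B(Ax). A minimal sketch with toy shapes and numbers, not a real training setup:

```javascript
// Minimal sketch of the LoRA forward pass: the frozen weight matrix W stays
// untouched; a small low-rank pair (A, B) carries the task-specific change.
// Shapes and numbers are toy values for illustration.

const matVec = (M, x) =>
  M.map((row) => row.reduce((s, v, i) => s + v * x[i], 0));

function loraForward(W, A, B, x, alpha, r) {
  const base = matVec(W, x);             // frozen pretrained path
  const delta = matVec(B, matVec(A, x)); // low-rank adapter path
  const scale = alpha / r;
  return base.map((v, i) => v + scale * delta[i]);
}

// 4x4 frozen weights, rank-1 adapter: A is 1x4, B is 4x1.
const W = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1],
];
const A = [[0.5, 0.5, 0, 0]]; // 1x4 -> 8 trainable numbers in the adapter,
const B = [[1], [0], [0], [0]]; // 4x1 -> versus 16 frozen ones in W itself
const x = [1, 2, 3, 4];
const out = loraForward(W, A, B, x, 2, 1);
```

Initializing B to all zeros (the usual trick) makes the adapter a no-op at the start of training, so the fine-tune begins exactly where the base model left off. The parameter savings are dramatic at real scale: two thin matrices instead of a full d×d update.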
Deployment & MLOps
10 lessons
- L1
From notebook to production — the gap
A working notebook is 10% of the work. The other 90% is what nobody photographs.
9 min
- L2
Inference servers — vLLM, TGI, Triton, SGLang
Don't serve LLMs from raw Hugging Face Transformers. The good engines exist for a reason.
12 min
- L3
Cloud GPUs — picking the right machine
GPU choice drives a huge share of your cost. Pick wrong and you waste money; pick right and you save serious cash.
10 min
- L4
Containers & immutable deployments
Reproducible builds. Same image runs locally, in CI, in prod. No "works on my machine".
10 min
- L5
Autoscaling & traffic patterns
Bursty traffic + slow GPU cold-starts = the canonical MLOps headache.
10 min
- L6
Cost optimization that actually moves the needle
90% of LLM cost wins come from five patterns. Skip the obscure ones until you've done all five.
11 min
- L7
Monitoring — what to actually watch
Prometheus dashboards lie. The right four metrics catch 90% of incidents.
10 min
- L8
Shadow traffic, canaries, and A/B tests
Three rollout patterns, each appropriate for a different kind of risk.
11 min
- L9
CI/CD for ML pipelines
Pipelines that ship models like code: tested, versioned, reviewable, rollback-able.
9 min
- L10
Capstone: design a production stack
Make every choice. Stack, GPU, rollout, monitoring, cost.
14 min
AI Engineering Foundations
10 lessons
- L1
Python for ML in 30 minutes
You don't need 10 years of Python. You need NumPy, lists, dicts, and iterators. Here's the survival kit.
10 min
- L2
Vectors and dot products — the intuition
Three things to internalize: vectors are arrows, dot products measure alignment, distances measure dissimilarity.
10 min
- L3
Matrices and matrix multiplication
A neural network is, mostly, a sequence of matrix multiplications.
10 min
- L4
Probability for ML, briefly
You don't need to be a probabilist. You need: distributions, expectation, log-probs, entropy.
9 min
- L5
Gradient descent — how models actually learn
Pick a loss. Compute its gradient. Step downhill. Repeat. That's every neural network ever trained.
11 min
- L6
Train / val / test — how to not fool yourself
Models that memorize their training data look great on it. The whole game is honest evaluation.
9 min
- L7
Loss functions — picking the right one
Different problems need different losses. Three you'll meet 90% of the time.
9 min
- L8
Overfitting and regularization
The model that fits the training data perfectly is rarely the best model. Six tools to keep it honest.
9 min
- L9
Build a tiny neural net from scratch
Forward pass, loss, gradient, weight update. A real (tiny) classifier in 60 lines of plain JavaScript.
14 min
- L10
Capstone — diagnose a training run
You're handed a broken run. What's wrong, and what do you check first?
12 min
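Lesson 5's loop — pick a loss, compute its gradient, step downhill, repeat — fits in a dozen lines. A minimal sketch that fits y = w·x to toy data with squared error (the data and learning rate are invented for illustration):

```javascript
// Minimal sketch of gradient descent: fit y = w * x to toy data by
// repeatedly stepping against the gradient of the mean squared error.

const data = [[1, 2], [2, 4], [3, 6]]; // the true relationship is y = 2x

let w = 0;        // start with a wrong guess
const lr = 0.05;  // learning rate: how big each downhill step is

for (let step = 0; step < 200; step++) {
  // d/dw of (w*x - y)^2 is 2 * x * (w*x - y); average it over the data.
  const grad =
    data.reduce((s, [x, y]) => s + 2 * x * (w * x - y), 0) / data.length;
  w -= lr * grad; // step downhill
}
// w converges toward 2.
```

Every neural network training run is this loop at scale: more parameters, a fancier loss, and automatic differentiation instead of a hand-written gradient, but the same three moves.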
AI Agents
10 lessons
- L1
What an AI agent actually is — and what isn't
Most "AI agents" in production are 2-step pipelines. Real agents loop, decide, and act. Knowing the difference saves you weeks.
9 min
- L2
ReAct — Reason + Act + Observe
The pattern under almost every modern agent. Surprisingly simple, surprisingly effective.
11 min
- L3
Tool use done right
Tools are the agent's hands. Bad tool design wrecks more agents than bad models.
11 min
- L4
Planning — single-step vs multi-step
ReAct is reactive. For long tasks you also want a plan. The hybrid wins.
10 min
- L5
MCP and the rise of agent protocols
MCP is to agents what HTTP is to web apps. Worth understanding even if you don't use it directly.
10 min
- L6
Memory — short-term, long-term, none
Most "memory" features in agents are over-engineered. Three simple patterns cover 90% of needs.
9 min
- L7
Multi-agent — when it's worth it
Most multi-agent demos are a single agent with extra latency. Sometimes it's genuinely the right tool.
9 min
- L8
Agent safety and guardrails
Agents are LLMs with the ability to act. The blast radius is bigger. Defense is layered.
11 min
- L9
Evaluating agents
Agents don't have a single right answer. Eval is about success rates and trace quality.
11 min
- L10
Capstone — design a research agent
Pull it together. Build a research agent that scopes, researches, and writes.
14 min
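The ReAct loop from lesson 2 really is a loop: reason, act, observe, repeat until done. A minimal sketch where `llmDecide` and the `add` tool are stand-ins invented for illustration — a real agent would call a model API instead of this hard-coded policy:

```javascript
// Minimal sketch of a ReAct-style agent loop. `llmDecide` and the `add` tool
// are hypothetical stand-ins; a real agent would call a model API here.

const tools = {
  add: ({ a, b }) => a + b, // a toy tool: the agent's "hands"
};

function llmDecide(task, observations) {
  // Stand-in for the model: with no observation yet, call the tool;
  // once an observation exists, finish with the answer.
  if (observations.length === 0) {
    return { thought: "I should add the numbers", action: "add", input: { a: 2, b: 3 } };
  }
  return { thought: "I have the result", final: `The answer is ${observations[0]}` };
}

function runAgent(task, maxSteps = 5) {
  const observations = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = llmDecide(task, observations);        // Reason
    if (decision.final) return decision.final;             // Done
    const result = tools[decision.action](decision.input); // Act
    observations.push(result);                             // Observe
  }
  return "gave up: step budget exhausted";
}

const answer = runAgent("what is 2 + 3?");
```

The step budget is the simplest guardrail from lesson 8: an agent that loops needs a hard stop, or a confused model loops forever.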