Synthetic Data & Data Flywheels
Generate the training data your model needs — instead of waiting for it.
Real labeled data is slow to collect, expensive to annotate, and skewed toward common cases. This course teaches the techniques that let you build high-quality training sets at scale: self-instruct, quality filtering (rule-based and LLM-as-judge), targeted augmentation for rare classes, privacy-preserving generation, preference data for DPO fine-tuning, and the production data flywheel that turns user interactions into continuous improvement.
Duration: 6h
Lessons: 8
Learners: 0
Course map
Lesson 1: Why synthetic data, and when it backfires (9m, 35 XP)
Lesson 2: Self-instruct and instruction-tuning data generation (11m, 38 XP)
Lesson 3: Quality filtering: removing bad examples before they corrupt training (10m, 35 XP)
Lesson 4: Augmenting rare classes and edge cases (10m, 35 XP)
Lesson 5: Privacy-preserving synthetic data (9m, 33 XP)
Lesson 6: Preference data and RLHF datasets (11m, 38 XP)
Lesson 7: The data flywheel: production logs → training data (11m, 40 XP)
Lesson 8: Capstone: build a legal document analysis training set (17m, 55 XP)