What I Learned Building Quizzy.io
I've been a professional developer for many years—and over the last year I've also been teaching engineers how to integrate AI into real products. Since I work with clients (who understandably don't want their code on my slides), I decided to build my own product where I could demo AI techniques without risking any NDA mishaps.
That project became Quizzy.io.
What is Quizzy.io?
It's an AI-powered quiz app that assembles short quizzes on a chosen topic with a mix of question types and difficulty levels. A fuller description (EN & CZ) is here: https://quizzy.io/aboutquizz.
Under the hood, various OpenAI models generate almost everything (a minimal generation sketch follows the list):
- Topics and short topic descriptions
- The questions themselves
- Correct answers (and distractors for multiple-choice)
- Ordering tasks (e.g., "sort these cities by population: Tokyo, Cairo, Mumbai, New York")
- Images for image-based questions (identify the animal, date the historic photo, etc.)
- Difficulty labels across several tiers
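Here's what the simplest version of that generation call might look like. Everything in it (model name, prompt wording, field names) is illustrative rather than Quizzy.io's actual code:

```python
# Minimal sketch: asking a chat model for quiz questions as JSON.
# Model name, prompt wording, and field names are illustrative.
import json
from openai import OpenAI

client = OpenAI()

def generate_questions(topic: str, count: int = 5) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-4.1",  # any recent chat model works here
        messages=[
            {"role": "system",
             "content": 'You write quiz questions. Reply with a JSON object '
                        '{"questions": [...]} where each item has: question, '
                        'type (multiple_choice|ordering), options, '
                        'correct_answer, and difficulty (easy|medium|hard).'},
            {"role": "user", "content": f"Write {count} questions about: {topic}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["questions"]
```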
It's not a trivial system, but it's excellent for teaching prompt engineering, comparing models, and discussing practical AI implementation patterns.
Lessons Learned
1) Real-time generation ≠ good UX
My original plan was to generate a unique quiz for each user on demand. In practice, producing ~15 questions, ~6 images, and ~3 topics often took 10+ minutes—unacceptable.
Fix: I now pre-generate topics and questions in batches and assemble quizzes from this inventory. Background jobs + caching beat pure real-time here.
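The shape of that fix, sketched with a hypothetical `store` persistence layer and the `generate_questions()` helper from the earlier sketch:

```python
# Batch-first sketch: a background job keeps a question inventory topped up,
# and quiz assembly only samples from it (no model call in the request path).
# `store` is a hypothetical persistence layer; generate_questions() is the
# helper sketched earlier.
import random

MIN_POOL_SIZE = 50  # illustrative threshold

def top_up_inventory(store, topics: list[str]) -> None:
    """Run from a cron job or worker queue, never per request."""
    for topic in topics:
        if store.count(topic) < MIN_POOL_SIZE:
            store.add(topic, generate_questions(topic, count=15))

def assemble_quiz(store, topic: str, size: int = 15) -> list[dict]:
    """Fast path: sample pre-generated questions from the pool."""
    pool = store.get(topic)
    return random.sample(pool, min(size, len(pool)))
```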
2) Letting the model "find images on the web" was unreliable
(At least with GPT-4.1 and GPT-5 in my tests.) Wikimedia links were often broken or pointed to assets with unclear licensing.
Fix: I abandoned model-driven web image retrieval.
3) AI-generated images are surprisingly good—but not cheap
Quality and relevance were strong once I switched to generation. The downside is cost relative to other content.
Mitigations: reuse assets, generate smaller and upscale if needed, cache aggressively, and be selective about where images add real value.
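Caching is the easiest of these to show. A minimal sketch that keys generated images on a hash of the prompt, so an identical request never pays twice (the cache location and model choice are illustrative):

```python
# Sketch of aggressive image reuse: generated images are keyed by a hash of
# the prompt, so an identical request never pays for generation twice.
import base64
import hashlib
from pathlib import Path
from openai import OpenAI

client = OpenAI()
CACHE_DIR = Path("image_cache")  # illustrative location
CACHE_DIR.mkdir(exist_ok=True)

def get_image(prompt: str) -> bytes:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    cached = CACHE_DIR / f"{key}.png"
    if cached.exists():
        return cached.read_bytes()      # cache hit: free
    result = client.images.generate(    # cache miss: paid call
        model="gpt-image-1",            # illustrative model choice
        prompt=prompt,
        size="1024x1024",
    )
    data = base64.b64decode(result.data[0].b64_json)
    cached.write_bytes(data)
    return data
```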
4) Model choice matters—a lot
Using the same prompts, I saw a big quality jump from GPT-4.1 to GPT-5. The newer models handled structure, nuance, and consistency much better.
5) Abstract SVGs for topic thumbnails were a hit
Each quiz groups questions by topic and shows a topic image. I experimented with abstract, model-generated SVGs for these thumbnails; the results were fascinating and visually distinctive, with tiny payloads and instant rendering.
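A minimal version of the idea: ask the model for raw SVG markup and extract just the <svg> element. The prompt and dimensions are illustrative, and real code should sanitize the markup before serving it to browsers:

```python
# Sketch: model-generated abstract SVG thumbnails. Tiny payloads, no image
# pipeline. Prompt wording is illustrative; sanitize before serving.
import re
from openai import OpenAI

client = OpenAI()

def topic_thumbnail_svg(topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{
            "role": "user",
            "content": (
                f"Design an abstract 120x120 SVG thumbnail evoking the topic "
                f"'{topic}'. Shapes and gradients only; no text, no scripts. "
                f"Reply with the SVG markup only."
            ),
        }],
    )
    match = re.search(r"<svg.*?</svg>",
                      response.choices[0].message.content, re.DOTALL)
    if not match:
        raise ValueError("model did not return SVG markup")
    return match.group(0)
```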
6) The model overestimates our knowledge
"Easy" was intended to reflect a 12–15-year-old's level. Many of those questions felt noticeably harder.
Fix: human-in-the-loop calibration plus telemetry (accuracy and time-to-answer) to remap questions to the right difficulty bands.
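The remapping itself can stay simple. A sketch with illustrative accuracy bands (time-to-answer could feed in the same way):

```python
# Sketch of telemetry-driven difficulty remapping: each label is supposed to
# land in an accuracy band; questions that drift out get relabeled.
# The bands and the minimum-sample threshold are illustrative.
DIFFICULTY_BANDS = {
    "easy":   (0.70, 1.00),
    "medium": (0.45, 0.70),
    "hard":   (0.00, 0.45),
}

def rebucket(label: str, correct: int, attempts: int) -> str:
    """Return the label whose band matches the observed accuracy."""
    if attempts < 30:  # not enough signal yet
        return label
    accuracy = correct / attempts
    for new_label, (low, high) in DIFFICULTY_BANDS.items():
        if low <= accuracy <= high:
            return new_label
    return label
```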
7) Evaluation pays off
OpenAI's evaluation tools let me retrospectively score outputs. Evaluations surfaced a non-trivial number of inaccuracies—enough to keep evals in the loop for quality control.
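OpenAI also offers hosted eval tooling, but the simplest version of the loop is just a second model grading stored content. A generic LLM-as-judge sketch, not Quizzy.io's actual setup:

```python
# Generic LLM-as-judge sketch: a second model grades stored questions for
# factual accuracy. Model choice and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

def grade_question(question: str, claimed_answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5",  # use a strong model as the grader
        messages=[{
            "role": "user",
            "content": (
                "Is the following quiz answer factually correct? Reply with "
                "exactly CORRECT or INCORRECT, then one sentence of reasoning.\n"
                f"Question: {question}\n"
                f"Claimed answer: {claimed_answer}"
            ),
        }],
    )
    return response.choices[0].message.content
```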
8) Use multiple AI copilots—even for code review
I coded with GitHub Copilot (various backends) and periodically reviewed modules with a different model. A second "AI reviewer" caught issues my primary setup missed.
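The habit is cheap to automate. A sketch that sends a module to a model other than the one used while coding (the file name is hypothetical):

```python
# Sketch of an automated second reviewer: feed a module to a different model
# than the one that helped write it. File name and prompt are illustrative.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def review_module(path: str) -> str:
    source = Path(path).read_text()
    response = client.chat.completions.create(
        model="gpt-5",  # deliberately not the coding model
        messages=[{
            "role": "user",
            "content": "Review this module for bugs, edge cases, and security "
                       "issues. Be specific:\n\n" + source,
        }],
    )
    return response.choices[0].message.content

print(review_module("quiz_assembly.py"))  # hypothetical module name
```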
9) Front-end still hurts
For me, the UI layer remains the fussiest and most time-consuming—AI or not. Component ergonomics, responsive tweaks, loading states, and image pipelines all added hidden complexity.
If I Were Starting Today
- Batch first, personalize later. Pre-generate a content pool and tailor selection/ordering per user.
- Make difficulty self-correcting. Track correctness and time; auto-rebucket items that skew too easy/hard.
- Guardrails for assets. Centralize licensing/attribution; treat scraped URLs as untrusted.
- Schema over prose. Use structured prompts (JSON schemas) to reduce post-processing; see the sketch after this list.
- Budget the image pipeline. Decide early where images are essential and where text-only is fine.
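For the "schema over prose" point, this is what a strict schema looks like with the OpenAI API; the fields are illustrative:

```python
# "Schema over prose" sketch: a strict JSON Schema so replies parse directly
# into known types instead of needing prose post-processing.
import json
from openai import OpenAI

client = OpenAI()

QUESTION_SCHEMA = {
    "name": "quiz_question",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "options": {"type": "array", "items": {"type": "string"}},
            "correct_index": {"type": "integer"},
            "difficulty": {"type": "string",
                           "enum": ["easy", "medium", "hard"]},
        },
        "required": ["question", "options", "correct_index", "difficulty"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user",
               "content": "Write one quiz question about rivers."}],
    response_format={"type": "json_schema", "json_schema": QUESTION_SCHEMA},
)
question = json.loads(response.choices[0].message.content)  # typed dict, no prose
```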
Status & Feedback
Quizzy.io is still a prototype, with more features in the works. I'd love your feedback and ideas—new question types, better difficulty curves, classroom modes, evaluation tricks that worked for you, anything.