What I Learned Building Quizzy.io

27.10.2025

I've been a professional developer for many years—and over the last year I've also been teaching engineers how to integrate AI into real products. Since I work with clients (who understandably don't want their code on my slides), I decided to build my own product where I could demo AI techniques without risking any NDA mishaps.

That project became Quizzy.io.

What is Quizzy.io?

It's an AI-powered quiz app that assembles short quizzes on a chosen topic with a mix of question types and difficulty levels. A fuller description (in English and Czech) is here: https://quizzy.io/aboutquizz.

Under the hood, various OpenAI models generate almost everything:

  • Topics and short topic descriptions

  • The questions themselves

  • Correct answers (and distractors for multiple-choice)

  • Ordering tasks (e.g., "sort these cities by population: Tokyo, Cairo, Mumbai, New York")

  • Images for image-based questions (identify the animal, date the historic photo, etc.)

  • Difficulty labels across several tiers
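
To make that pipeline concrete, here is a minimal sketch of the question-generation step, assuming the OpenAI Python SDK. The model name, prompt wording, and JSON field names are illustrative, not Quizzy's actual code:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_question(topic: str, difficulty: str) -> dict:
    """Ask the model for one multiple-choice question as JSON."""
    response = client.chat.completions.create(
        model="gpt-5",  # illustrative; pick whatever tier fits your budget
        messages=[
            {"role": "system", "content": (
                "You write quiz questions. Reply with JSON only, using keys: "
                "question, correct_answer, distractors (a list of 3 strings)."
            )},
            {"role": "user", "content": f"Topic: {topic}. Difficulty: {difficulty}."},
        ],
        response_format={"type": "json_object"},  # forces syntactically valid JSON
    )
    return json.loads(response.choices[0].message.content)

print(generate_question("capital cities", "easy"))
```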

It's not a trivial system, but it's excellent for teaching prompt engineering, comparing models, and discussing practical AI implementation patterns.

Lessons Learned

1) Real-time generation ≠ good UX

My original plan was to generate a unique quiz for each user on demand. In practice, producing ~15 questions, ~6 images, and ~3 topics often took 10+ minutes—unacceptable.
Fix: I now pre-generate topics and questions in batches and assemble quizzes from this inventory. Background jobs + caching beat pure real-time here.
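
A minimal sketch of that batch-first shape, reusing the hypothetical generate_question helper from the sketch above; the in-memory dict stands in for whatever job queue and cache you actually run:

```python
import random
from collections import defaultdict

# In-memory stand-in for the real cache/DB that a background job keeps filled.
inventory: dict[str, list[dict]] = defaultdict(list)

def refill_inventory(topics: list[str], per_topic: int = 20) -> None:
    """Background job: pre-generate questions in bulk, off the request path."""
    for topic in topics:
        while len(inventory[topic]) < per_topic:
            inventory[topic].append(generate_question(topic, "medium"))

def assemble_quiz(topic: str, size: int = 15) -> list[dict]:
    """Request path: sample from the pre-built pool; no model calls, so it's fast."""
    return random.sample(inventory[topic], k=min(size, len(inventory[topic])))
```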

2) Letting the model "find images on the web" was unreliable

At least with GPT-4.1 and GPT-5 in my tests, Wikimedia links were often broken or pointed to assets with unclear licensing.
Fix: I abandoned model-driven web image retrieval and switched to generating images instead (lesson 3).
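
If you do keep model-suggested URLs around, a basic sanity check is the minimum. A sketch using the requests library; it catches dead links but says nothing about licensing:

```python
import requests

def is_usable_image_url(url: str, timeout: float = 5.0) -> bool:
    """Reject URLs that don't actually resolve to an image."""
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
    except requests.RequestException:
        return False
    content_type = resp.headers.get("Content-Type", "")
    return resp.status_code == 200 and content_type.startswith("image/")
```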

3) AI-generated images are surprisingly good—but not cheap

Quality and relevance were strong once I switched to generation. The downside is cost relative to other content.
Mitigations: reuse assets, generate smaller and upscale if needed, cache aggressively, and be selective about where images add real value.
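
Caching is the cheapest mitigation on that list. A sketch assuming the OpenAI images endpoint, keyed on a hash of the prompt; the model name and file layout are assumptions:

```python
import base64
import hashlib
from pathlib import Path
from openai import OpenAI

client = OpenAI()
CACHE_DIR = Path("image_cache")
CACHE_DIR.mkdir(exist_ok=True)

def get_image(prompt: str) -> bytes:
    """Return cached bytes if we've already generated this prompt."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.png"
    if path.exists():
        return path.read_bytes()
    result = client.images.generate(
        model="gpt-image-1",  # assumption; use whatever model you have access to
        prompt=prompt,
        size="1024x1024",
    )
    data = base64.b64decode(result.data[0].b64_json)
    path.write_bytes(data)
    return data
```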

4) Model choice matters—a lot

Using the same prompts, I saw a big quality jump from GPT-4.1 to GPT-5. The newer model handled structure, nuance, and consistency much better.
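
This comparison is cheap to reproduce: run the identical prompt through each model and inspect the outputs side by side (or pipe them into an eval, as in lesson 7). A minimal sketch:

```python
from openai import OpenAI

client = OpenAI()
PROMPT = "Write one hard quiz question about the Habsburg monarchy."

for model in ("gpt-4.1", "gpt-5"):  # same prompt, only the model varies
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {model} ---\n{reply.choices[0].message.content}\n")
```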

5) Abstract SVGs for topic thumbnails were a hit

Each quiz groups questions by topic and shows a topic image. I experimented with abstract, model-described SVGs for these thumbnails—results were fascinating and visually distinctive, with tiny payloads and instant rendering.
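
A sketch of the SVG trick, assuming the model returns raw markup. The fence-stripping and the crude script check are the minimum sanitization I'd want before inlining model output into a page:

```python
from openai import OpenAI

client = OpenAI()

def topic_thumbnail_svg(topic: str) -> str:
    """Ask for a small abstract SVG; returns markup ready to inline."""
    reply = client.chat.completions.create(
        model="gpt-5",  # illustrative
        messages=[{"role": "user", "content": (
            f"Create an abstract 128x128 SVG evoking the topic '{topic}'. "
            "Reply with the <svg>...</svg> markup only, no explanation."
        )}],
    )
    svg = reply.choices[0].message.content.strip()
    # Strip a markdown fence if the model added one anyway.
    if svg.startswith("```"):
        svg = svg.strip("`").removeprefix("svg").strip()
    # Minimal safety: never inline markup containing script.
    if "<script" in svg.lower() or not svg.startswith("<svg"):
        raise ValueError("unexpected SVG output")
    return svg
```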

6) The model overestimates our knowledge

"Easy" was intended to reflect a 12–15-year-old's level. Many of those questions felt noticeably harder.
Fix: human-in-the-loop calibration plus telemetry (accuracy and time-to-answer) to remap questions to the right difficulty bands.
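
The remapping itself is simple once the telemetry exists. A sketch with illustrative thresholds; real cut-offs would come from your own accuracy data:

```python
def rebucket(accuracy: float, median_seconds: float) -> str:
    """Map observed answer stats back to a difficulty band.

    accuracy: fraction of players answering correctly (0.0-1.0)
    median_seconds: median time to answer
    Thresholds below are illustrative, not calibrated values.
    """
    if accuracy >= 0.80 and median_seconds < 15:
        return "easy"
    if accuracy >= 0.50:
        return "medium"
    return "hard"

# Example: a question labeled "easy" that only 40% answer correctly
print(rebucket(accuracy=0.40, median_seconds=25))  # -> "hard": relabel it
```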

7) Evaluation pays off

OpenAI's evaluation tools let me retrospectively score outputs. Evaluations surfaced a non-trivial number of inaccuracies—enough to keep evals in the loop for quality control.
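
Even without hosted eval tooling, a second model as grader catches a lot. A minimal model-as-judge sketch; the prompt wording and grader model are my illustration, not OpenAI's eval product:

```python
from openai import OpenAI

client = OpenAI()

def grade_question(question: str, claimed_answer: str) -> bool:
    """Ask a grader model whether the stored answer is actually correct."""
    reply = client.chat.completions.create(
        model="gpt-5",  # illustrative grader
        messages=[{"role": "user", "content": (
            f"Question: {question}\nClaimed correct answer: {claimed_answer}\n"
            "Is the claimed answer factually correct? Reply YES or NO only."
        )}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")
```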

8) Use multiple AI copilots—even for code review

I coded with GitHub Copilot (various backends) and periodically reviewed modules with a different model. A second "AI reviewer" caught issues my primary setup missed.

9) Front-end still hurts

For me, the UI layer remains the fussiest and most time-consuming—AI or not. Component ergonomics, responsive tweaks, loading states, and image pipelines all added hidden complexity.

If I Were Starting Today

  • Batch first, personalize later. Pre-generate a content pool and tailor selection/ordering per user.

  • Make difficulty self-correcting. Track correctness and time; auto-rebucket items that skew too easy/hard.

  • Guardrails for assets. Centralize licensing/attribution; treat scraped URLs as untrusted.

  • Schema over prose. Use structured prompts (JSON schemas) to reduce post-processing; see the sketch after this list.

  • Budget the image pipeline. Decide early where images are essential and where text-only is fine.
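
On the "schema over prose" point: the chat completions API supports strict JSON schemas directly, so malformed output stops being your problem. A minimal sketch; the schema fields are my illustration:

```python
import json
from openai import OpenAI

client = OpenAI()

QUESTION_SCHEMA = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "correct_answer": {"type": "string"},
        "distractors": {"type": "array", "items": {"type": "string"}},
        "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]},
    },
    "required": ["question", "correct_answer", "distractors", "difficulty"],
    "additionalProperties": False,
}

reply = client.chat.completions.create(
    model="gpt-5",  # illustrative
    messages=[{"role": "user", "content": "One quiz question about the Moon."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "quiz_question", "schema": QUESTION_SCHEMA, "strict": True},
    },
)
item = json.loads(reply.choices[0].message.content)  # guaranteed to match the schema
```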

Status & Feedback

Quizzy.io is still a prototype, with more features in the works. I'd love your feedback and ideas—new question types, better difficulty curves, classroom modes, evaluation tricks that worked for you, anything.