We're building a Bring-Your-Own-AI flow into Surprise Chef: instead of making you wire up an MCP connector (fiddly, not everyone will do it), you copy a prompt, paste it into whichever AI you already use, paste the JSON answer back, and a structured smart recipe lands in your kitchen. The whole thing takes about a minute.
But the claim, "works with any AI", only holds if it actually does. So we set up a head-to-head. Nine AIs. One cookbook photo. One prompt. May the best model win.
The cookbook: San Choy Bau. Serves 4. 20 minutes prep, 8 minutes cook. Pork mince, lettuce cups, oyster sauce, the usual good stuff. A printed page we photographed with a phone, no retouching, slightly off-angle.
What happened next involved a hallucinated chicken, a cookbook bug that nobody in the cookbook's editorial chain had caught, and one surprisingly good free-tier chatbot that most people don't even think of as an option.
The contestants
- Microsoft Copilot — free, via the Windows sidebar
- Free ChatGPT — no login, private tab, base model
- ChatGPT Plus + Thinking mode — paid, reasoning on
- Gemini 2.5 Pro — free tier of the Gemini app
- Google AI Mode Pro — free, via Google Search
- Claude Sonnet 4.6 — paid, standard
- Claude Opus + Extended Thinking — paid, reasoning on
- Grok in Expert mode — free on grok.com
- Meta AI — free at meta.ai (Meta account required; the same assistant is also baked into WhatsApp, Instagram, and Facebook)
Each one got the same prompt — a four-step verify-loop that asks the AI to transcribe what it sees from the photo, pause for confirmation, sanity-check the recipe for issues, then generate structured smart recipe JSON. Then we watched.
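For a sense of what a four-step verify-loop looks like on the page, here is a minimal sketch. The four stage names follow the description above; the instruction wording, `VERIFY_LOOP_STEPS`, and `build_prompt` are illustrative assumptions, not the prompt Surprise Chef actually ships.

```python
# Hypothetical sketch: the four stages described above, rendered as one prompt.
# The instruction wording is an assumption, not the production prompt.
VERIFY_LOOP_STEPS = [
    ("Transcribe", "List every ingredient and step exactly as printed in the photo. Do not fill gaps from memory."),
    ("Confirm", "Stop and ask the user to confirm or correct the transcription before continuing."),
    ("Sanity-check", "Flag anything that looks wrong in the recipe itself and ask before changing it."),
    ("Generate", "Output the confirmed recipe as structured JSON and note any approved fixes."),
]


def build_prompt(steps=VERIFY_LOOP_STEPS) -> str:
    """Join the stages into a numbered prompt the user can copy and paste."""
    lines = ["Follow these steps in order. Do not skip ahead."]
    for number, (name, instruction) in enumerate(steps, start=1):
        lines.append(f"{number}. {name}: {instruction}")
    return "\n".join(lines)


print(build_prompt())
```

The design point is the second stage: the prompt forces a stop before any recipe is generated, so a bad transcription gets a human look first.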
Round 1: Did you actually read the photo?
“1 lb (500 g) chicken mince. 1 tbsp dry sherry.”
— Microsoft Copilot, confidently inventing a recipe
The first thing we noticed was the hallucinated chicken. The photo clearly shows 500g pork mince. Copilot said chicken. It also added an imaginary tablespoon of dry sherry, dropped the onion, celery, carrot, and caster sugar that were printed right there on the page, and tripled the water chestnuts from 50g to 150g. Copilot wasn't reading the photo: it was reciting the Platonic ideal of a San Choy Bau recipe from its training data, seasoned with its own guess at what such a dish might contain.
Every other AI bar one got all 15 ingredients right on the first try. The exception was Google AI Mode Pro, which confused the bean sprouts quantity (65g) with the water chestnut quantity (50g): a single transcription slip that the user could have caught at the confirm step. Everyone else nailed the ingredient list verbatim.
So the first finding is simple and a little brutal: Copilot is the only AI in the matrix whose vision genuinely failed this task. The other eight (free, paid, signed in, anonymous) all read the page.
Fair caveat: the test photo was a phone shot, taken slightly off-angle and in imperfect kitchen lighting, not a flatbed scan. Copilot would likely fare better with a cleaner, straight-on, well-lit photo. Worth noting before anyone accuses us of rigging the matrix. That said, the other eight AIs handled the same imperfect photo fine, which is the real-world standard that matters: if you pull out your phone and snap the page you're cooking from, most modern AIs can read it. Copilot, for now, needs better input conditions than the competition.
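The Round 1 failures fall into three buckets: ingredients dropped, ingredients invented, and quantities changed. A minimal sketch of scoring a transcription that way follows. `diff_transcription` is a hypothetical helper, and the data holds only the items this article quotes (a quantity of `None` means the printed amount isn't quoted here); it is not the full 15-ingredient list.

```python
from typing import Optional


def diff_transcription(printed: dict[str, Optional[str]],
                       transcribed: dict[str, Optional[str]]) -> dict[str, list]:
    """Compare an AI's ingredient list against what is printed on the page."""
    missing = sorted(name for name in printed if name not in transcribed)
    invented = sorted(name for name in transcribed if name not in printed)
    wrong_quantity = sorted(
        (name, printed[name], transcribed[name])
        for name in printed
        if name in transcribed
        and printed[name] is not None          # skip items with no quoted amount
        and printed[name] != transcribed[name]
    )
    return {"missing": missing, "invented": invented, "wrong_quantity": wrong_quantity}


# Partial ground truth: only what the article quotes from the page.
printed = {
    "pork mince": "500 g", "water chestnuts": "50 g",
    "onion": None, "celery": None, "carrot": None, "caster sugar": None,
}
# Copilot's reading, as described above.
copilot = {"chicken mince": "500 g", "dry sherry": "1 tbsp", "water chestnuts": "150 g"}

report = diff_transcription(printed, copilot)
print(report["missing"])         # ['carrot', 'caster sugar', 'celery', 'onion', 'pork mince']
print(report["invented"])        # ['chicken mince', 'dry sherry']
print(report["wrong_quantity"])  # [('water chestnuts', '50 g', '150 g')]
```

Exact name matching is deliberately strict here: "chicken mince" shows up as one missing item plus one invented item rather than as a near-miss.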
Round 2: Voice and personality
Same recipe. Same photo. Here's how each one described the dish:
Copilot: “Crisp butter lettuce cups filled with savoury stir-fried chicken mince and crunchy water chestnuts.”
(We're not over the chicken thing.)
ChatGPT Plus + Thinking: “Crisp lettuce cups filled with a quick pork stir-fry.”
Efficient, accurate, reads like a subeditor on deadline.
Sonnet 4.6: “Crispy lettuce cups filled with savoury pork mince, water chestnuts, and fresh vegetables — fast, fun, and made for sharing.”
Warmer. Notices that this is a shared-plate meal.
Gemini 2.5 Pro (app): “A fast, crunchy pork stir-fry served in crisp, ice-chilled lettuce cups.”
Our favourite phrase of the test: "ice-chilled". You can feel it on your teeth.
Google AI Mode: “A classic Chinese-style pork stir-fry served in crisp, chilled lettuce cups for a fresh and crunchy meal.”
Competent. Reads like a search-result snippet, which, charmingly, it sort of is.
Grok Expert: “Classic Chinese-inspired pork mince stir-fry served in crisp lettuce cups for a fresh, interactive meal.”
"Interactive meal" is a slightly corporate way to describe hand-assembling your own lettuce wraps, but it's not wrong.
Free ChatGPT: “Crisp lettuce cups filled with savoury pork mince and crunchy vegetables.”
Shorter, flatter. The free tier doesn't pay for adjectives.
Meta AI: “Crisp lettuce cups filled with savoury pork mince, crunchy vegetables and a quick soy-oyster glaze.”
Direct, sensory, with a single vivid hook ("soy-oyster glaze").
Claude Opus: “Crisp lettuce cups filled with savoury pork mince, water chestnuts and bean sprouts — a fast, fresh Chinese classic served hands-on at the table.”
"Served hands-on at the table" is the best line any of them produced. It notices the social thing — that this dish is about the moment of everyone reaching in — which is the actual reason to cook it.
Personality leaks out even in a structured task. Opus wins on voice. Gemini is a close second. The free-tier outputs are functional but characterless; you can feel the token budget.
Round 3: Catching what the cookbook missed
Here's where it got interesting. The cookbook had two bugs that had presumably survived editorial, copy-editing, and several reprints:
- The sauce is prepared but never used. Step 2 mixes the soy sauce, oyster sauce, sesame oil and caster sugar. Steps 3, 4, and 5 never add it to the wok. Cook the recipe as printed and you end up with unseasoned mince.
- The oil maths don't add up. The ingredient list says 1½ tablespoons of sunflower oil. Step 3 uses 1 tablespoon; Step 4 uses "remaining 2 teaspoons". At 3 teaspoons to the tablespoon, that's 1 tbsp + 2 tsp = 1⅔ tablespoons, slightly more than the 1½ tablespoons listed.
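The oil discrepancy is plain unit arithmetic, so it is easy to check by hand or in a few lines. The sketch below assumes 3 teaspoons per tablespoon by default, which is the conversion the 1⅔ figure relies on; `total_tablespoons` is an illustrative helper, not part of Surprise Chef. The conversion is a parameter because it varies by country: the Australian metric tablespoon is 20 mL, or 4 teaspoons, and this article doesn't say which measure the cookbook uses.

```python
from fractions import Fraction


def total_tablespoons(tablespoons: int, teaspoons: int, tsp_per_tbsp: int = 3) -> Fraction:
    """Add whole tablespoons and teaspoons, expressed in tablespoons."""
    return Fraction(tablespoons) + Fraction(teaspoons, tsp_per_tbsp)


listed = Fraction(3, 2)              # 1½ tbsp in the ingredient list
used = total_tablespoons(1, 2)       # Step 3 (1 tbsp) + Step 4 (2 tsp)
print(used, used - listed)           # 5/3 1/6

# Assumption check: with a 4-teaspoon tablespoon the same steps total exactly 1½.
print(total_tablespoons(1, 2, tsp_per_tbsp=4) == listed)  # True
```

At 3 teaspoons to the tablespoon the steps overshoot the list by a sixth of a tablespoon, which is half a teaspoon. Under a 4-teaspoon tablespoon the numbers balance, so a careful checker would want to ask which measure the book uses before calling it an error.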
We didn't warn the AIs. We wanted to see who noticed.
Who caught the sauce bug:
- Claude Opus — flagged it in the sanity check, asked whether to fix, fixed with permission, documented the fix in the import notes. Textbook.
- Meta AI — did exactly the same thing. Flagged the issue. Offered options. Waited for direction. This was the biggest surprise of the test — free-tier Meta AI matching Opus on a subtle reading comprehension task.
- ChatGPT Plus + Thinking — flagged it in the transcription notes but didn't fix it. Kept the recipe strictly verbatim, left the sauce stranded in the prep bowl. Honest, just not helpful.
- Gemini 2.5 Pro and Google AI Mode — both silently added the sauce into a cooking step without disclosing they'd done so. The result works when you cook it, but the transcription notes claimed faithfulness.
- Sonnet, Grok, Free ChatGPT, and Copilot — all missed the sauce bug entirely.
The oil maths showed a similar split. Reasoning-capable models caught it; the rest sailed past. Copilot was too busy inventing chicken to notice.
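The sauce bug from this round is the kind of thing a mechanical check can surface too: something gets prepared, then no later step mentions it. A minimal sketch, under loud assumptions: the `produces` field and `unused_components` are hypothetical, the step text is placeholder paraphrase rather than the cookbook's wording, and the matching is naive substring search.

```python
def unused_components(steps: list[dict]) -> list[str]:
    """Return components a step produces that no later step mentions."""
    unused = []
    for index, step in enumerate(steps):
        # Text of every step that comes after this one, lower-cased for matching.
        later_text = " ".join(s["text"].lower() for s in steps[index + 1:])
        for component in step.get("produces", []):
            if component.lower() not in later_text:
                unused.append(component)
    return unused


# Placeholder paraphrase of the printed method, not the cookbook's actual text.
steps = [
    {"text": "Mix the soy sauce, oyster sauce, sesame oil and caster sugar.",
     "produces": ["sauce"]},
    {"text": "Stir-fry using 1 tablespoon of the oil."},
    {"text": "Stir-fry using the remaining 2 teaspoons of oil."},
    {"text": "Spoon the filling into the lettuce cups."},
]
print(unused_components(steps))  # ['sauce']
```

A check like this only flags the orphan. Deciding where the sauce belongs is still a judgment call, which is why asking before fixing matters.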
Round 4: Smart recipe structure
A Surprise Chef smart recipe isn't just a list of ingredients and steps. It has mise en place prep tasks, parallel tracks for when the cooking has multiple components happening at once, and visual cues on each step ("until the onion is translucent") so you know what "done" looks like without a timer.
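To make those three ingredients of the format concrete, here is one way the shape could be modelled. Every class and field name below is an assumption for illustration, since the real smart recipe schema isn't shown in this article; the step text is invented too, apart from the visual cue quoted above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    text: str
    visual_cue: str = ""          # what "done" looks like, no timer needed


@dataclass
class Track:
    name: str                     # one component of the meal, cooked in parallel
    steps: list[Step] = field(default_factory=list)


@dataclass
class SmartRecipe:
    title: str
    serves: int
    prep_minutes: int
    cook_minutes: int
    prep_tasks: list[str] = field(default_factory=list)   # mise en place
    tracks: list[Track] = field(default_factory=list)


# Illustrative content only: step wording is made up for the example.
recipe = SmartRecipe(
    title="San Choy Bau", serves=4, prep_minutes=20, cook_minutes=8,
    prep_tasks=["Mix the sauce"],
    tracks=[
        Track("Lettuce Cups", [Step("Soak the lettuce")]),
        Track("Filling", [Step("Soften the onion", "until the onion is translucent")]),
    ],
)
print(json.dumps(asdict(recipe), indent=2))
```

Keeping tracks as separate lists, rather than one flat step list, is what lets an importer preserve or lose the parallelism.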
- Opus produced two tracks, Lettuce Cups and Filling, and explicitly reasoned in its notes that the 15-minute passive soak for the lettuce should be used for mise en place and sauce mixing in parallel. That's chef-brain reasoning.
- Meta AI matched: two tracks, clean structure, eight prep tasks, smart parallelism.
- Gemini 2.5 Pro went a step further on one dimension — it interleaved the two tracks in the step order, so step 1 (chill lettuce) sits before step 2 (start pork) and step 4 (drain lettuce) falls between cooking steps. You can see the parallelism just by reading the step list.
- Sonnet produced a compact three-step version that maps 1:1 to the cookbook's printed steps. Conservative, faithful, loses some of the smart-recipe texture.
- Grok and Google AI Mode both collapsed everything into a single "Main" track. Works, but loses the parallelism story that makes the format useful.
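The interleaving described in the list above amounts to merging two already-ordered tracks into one step list by start time. A minimal sketch of that idea follows. `interleave`, the start offsets, and the filling step names are all invented for illustration; only the positions of "chill lettuce", "start pork", and "drain lettuce" come from the description above.

```python
import heapq


def interleave(tracks: dict[str, list[tuple[int, str]]]) -> list[str]:
    """Merge per-track (start_minute, text) lists into one numbered step list.

    Each track's list must already be sorted by start_minute.
    """
    tagged = [
        [(start, name, text) for start, text in steps]
        for name, steps in tracks.items()
    ]
    merged = heapq.merge(*tagged)   # lazily merges the sorted inputs
    return [
        f"{number}. [{name}] {text}"
        for number, (_start, name, text) in enumerate(merged, start=1)
    ]


# Start offsets are made up; they exist only to produce an ordering.
tracks = {
    "Lettuce Cups": [(0, "Chill lettuce"), (15, "Drain lettuce")],
    "Filling": [(10, "Start pork"), (13, "Continue filling"), (16, "Finish filling")],
}
for line in interleave(tracks):
    print(line)
# 1. [Lettuce Cups] Chill lettuce
# 2. [Filling] Start pork
# 3. [Filling] Continue filling
# 4. [Lettuce Cups] Drain lettuce
# 5. [Filling] Finish filling
```

A single "Main" track skips this merge entirely, which is why the parallelism disappears from the step list.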
The podium
🥇 Claude Opus with Extended Thinking. Perfect fidelity, caught both cookbook bugs, asked for permission before fixing them, produced the best description, showed the clearest reasoning about parallelism, and didn't once claim it had transcribed faithfully when it had in fact corrected something. Our personal favourite, admitted bias, and the evidence backs it.
🥈 Meta AI (free, login required). The most surprising result of the whole test. Opus-calibre behaviour: catching the sauce bug, surfacing the question, fixing with permission, documenting what it changed. A free Meta account gets you there; if you're on Facebook, Instagram, or WhatsApp already, you're halfway signed up. If you don't pay for an AI, this is your path.
🥉 Gemini 2.5 Pro (free, via the Gemini app). Clean fidelity, actually-parallel track interleaving in the step list, good sensory writing. Silently fixed the cookbook bugs rather than asking, which costs it the top spot, but the cooking result is right.
The rest of the field
- ChatGPT Plus with Thinking — excellent, just not quite as thorough as Opus. Second among the paid options.
- Sonnet 4.6 — perfectly faithful but overly conservative. Missed the sauce bug. Good if you want a pure transcription.
- Google AI Mode Pro — fast (30 seconds!), decent fidelity, one vision slip. Free and low-friction.
- Free ChatGPT — worked surprisingly well given no login was required. Uneven smart structure.
- Grok Expert — fidelity good, smart structure shallow, documentation weak.
- Copilot — please cook with someone else. Its vision genuinely can't do this task, and its instinct to fill gaps with training-data pattern-matching turns a reading comprehension exercise into creative writing. The second time around, with the verify-loop prompt, we had to type the ingredient list ourselves before it would produce a correct recipe, so the prompt rescued the output by routing around Copilot's vision entirely.
What this means if you're about to import a recipe
Works for free: Meta AI, Gemini 2.5 Pro (via the Gemini app or Google AI Mode), free ChatGPT, Grok Expert. Any of these will produce a cookable smart recipe from a cookbook photo.
Works best (paid): Claude Opus with Extended Thinking, followed by ChatGPT Plus with Thinking. Reasoning modes earn their premium on tasks like this: they catch subtle bugs the cookbook's own editors missed.
Don't use for photo imports: Copilot. Its vision isn't ready for this. Use one of the others.
The verify-loop prompt (which asks the AI to pause and let you confirm the transcription before generating the recipe) does most of the heavy lifting. It turns a confidence-over-competence AI into one that at least waits for you to catch its mistakes. It's the reason Copilot produced a correct recipe the second time around — we typed the ingredients at the confirm step and routed around its broken vision.
The bigger point
The surprise of this test wasn't which AI won. It was how many of them did well, and how little the price tag predicted the outcome. Meta AI, free at meta.ai and not an AI most people would think to reach for when importing recipes, matched Claude Opus on a careful reading comprehension task. Gemini's free tier rendered actual-parallel cooking timing better than paid Sonnet did. Google's AI Mode, which costs nothing and lives inside a search bar, produced a 30-second smart recipe that's genuinely cookable.
The old story was "use the expensive AI, it's better." The new story is "use the AI that asks good questions." Opus asks the best questions. Meta AI's free tier is right behind it. If a paid subscription was a moat, it isn't any more.
Try it for yourself: dashboard → + New Recipe → Ask any AI. Copy the prompt, paste it into whatever AI you already use, paste the result back, and the recipe is in your kitchen a minute later. Just — maybe not Copilot. Yet.