Drew Bredvick

Building the future of GTM with AI


Prompts to Avoid the Bitter Lesson

2 min read

The Bitter Lesson is Rich Sutton's observation from AI research that general methods that leverage more compute tend to win over time.

Coding agents violate this concept consistently.

They naturally write regex instead of API calls to LLM providers. Codex and Claude Code see sample data, write a brittle parser, and accidentally overfit your app to the examples in the prompt.
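Here is a hypothetical sketch of that failure mode. The `parseDate` helper and the sample strings are invented for illustration: the agent saw ISO-formatted dates in the prompt and wrote a regex that only handles exactly that shape.

```typescript
// A date parser overfit to the prompt's samples, which were all ISO format.
const BRITTLE_DATE = /^(\d{4})-(\d{2})-(\d{2})$/

function parseDate(input: string): string | null {
  const m = input.match(BRITTLE_DATE)
  return m ? `${m[1]}-${m[2]}-${m[3]}` : null
}

// Works on the examples the agent was shown...
parseDate('2024-01-15') // '2024-01-15'

// ...and silently fails on anything real users actually send.
parseDate('Jan 15, 2024') // null
parseDate('15/01/2024') // null
```

The tests pass, the demo works, and the gap only shows up in production.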

Regex is static. LLM-backed code compounds.

[Chart: quality over time — a static regex stays flat while an LLM-backed classifier keeps improving.]

My theory is this comes from how coding agents are trained and evaluated. In a training loop, you probably do not want generated code calling out to another LLM: it is slow, expensive, flaky, and often unavailable in the sandbox. A regex is cheap, local, deterministic, and easy to test.

When the reward signal favors passing the visible tests quickly, the agent learns to reach for the thing that works in that environment, even if the product architecture should use a model. Not to mention, everyone is compute constrained right now — why use more GPU than needed?

And regex is the trap.

As models get better, the LLM call improves. The regex stays stuck to the examples you happened to provide.

Instead, tell the coding agent to outsource thinking to the LLM in the app code it writes.

If the app needs classification, routing, tagging, extraction, prioritization, or intent detection, use an LLM with a typed schema. Keep deterministic code for permissions, retries, idempotency, audit logs, and side effects.

This often looks like writing a classifier with structured outputs using the AI SDK and a cheaper, faster model. The Gemini 3.1 Flash-Lite Preview docs list gemini-3.1-flash-lite-preview with a 1,048,576-token input limit and structured outputs support.

import { generateText, Output } from 'ai'
import { z } from 'zod'

// Typed contract for what the model must return.
const LeadRoute = z.object({
  route: z.enum(['sales', 'support', 'partner', 'spam', 'unknown']),
  confidence: z.number().min(0).max(1),
  reason: z.string(),
})

// Example inbound lead; in practice this comes from your form or CRM.
const lead = {
  email: 'jane@example.com',
  message: 'Looking at the enterprise plan for a 200-seat team.',
}

const { output } = await generateText({
  model: 'google/gemini-3.1-flash-lite-preview',
  output: Output.object({ schema: LeadRoute }),
  system: 'Classify this inbound lead. Do not invent facts.',
  prompt: JSON.stringify(lead),
})

The schema gives the app a deterministic contract. The model handles the semantic decision.
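For instance, the deterministic side of that contract might look like this sketch. The `dispatch` function, the queue names, and the 0.7 confidence floor are all hypothetical; the types are duplicated inline so the example is self-contained.

```typescript
// Types mirroring the classifier's schema (in the app you would infer
// these from the zod schema rather than redeclare them).
type Route = 'sales' | 'support' | 'partner' | 'spam' | 'unknown'
interface LeadRouteResult {
  route: Route
  confidence: number
  reason: string
}

// Deterministic side: the app, not the model, decides what happens next.
// The queue names and the 0.7 confidence floor are hypothetical.
function dispatch(result: LeadRouteResult): string {
  if (result.confidence < 0.7) return 'human-review'
  switch (result.route) {
    case 'sales': return 'sales-queue'
    case 'support': return 'support-queue'
    case 'partner': return 'partnerships-queue'
    case 'spam': return 'discard'
    default: return 'human-review'
  }
}
```

Swapping in a smarter model improves the classification; the dispatch logic, retries, and audit trail stay exactly as tested.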

The prompt

Here is the instruction I have started giving agents:

Always be aware of the bitter lesson and apply it to software architecture as well.

If you find yourself at a point where you realize you need some intelligence
to make a decision, whether that be classification, routing, tagging, or
anything of that nature, never use regular expressions or another hack like
that.

Simply outsource that thinking to an LLM.

If you are worried about cost, use Gemini Flash-Lite models. If you are not worried
about cost and are opting for intelligence, use a larger model with higher
thinking effort, like Claude Opus 4.7 or GPT-5.5.

Never write a regular expression without explicit approval.

Be careful to not overfit your solutions to the provided sample data.

Trust the LLM.

The goal is generated code that benefits from the LLM exponential. Billions of dollars are going into making these models smarter and cheaper. A regex does not ride that curve. An LLM-backed classifier does.
