What a Five-Legged Alien Taught Me About Calibrating AI Agents
The best sci-fi book about AI isn't about AI at all. What Ryland Grace and Rocky's communication arc in Project Hail Mary teaches us about building working relationships with AI agents.
There’s a scene early in Andy Weir’s Project Hail Mary that I think about constantly when I’m working with AI agents.
Ryland Grace, a lone astronaut billions of miles from Earth, discovers he’s not alone. An alien ship is parked next to his. He can’t see the creature inside. He doesn’t know its biology, its language, its intentions. So he does the only thing he can: he plays a musical note through the hull.
And something plays one back.
Then a different note. Then another. Days of this, just tones back and forth, before a single meaningful word is exchanged. If you’ve read the book, you know what comes next: one of the most satisfying communication arcs in modern science fiction. If you haven’t, you’re about to get a reason to pick it up. It’s in cinemas now. Ryan Gosling as Grace, Phil Lord and Christopher Miller directing, released March 20, 2026.
But here’s why I keep coming back to it: what Grace and Rocky go through to communicate is exactly what businesses go through (or should go through) when deploying AI agents. And most of them skip the hard part.
Building a Language That Didn’t Exist Before
Grace and Rocky don’t learn each other’s language. That’s the critical detail. Rocky communicates in musical chords. Grace speaks English. Neither is going to become fluent in the other’s native tongue. So they do something far more interesting: they build a third language from scratch. A pidgin: a simplified, purpose-built communication system that belongs to neither of them but works for both.
They start with numbers. Repeated taps. Then physical objects: hold something up, assign it a chord or a word. Verbs come later and are much harder (try miming “hypothesis” to an alien who doesn’t distinguish between “I think this might be true” and “I tested this and it’s confirmed”). Over weeks, they go from single tones to bantering, in-jokes, and collaborative science.
Rocky even develops a habit of prefixing every question with a single word, essentially announcing “what follows is a question, not a statement.” A structured protocol for signalling intent. Sound familiar?
Rocky invented chain-of-thought prompting before OpenAI had a logo.
This Is What Agent Calibration Actually Looks Like
When a business deploys an AI agent (whether it’s handling customer queries, drafting documents, managing workflows), the instinct is to treat it like software configuration. Write some instructions, press go, wonder why it’s producing nonsense.
But what you’re actually doing is the same thing Grace did. You’re building a shared language with a non-human intelligence. The system prompt that emerges from that process isn’t English. It isn’t code. It’s a pidgin: a third language that you and the AI have constructed together, through iteration, testing, and a fair amount of patience.
And like Grace and Rocky, you have to go through the stages. There are no shortcuts.
The Five Stages of Agent Calibration
First contact: getting a response at all. Grace plays a note. Rocky plays it back. In AI terms, this is your first prompt, just seeing if the thing can parse what you’re asking. Can it respond? Does it understand the domain? You’re probing capabilities, nothing more. Most people try to skip this and jump straight to complex tasks. Grace didn’t. He spent days on tones.
Building primitives: establishing what it can and can't do. Grace and Rocky build up from numbers to objects to actions. With an agent, this means establishing the basics: what tools does it have access to? What format should it respond in? What does it do when it doesn't know something? You're pointing at things and naming them, exactly like Grace holding up a rock and assigning it a word.
Expanding vocabulary: system prompts, tools, structured outputs. This is where the pidgin starts to take shape. Grace and Rocky move from concrete objects to abstract concepts: science, temperature, pressure, hypothesis. With an agent, you’re moving from simple question-and-answer to structured outputs, function calling, multi-step workflows. You’re teaching it not just what to do, but how to tell you what it’s doing. Rocky’s habit of signalling intent before asking a question is exactly what modern agent frameworks call structured outputs: a protocol for making communication predictable and parseable.
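As a sketch of what "signalling intent before content" can look like in practice, here's a hypothetical response protocol in Python. The field names and allowed intents are illustrative, not taken from any specific agent framework:

```python
import json

# A hypothetical pidgin for agent replies: every reply declares its
# intent up front, exactly like Rocky prefixing questions with a
# "this is a question" word before the question itself.
ALLOWED_INTENTS = {"answer", "question", "action", "uncertain"}

def parse_agent_reply(raw: str) -> dict:
    """Parse a JSON reply and enforce the intent-first protocol."""
    reply = json.loads(raw)
    if reply.get("intent") not in ALLOWED_INTENTS:
        raise ValueError(f"unknown intent: {reply.get('intent')!r}")
    if "content" not in reply:
        raise ValueError("reply must carry content")
    return reply

# A structured reply is predictable and parseable, whatever the content.
reply = parse_agent_reply('{"intent": "question", "content": "Which account?"}')
print(reply["intent"])  # question
```

The point isn't this particular schema; it's that once intent is declared in a fixed slot, downstream code can route replies without guessing at their meaning.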
Progressive trust: starting small, expanding autonomy. Grace didn’t hand Rocky the controls to his ship on day one. They started with safe, reversible exchanges. Information first. Then shared tools. Then joint experiments. Then, eventually, trust with life-or-death decisions. With AI agents, the pattern is identical: start with low-stakes tasks where mistakes are cheap. Let the agent draft an email, not send it. Let it suggest a response, not commit to one. Expand autonomy as your confidence in the calibration grows. The businesses that skip this step are the ones posting horror stories on LinkedIn about AI gone wrong.
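Progressive trust can be made concrete as a simple gate: actions are ranked by stakes, and the agent may only perform actions at or below a trust level you raise by hand as calibration confidence grows. A minimal sketch, with invented action names and levels:

```python
# Illustrative autonomy ladder: low-stakes, reversible actions first.
# The action names and rankings here are assumptions for the sketch,
# not from any particular agent framework.
AUTONOMY = {"suggest": 0, "draft": 1, "send": 2}

def allowed(action: str, trust_level: int) -> bool:
    """An agent may only perform actions at or below its trust level."""
    return AUTONOMY[action] <= trust_level

# Early calibration: the agent drafts emails but never sends them.
trust_level = 1
assert allowed("draft", trust_level)
assert not allowed("send", trust_level)
```

Raising `trust_level` is a deliberate human decision, made after reviewing outputs, which is the whole point of the stage.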
The harness becomes invisible: when it just works. By the second half of the book, Grace and Rocky aren’t thinking about how to communicate. They’re thinking about what to communicate: the science, the problem, the solution. The language has become infrastructure. Invisible. A well-calibrated AI agent hits the same point. The system prompt, the tool configuration, the guardrails: they all disappear into the background. What’s left is an agent that looks effortless. But that effortlessness was built, tone by tone, over dozens of iterations.
Where the Metaphor Breaks Down
I’d be doing you a disservice if I didn’t flag where this analogy stops working, because the distinction matters.
Rocky had agency. He chose to collaborate. He could refuse, disagree, push back. Current AI agents don’t do any of those things; they operate within the constraints you set, and they don’t have preferences or intentions. Grace and Rocky formed a genuine friendship; agent calibration is, for now, a one-directional relationship. You’re shaping the communication. The AI isn’t meeting you halfway in any meaningful sense. It’s responding to patterns.
There’s also the timescale. Grace and Rocky needed weeks of patience. Agent calibration can show results in hours, though genuine mastery (that sense where the harness becomes invisible) still takes months of refinement.
Being honest about these limits isn’t pessimism. It’s good engineering. Understanding what AI agents aren’t is just as important as understanding what they are.
The Payoff
Here’s what I see over and over again working with businesses across the UK: the ones getting real value from AI aren’t the ones with the fanciest models or the biggest budgets. They’re the ones who invested in the communication work upfront.
They spent the time on tones. They built the primitives. They tested, iterated, expanded trust gradually, and let the pidgin emerge. They treated calibration as a process, not a purchase.
The ones struggling? They expected the agent to just work. They wrote a paragraph of instructions, pointed it at a task, and were surprised when it produced something useless. They skipped the Grace-and-Rocky bit and went straight to expecting fluency.
When millions of people watch the Project Hail Mary film, they'll see something beautiful: two utterly alien intelligences building a working language from nothing, through patience and iteration. They'll watch Grace tap on a hull and Rocky tap back, and they'll feel the satisfaction of that first fragile connection becoming something powerful.
That’s what good agent calibration feels like. Not magic. Not configuration. Communication.
If you’re deploying AI in your business and want to skip the months of hull-tapping, we’ve done this before. We can help you build the pidgin.
Jon Gill is the founder of Squared Lemons, helping UK businesses build AI systems that actually work.
Frequently asked questions
What is AI agent calibration?
AI agent calibration is the process of building a shared working language between you and an AI agent through structured iteration. It is less about configuring settings and more about progressively expanding what the agent understands about your context, preferences, and standards.
What are the stages of calibrating an AI agent?
The five stages are: first contact (establishing basic communication), building primitives (agreeing on core vocabulary), expanding vocabulary (adding nuance and context), progressive trust (giving the agent more autonomy as it earns it), and invisible harness (the agent operates within understood constraints without constant correction).
How does trust develop between a person and an AI agent?
Trust develops incrementally through repeated cycles of delegation, output review, and correction. The agent learns your standards from your feedback; you learn the agent's capabilities from its outputs. Neither side arrives fully formed at the first session.
Why does AI calibration fail for most businesses?
Most businesses treat AI calibration as a one-time setup rather than an ongoing relationship. They configure the tool once, get poor outputs, and blame the model. Sustained iteration, much like the way Ryland Grace and Rocky build communication in Project Hail Mary, is what produces reliable results.