How to Design APIs for an AI World
Hey, Luca here, welcome to a new edition of Refactoring!
A thorough analysis of how AI changes what is needed from a good API, with real-world examples + some speculations about the future.
As engineers, we've been building applications for different audiences throughout the decades. At first, we only built for humans — creating visual interfaces, buttons, and forms. Then came APIs, and we started building for humans and other software — predictable systems talking to each other through well-defined contracts.

Now it feels like we are entering a third era, where a relevant consumer is neither human nor traditional software. It's AI. And here is the thing: AI agents don't behave like either of their predecessors. On one side, AI looks like regular software: LLM calls are fundamentally API calls, and our basic mental models about load, latency, and cost still apply. On the other side, AI brings a degree of understanding and… chaos that, in many respects, makes it closer to how humans do things. AI agents can:
It is obvious that to get the most value out of AI, we need to ensure it can interact with the rest of our tech stack. Advancements like MCP have driven progress, but it's also clear that simply piping LLMs into APIs the same way we always have won't get us the most value out of this collaboration.

We need to rethink how we approach API design with a new consumer in mind: one that is neither software nor human. Because, in a way, AI agents are both. While we are firmly in the realm of speculation here, I believe we have been working with AI for long enough to set some coordinates that are unlikely to change, and to consider what AI-first APIs may look like.

To help me with this, I am also bringing in none other than Ankit Sobti, co-founder and CTO of Postman. Postman is the world's leading API platform, and Ankit has been working at the forefront of this space for more than 10 years. So here is what we will cover today:
Let's dive in!

Disclaimer: I am a fan of what Ankit and the team are building at Postman, and I am grateful they partnered on this piece. However, I will only write my unbiased opinion about the practices and tools covered here, including Postman. Learn more about their new MCP Catalog and MCP Generator below 👇

✨ Thriving in ambiguity

Every major shift in how we build interfaces has been about identifying and removing friction. The transition from human-first to API-first wasn't just about mobile apps or microservices — it was about recognizing that human interpretation was a bottleneck. We moved from clicking buttons to calling endpoints to eliminate the need for a human to translate intent into action.

Traditional APIs are contracts between deterministic systems. They're the software equivalent of a vending machine: insert exact change, press B4, get your snack. The entire API-first movement is built on the premise of bottling intent into a predictable process.

Now, in a way, AI is bringing human-like interpretation back. When an LLM calls an API, it's not following a hard-coded path — it's reasoning (kinda) about what to do. This creates a paradox: we spent two decades removing ambiguity from our interfaces, and now our newest consumers thrive on understanding ambiguity. It's literally their superpower: they can parse poorly formatted JSON, understand inconsistent field names, guess what undocumented params probably do, and move past all kinds of grammar/syntax mistakes.

So the question becomes: how do we design systems that make the most out of this?

🤖 Designing APIs for AI

AI operates under different constraints and capabilities than traditional software. These, in turn, are going to change API design. As engineers, here are some topics we now need to take into account:
Let's explore each of these and their engineering implications:

1) Token economy 💰

Here's a constraint traditional APIs have (largely) never faced: every byte costs money. When an LLM processes your API response, those tokens aren't free. So, for example, should field names be long and self-descriptive, or short and cheap? Verbose names cost tokens on every single call. On the other hand, terse, abbreviated responses lose the self-documenting nature that helps AI understand APIs. You're optimizing for both comprehension and compression — a tradeoff that didn't exist before. To navigate this, some teams are experimenting with:
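To make the tradeoff concrete, here is a rough sketch comparing the token cost of verbose vs. compact field names. The ~4-characters-per-token heuristic and the field names are my own illustration; real counts depend on the model's actual tokenizer.

```python
import json

def approx_tokens(payload: dict) -> int:
    """Very rough token estimate: ~4 characters per token for English/JSON.
    Real counts depend on the model's tokenizer."""
    return len(json.dumps(payload, separators=(",", ":"))) // 4

# Self-documenting, but the LLM pays for these names on every single call.
verbose = {
    "customer_identifier": "cus_123",
    "subscription_status": "active",
    "monthly_recurring_revenue_usd": 49,
}

# Cheaper, but the model has to guess what "mrr" means.
compact = {"cid": "cus_123", "status": "active", "mrr": 49}

saved = approx_tokens(verbose) - approx_tokens(compact)
```

Multiply that per-response saving by thousands of agent calls per day and the naming decision becomes a real line item, which is exactly why it is now a design question rather than a style question.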
2) High latency ⏱️

Traditional API calls are fast — milliseconds. LLMs are slow — seconds. This completely changes how you think about API orchestration. Consider a typical AI workflow:
What used to be sub-second can now easily become 5-10 seconds. This has real implications:
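One common mitigation is to overlap independent calls instead of awaiting them one by one. A minimal sketch (the call names and simulated latencies are mine, for illustration):

```python
import asyncio

async def call_api(name: str, delay_s: float) -> str:
    """Stand-in for a slow LLM or tool call; delay_s simulates latency."""
    await asyncio.sleep(delay_s)
    return f"{name}: ok"

async def sequential() -> list[str]:
    # Total wall-clock time is the SUM of latencies.
    return [await call_api("search", 0.1), await call_api("lookup", 0.1)]

async def parallel() -> list[str]:
    # Independent calls overlap, so wall-clock time is the MAX of latencies.
    return list(await asyncio.gather(
        call_api("search", 0.1),
        call_api("lookup", 0.1),
    ))
```

When every hop costs seconds instead of milliseconds, this kind of fan-out (plus streaming partial results to the user) stops being an optimization and becomes table stakes.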
3) Self-healing error handling 🔄

When traditional systems hit an error, they just fail. When AI hits an error, things get interesting. We are seeing AI consumers that:
This adaptability is chaotic but potentially powerful, and API design can take advantage of it:
Consider two styles of error response. The traditional one is a bare status code and a terse message, e.g. a 400 "Bad Request". The AI-optimized one also spells out which field failed, the expected format, a hint for retrying, and any alternative endpoint.
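As an illustrative sketch (all field names and the toy endpoint are hypothetical, not from any specific API), here are the two shapes side by side, plus a client that acts on the recovery hints:

```python
# Traditional error response: a dead end for the caller.
traditional_error = {"code": 400, "message": "Bad Request"}

# AI-optimized error response: machine-readable recovery hints.
ai_optimized_error = {
    "code": 400,
    "error": "invalid_date_format",
    "message": "Field 'start_date' must be ISO 8601 (YYYY-MM-DD).",
    "expected_format": "YYYY-MM-DD",
    "retry_hint": "Reformat 'start_date' and resend the same request.",
}

def call_endpoint(payload: dict) -> dict:
    """Toy server: rejects US-style dates but explains how to recover."""
    if "/" in payload.get("start_date", ""):
        return ai_optimized_error
    return {"ok": True}

def self_healing_call(payload: dict) -> dict:
    """A client (or agent) that acts on the hints instead of just failing."""
    resp = call_endpoint(payload)
    if resp.get("error") == "invalid_date_format":
        # A real agent would let the LLM do the reformatting; we hard-code it.
        month, day, year = payload["start_date"].split("/")
        resp = call_endpoint({**payload, "start_date": f"{year}-{month}-{day}"})
    return resp
```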
In the second example, the additional info would be too difficult for regular software to parse — you'd need to code all the possible cases. Meanwhile, AI can actually act on it pretty easily. It might retry with the correct format, try the alternative endpoint, or explain the issue to the user. So, in this case, you are not just reporting errors: you are enabling recovery.

4) Non-deterministic operations 🎲

Traditional APIs assume clients behave predictably. AI does not. An LLM might:
This forces a rethink of API statefulness and control:
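One well-established pattern that helps here is idempotency keys (popularized by payment APIs such as Stripe's): if a non-deterministic client retries or replays a mutating call, the server returns the original result instead of performing the side effect twice. A minimal in-memory sketch (names are mine):

```python
# Server-side store of results, keyed by client-supplied idempotency key.
_processed: dict[str, dict] = {}

def charge(idempotency_key: str, amount_cents: int) -> dict:
    """Replay-safe mutation: a repeated key returns the cached result
    instead of charging the customer again."""
    if idempotency_key in _processed:
        return _processed[idempotency_key]
    result = {"charged": amount_cents}  # the actual side effect would go here
    _processed[idempotency_key] = result
    return result
```

This turns "the LLM might call this twice" from a correctness bug into a non-event, which is exactly the kind of control surface AI-facing APIs need.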
Finally, a single AI conversation might spawn dozens of parallel API calls. Traditional rate limiting breaks down when your "single user" is actually an LLM orchestrating complex flows. So teams are experimenting with:
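One such experiment could be to budget model tokens per window rather than raw request counts, so a burst of parallel agent calls is judged by its total cost. A hedged sketch (the class and thresholds are mine, not a known product):

```python
import time

class TokenBudgetLimiter:
    """'Semantic' rate limiting: allow requests while the estimated token
    spend stays within a per-window budget."""

    def __init__(self, tokens_per_window: int, window_s: float = 60.0):
        self.budget = tokens_per_window
        self.window_s = window_s
        self.spent = 0
        self.window_start = time.monotonic()

    def allow(self, estimated_tokens: int) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window_s:
            # New window: reset the spend counter.
            self.window_start, self.spent = now, 0
        if self.spent + estimated_tokens > self.budget:
            return False
        self.spent += estimated_tokens
        return True
```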
For example, Amazon SageMaker offers options like "provisioned concurrency" to handle predictable bursts, a pattern that could evolve into more nuanced "semantic" or "burst-allowance" rate limiting tailored for AI agents' parallel thinking.

5) Documentation as runtime 📚

Your documentation is no longer just developer guidance — it's part of your operational system. Unlike humans, who read docs once and internalize patterns, AI processes documentation with nearly every decision. This changes what documentation means:
This shift means documentation requires the same rigor as code:
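For instance, doc examples can be verified in CI the same way code is. A minimal sketch (the example payload and required fields are hypothetical): every JSON example in the docs must parse and carry the fields the live API actually returns, so stale docs break the build instead of silently misleading an agent.

```python
import json

# An example payload as it would appear in the API docs (hypothetical).
DOC_EXAMPLE = '{"user_id": "u_42", "plan": "pro"}'

# Fields the live API contract actually guarantees.
REQUIRED_FIELDS = {"user_id", "plan"}

def doc_example_is_valid(raw: str) -> bool:
    """CI check: the doc example must be valid JSON and match the contract."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return REQUIRED_FIELDS.issubset(payload)
```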
This is a brave new world, and some companies are starting to adapt. For example, Stripe's new agent toolkit explicitly acknowledges how AI interprets their documentation and examples.

🔗 MCP — the current state of AI-first APIs

How does all of this relate to MCP, the emerging standard on how AI should use APIs and services? Actually, not a lot. MCP is about the connection problem rather than the API design challenge. It's plumbing, not an architectural overhaul. But let's look at it briefly 👇

1) What is MCP?

MCP is an open protocol introduced by Anthropic that standardizes how AI assistants connect with external data sources and tools. Think of it as a universal adapter that lets AI systems interact with any service — from databases to APIs to local files — through a consistent interface. At its core, MCP defines three things:
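On the wire, these exchanges are JSON-RPC 2.0 messages. Paraphrased shapes below (the tool name and arguments are hypothetical; consult the official MCP schema for the authoritative message formats):

```python
# A client asking an MCP server what tools it exposes.
list_tools_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# A client invoking one of those tools with structured arguments.
call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_issues",              # hypothetical tool name
        "arguments": {"query": "login bug"},  # schema declared by the server
    },
}
```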
At its core, it's pretty simple. Instead of every AI tool creating custom integrations for Slack, GitHub, or your internal database, developers write an MCP server once, and any MCP-compatible AI can use it.

Small plug: Postman recently launched an MCP catalog to showcase these integrations, making it easier for developers to find and share MCP servers. They also debuted their own MCP generator to easily create MCP servers.

2) How MCP addresses our API challenges

Looking back at the problems we identified, MCP actually tackles several key challenges, including:
But it doesn’t address others, such as:
To me, this is 100% ok, because 1) it is unclear whether the scope of MCP should include all or any of these (and when in doubt, it's always better to go with a smaller scope), and 2) it's more important to converge on a standard than to design a perfect standard. Fortunately, it feels like we are converging on MCP pretty fast, which suggests the industry was quite hungry for it.

🔮 Beyond APIs — the agent-to-agent future

We've spent this entire article assuming APIs are still... APIs. We are acknowledging our new consumers are smart, but we are still assuming our APIs are dumb: fixed endpoints, predetermined schemas, static docs. Explicit state in responses, good batching, or semantic rate limiting feel like good incremental improvements, but do they feel AI-native? If you ask me, not really.

What is AI-native, then? No one knows for sure, and any speculation tends to age extremely poorly, but consider this: if we are putting intelligence on the consumer side (LLMs calling APIs), it feels natural to me that, at some point, we will have intelligence on the provider side too.

Most of the constraints and capabilities we discussed today are about adaptation, flexibility, and non-determinism. The workarounds we discussed (e.g. explicit state, retry instructions, verbose errors) are not bad, but they feel like ways to constrain this non-determinism — to predict where it can go and create systems around it. But this is the old way. The end game might be, instead, to have providers that are just as flexible as consumers, and can adapt on the spot: adapt to constraints on latency and cost, adapt to non-deterministic sequences of calls, and help with self-healing by problem-solving with the other party.

This is largely what Google is envisioning and designing for with the Agent2Agent protocol. A critical quote from Google 👇
So, this agent-to-agent future isn't just hypothetical, but rather likely, if we reflect on all of this from first principles. I believe that, as good engineers, we need to design by keeping a foot in the present (MCP and the incremental improvements we discussed) and a foot in the future (A2A), accounting for what may come next.

📌 Bottom line

And that's it for today! We've covered a lot of ground — from the evolution of application interfaces to the emerging agent-to-agent future. Here are some takeaways:
So whether you're building APIs today or planning for tomorrow, the message is clear: the age of static, rigid interfaces is ending. The future belongs to systems that can think, adapt, and collaborate. Time to start building them!

See you next week 👋

Luca