AI Tools for API Companies: 4 Design Principles for AI-Native API Consumption
In our previous article, we shared how our "obvious" approach to building tools for AI—auto-generating an MCP server from existing documentation and OpenAPI specs—failed spectacularly. After that failure, we had to learn how to design tools specifically for AI consumption.
We observed a fundamental difference between human developers and AI models: AI models have no persistent memory across sessions. When a human first encounters your API, they might struggle with the options, but after a few uses, they internalize what works best. AI models don't. Every conversation is their first time using your API. For example, a breakthrough understanding about when to use country filters versus bounding boxes is lost the moment the chat ends. This fundamental limitation drives every design decision that follows.
In the process of overhauling our MCP server, we developed four guiding principles to help build the best tools for AI as an API company.
Tool Descriptions Are Critical
We learned that tool descriptions do the heavy lifting. The description is your chance to tell the model not just what your tool does, but when and why to use it. A good description helps the model choose the right tool for the job and understand the context where it's most useful.
Our Static Maps API demonstrates this well. Originally, we described it as:
Generate a map with a marker at a specific location. Returns a PNG image.
This description was technically accurate but too narrow. Models would only use it when explicitly asked for "a map with a marker," missing opportunities where it could create useful visualizations.
Now we describe it as:
Generate a PNG map image of an area, optionally including markers and a line (e.g. to draw a route or a boundary)
The improved description helps models understand the broader capabilities and context where the tool is useful. Instead of only thinking "marker on map," they now consider it for routes, boundaries, and general area visualization.
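To make this concrete, here's a minimal sketch of what the change amounts to in a tool definition (the tool name and parameter schema are illustrative, not our exact implementation):

// Illustrative tool definition; the name and inputSchema are examples only.
const staticMapTool = {
  name: "generate_static_map",
  // Old, overly narrow description:
  //   "Generate a map with a marker at a specific location. Returns a PNG image."
  // New description that also tells the model *when* the tool is useful:
  description:
    "Generate a PNG map image of an area, optionally including markers " +
    "and a line (e.g. to draw a route or a boundary)",
  inputSchema: {
    type: "object",
    properties: {
      markers: {
        type: "array",
        items: { type: "string" },
        description: "Optional markers as 'lat,lng' strings",
      },
      line: {
        type: "string",
        description: "Optional encoded polyline to draw as a route or boundary",
      },
    },
  },
};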
Context Window Management Matters
We discovered that token consumption is a hidden constraint. Large tool descriptions and verbose responses quickly eat into the available (and, more importantly, usable) context window, leaving less room for user input and driving up costs.
The cost of this cognitive load taught us to be frugal with our tool descriptions and the number of parameters each tool takes. Every word needs to earn its place by helping the model understand when to use the tool or how to interpret the results. The same applies to tool responses: instead of packing them with potentially useful but unnecessary data, every element of a response should help the model answer the question.
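As a rough sketch of what that trimming looks like in practice (the field names here are illustrative), the tool handler can map a verbose upstream payload down to the handful of fields the model actually needs:

// Illustrative shape of a verbose upstream geocoding result.
interface RawGeocodeResult {
  formatted: string;
  lat: number;
  lon: number;
  confidence: number;
  licence: string; // attribution boilerplate the model doesn't need per result
  raw_tags: Record<string, string>; // large and rarely useful to the model
}

// Keep only the fields that help the model answer the question.
function toToolResponse(results: RawGeocodeResult[]) {
  return results.map(({ formatted, lat, lon, confidence }) => ({
    formatted,
    lat,
    lon,
    confidence,
  }));
}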
We believe this merits further investigation. We want to understand how tuning descriptions for specific models could yield even better performance, especially since different AI models interpret tool descriptions in different ways.
Optimize Output Formats for AI Interpretation
Our API responses, while great for programmatic consumption, needed rethinking for AI consumption. Instead of just returning raw JSON, we started designing responses that help models interpret and use the data effectively. Sometimes this meant adding human-readable summaries alongside or instead of structured data. Other times it meant restructuring the response to highlight the most relevant information. The goal was always to reduce the cognitive load on the model while preserving the essential information.
Our Timezone API illustrates this perfectly. Originally, we returned the technical data that developers needed:
{
"tz_id": "Europe/Zurich",
"base_utc_offset": 3600,
"dst_offset": 3600
}
This worked fine for developers who could calculate the current local time themselves. But AI models struggled with a surprising limitation: they don't inherently know what time it is right now. When someone asked "Is this restaurant in Zurich open now?", models couldn't reliably determine the current local time from just the timezone offset data.
So we added local timestamps to our response:
{
"tz_id": "Europe/Zurich",
"base_utc_offset": 3600,
"dst_offset": 3600,
"timestamp": 1749479378,
"local_rfc_2822_timestamp": "Mon, 9 Jun 2025 16:29:38 +0200",
"local_rfc_3389_timestamp": "2025-06-09T16:29:38+02:00"
}
Now AI models can immediately see the current local time and work with whichever format they parse most naturally. This change also made the API more useful for human developers, who previously had to calculate local time themselves.
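Here's a sketch of how such a response can be assembled server-side (the helper below is illustrative and omits the offset fields and RFC 2822/3339 formatting a real implementation would include):

// Build an AI-friendly timezone response for a given IANA timezone ID.
function timezoneResponse(tzId: string, now: Date = new Date()) {
  return {
    tz_id: tzId,
    // Unix timestamp gives the model an unambiguous "now".
    timestamp: Math.floor(now.getTime() / 1000),
    // Human-readable local time; a production version would also emit
    // the RFC 2822 and RFC 3339 strings with the correct UTC offset.
    local_time: now.toLocaleString("en-US", { timeZone: tzId }),
  };
}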
Split Complex Endpoints into Focused Tools
Our biggest breakthrough came from abandoning the one-tool-per-endpoint approach. Instead of exposing every endpoint as a tool with many parameters, we learned to split endpoints into focused, use-case-specific tools, often removing complexity so that models can use each tool effectively.
Take our geocoding tool as an example. Originally, we exposed all the filtering options available in our API—bounding box coordinates, circular search areas, layer filters, and country restrictions. We thought giving AI models more control would lead to better results.
In practice, the models never used most of these options, even when they should have. When they did try, they often got confused about which filter was most appropriate and made sub-optimal choices. A request to "find coffee shops in downtown Seoul" might trigger attempts to calculate bounding box coordinates rather than simply using the country filter.
We eventually eliminated everything except the country filter. We kept this one because it was the simplest: just a 3-character ISO code rather than coordinate lists for bounding boxes or radius calculations. It also helps that most models can figure out country codes quite easily from their pre-trained general knowledge, turning "find addresses in South Korea" into a simple country: "KOR" parameter.
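Here's a sketch of the slimmed-down tool definition (names and schema details are illustrative rather than our exact implementation):

// A focused geocoding tool: one optional, easy-to-fill filter instead of
// bounding boxes, search radii, and layer options.
const geocodeTool = {
  name: "geocode_address",
  description:
    "Search for addresses and places by name, optionally restricted to a single country",
  inputSchema: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: "Free-form address or place name to search for",
      },
      country: {
        type: "string",
        description: 'Optional ISO 3166-1 alpha-3 country code, e.g. "KOR"',
      },
    },
    required: ["query"],
  },
};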
The key insight: AI models perform better with multiple simple tools than one complex tool. They can chain together focused tools to solve complex problems, but they struggle to navigate a tool with dozens of configuration options. As our tool usage grows, we expect to iteratively add new, focused tools based on our existing endpoints to better help models answer location questions.
From Spectacular Failure to Working Tools
These four principles transformed our MCP server: contextual tool descriptions, careful token management, AI-optimized responses, and focused endpoint splitting. As we discovered them, the spectacular failure we described in our first article became something that actually works. AI models can now successfully help users find routes, geocode addresses, and interact with our location services in natural, intuitive ways.
But mastering the technical aspects of tool design was just the beginning. The real surprises came from what we learned about our own APIs in the process, and the unexpected capabilities that emerged when AI agents started orchestrating our tools in ways we'd never anticipated.
These insights go far beyond tool design and touch on fundamental questions about how API companies should think about AI-native development, developer experience, and business strategy in an agent-driven world.
In our next article, we'll explore how building our MCP server became a mirror that revealed opportunities to improve our underlying APIs, and how it unlocked user workflows we never had to build ourselves. The technical lessons covered here are essential, but they're just the foundation for understanding what AI-native API consumption really means.
For now, if you're building your own tools for AI, start with these principles and prepare to be surprised by what you learn about your own APIs in the process.
Want to see these principles in action? Check out our MCP Server on GitHub and join our Discord to discuss your own experiences with other developers tackling these same challenges.