From 86 Tools to 3 — How a Cloudflare Talk Inspired Us to Rebuild Our BinaryLane MCP

We just rebuilt our BinaryLane MCP server from the ground up. The original had 86 tools — 73 for the BinaryLane API, 13 for SSH. The new version has three. It is faster, uses a fraction of the tokens, and was inspired by a talk we watched at a Cloudflare conference. Here is what we built, why we built it, and what we learned about the future of how AI agents interact with APIs.

The Talk That Changed How We Think About MCP

Sunil Pai from Cloudflare gave a talk called “Code Mode” that laid out a problem we had been feeling but had not articulated. The problem is simple: MCP tool definitions do not scale.

Every tool you register with an MCP server — its name, description, parameter schema — gets stuffed into the model’s context window on every single call. A handful of tools? Fine. But the moment you start covering a real API surface, you hit a wall.

Cloudflare has about 2,600 API endpoints. If they exposed each one as an MCP tool, that is roughly 1.2 million tokens just in tool definitions. Before the model has even thought about your question, a million tokens are gone. The context window is full of schema descriptions instead of your actual problem.

Our BinaryLane MCP was not that extreme — 86 tools, not 2,600. But the same physics apply. Every tool definition eats tokens. Every multi-step operation requires multiple round trips between the model and the server. Want to list your servers, check their metrics, and SSH in to verify disk usage? That is six or seven individual tool calls, each with its own round trip, each burning tokens on the tool-selection dance.

Sunil’s insight was elegant: stop asking the model to pick tools. Ask it to write code instead.

What Code Mode Actually Means

The idea is straightforward. Instead of registering dozens of individual tools, you expose two or three meta-tools. The model generates JavaScript (or Python, or whatever) that calls your API methods directly. The code runs in a sandbox with your API client exposed as a capability.

Cloudflare took their 2,600 endpoints and collapsed them into two tools — search and execute. Search lets the model discover available API methods by querying against the OpenAPI spec. Execute runs the generated code. The result: 1.2 million tokens of tool definitions became roughly 1,000 tokens. A 99.9% reduction.
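
The arithmetic behind those figures is easy to check. The sketch below uses only the rounded numbers quoted above; the variable names are ours, and the per-endpoint average is a derived estimate, not a figure from the talk.

```javascript
// Rounded figures quoted in this post; not measured by us.
const endpoints = 2600;
const tokensAsTools = 1_200_000;
const tokensCodeMode = 1_000;

// Average schema cost per endpoint if each one were its own tool.
const tokensPerEndpoint = Math.round(tokensAsTools / endpoints);

// Share of tool-definition tokens eliminated by code mode.
const reductionPct = (1 - tokensCodeMode / tokensAsTools) * 100;

console.log(tokensPerEndpoint);       // 462
console.log(reductionPct.toFixed(1)); // "99.9"
```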

But it is not just about tokens. Code gives you things that JSON tool calls cannot:

  • Looping. Iterate over all your servers without a separate tool call for each one.
  • Parallelism. Promise.all across 22 servers simultaneously instead of calling them sequentially.
  • State. Store intermediate results in variables, filter, map, transform — all within one execution.
  • Composition. Chain API calls and SSH commands in the same code block. No round trips.

The model is already trained on gigabytes of JavaScript. It knows how to write code. Why make it fill out JSON schemas instead?

Our Implementation: BinaryLane MCP v2

We took Sunil’s approach and applied it to our BinaryLane infrastructure management. The original MCP (v1) had 73 BinaryLane API tools and a separate SSH MCP server with 13 more tools. V2 merges everything into a single server with three tools:

| Tool | Purpose |
|---|---|
| search | Find available API methods by natural language query. Returns matching method signatures. |
| execute | Run JavaScript in a sandbox with bl (BinaryLane API) and ssh (SSH client) exposed as globals. |
| describe | Get detailed docs for a specific method: parameters, return types, examples, gotchas. |

That is it. Three tools replace 86. The model searches for what it needs, reads the docs if it needs detail, then writes and executes code.
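
To make the search and describe steps concrete, here is a minimal sketch of a method catalog with keyword matching. The catalog entries, field names, and scoring are all hypothetical; the real server's search accepts natural language queries and may work quite differently.

```javascript
// Hypothetical method catalog; the real server's entries and fields may differ.
const catalog = [
  { name: 'listServers', signature: 'bl.listServers(params?)', doc: 'List all servers in the account.' },
  { name: 'getServerLatestMetrics', signature: 'bl.getServerLatestMetrics(serverId)', doc: 'Latest CPU and storage metrics for one server.' },
  { name: 'run', signature: 'ssh.run(host, command)', doc: 'Run a shell command on a server over SSH.' },
];

// search: rank methods by how many query words appear in the name or doc.
function search(query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return catalog
    .map((m) => {
      const haystack = `${m.name} ${m.doc}`.toLowerCase();
      const score = words.filter((w) => haystack.includes(w)).length;
      return { signature: m.signature, score };
    })
    .filter((m) => m.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((m) => m.signature);
}

// describe: return the full entry for one method, or null if unknown.
function describe(name) {
  return catalog.find((m) => m.name === name) ?? null;
}

const hits = search('server metrics');
console.log(hits[0]); // bl.getServerLatestMetrics(serverId)
```

Only the short signatures come back from search, so the model pays for full documentation only when it calls describe on a specific method.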

What the Model Generates

Here is a real example. We asked: “list all my servers and get their current health.” The model generated this:

const { servers } = await bl.listServers({ per_page: 200 });

const results = await Promise.all(servers.map(async (s) => {
  const publicIp = s.networks.v4.find(n => n.type === 'public')?.ip_address;

  let health = {};
  try {
    const r = await bl.getServerLatestMetrics(s.id);
    const avg = r.sample_set?.average;
    if (avg) {
      health = {
        cpu: avg.cpu_usage_percent?.toFixed(1) + '%',
        diskUsed: (avg.storage_usage_megabytes / 1024).toFixed(1) + 'GB',
      };
    }
  } catch (e) {}

  return { name: s.name, status: s.status, region: s.region.slug, publicIp, ...health };
}));

return results;

One execution. 22 servers queried with metrics. In v1, this would have been at minimum 23 individual tool calls — list_servers plus get_server_latest_metrics for each of the 22 servers. Each one a round trip to the model and back. Each one burning tokens on tool selection.

With code mode, the model writes one block of code, it runs in under 500 milliseconds, and the result comes back as a single JSON response.
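
Note that the generated code uses await and return at the top level, which is only valid inside an async function, so the execute tool has to wrap the snippet before running it. A minimal sketch of that wrapping step, assuming a stub bl client; runSnippet is a hypothetical name, and this version provides no isolation at all:

```javascript
// Constructor for async functions; lets a string body use `await` and `return`.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical stand-in for the real bl client.
const bl = {
  listServers: async () => ({ servers: [{ name: 'web-1' }, { name: 'web-2' }] }),
};

// Illustrates only the wrapping step. There is NO sandboxing here.
function runSnippet(code, api) {
  const fn = new AsyncFunction('bl', code);
  return fn(api);
}

const snippet = 'const { servers } = await bl.listServers(); return servers.map((s) => s.name);';
const pending = runSnippet(snippet, bl);
pending.then((names) => console.log(names)); // [ 'web-1', 'web-2' ]
```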

Mixing API and SSH in One Shot

The real power shows when you combine BinaryLane API calls with SSH. Because both clients are exposed in the same sandbox, you can do things like:

const { servers } = await bl.listServers();
const webServers = servers.filter(s => s.name.startsWith('wp-web'));

const results = await Promise.all(webServers.map(async (s) => {
  const uptime = await ssh.run(s.name, 'uptime');
  const disk = await ssh.run(s.name, 'df -h /');
  const nginx = await ssh.run(s.name, 'systemctl is-active nginx');
  return {
    name: s.name,
    uptime: uptime.stdout.trim(),
    disk: disk.stdout.trim(),
    nginx: nginx.stdout.trim(),
  };
}));

return results;

That is an API call to list servers, filtered to web nodes, then SSH into each one to check uptime, disk usage, and nginx status — all in parallel, all in one execution. In v1, this would have required switching between two different MCP servers and making 19 individual tool calls.
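
One thing to watch with Promise.all is that a single rejected call rejects the whole batch. Below is a sketch of a more forgiving variant using Promise.allSettled, which also runs the three commands per host concurrently. The ssh stub is hypothetical, and it assumes ssh.run rejects when a host is unreachable; the real client may report failures differently.

```javascript
// Hypothetical stub: assumes ssh.run rejects when a host is unreachable.
const ssh = {
  run: async (host, command) => {
    if (host === 'wp-web-2') throw new Error('connection timed out');
    return { stdout: `${command} ok on ${host}\n` };
  },
};

async function checkHosts(hosts) {
  const settled = await Promise.allSettled(
    hosts.map(async (host) => {
      // The three commands are independent, so run them concurrently too.
      const [uptime, disk, nginx] = await Promise.all([
        ssh.run(host, 'uptime'),
        ssh.run(host, 'df -h /'),
        ssh.run(host, 'systemctl is-active nginx'),
      ]);
      return {
        uptime: uptime.stdout.trim(),
        disk: disk.stdout.trim(),
        nginx: nginx.stdout.trim(),
      };
    })
  );
  // Keep one row per host, successful or not.
  return settled.map((r, i) =>
    r.status === 'fulfilled'
      ? { name: hosts[i], ok: true, ...r.value }
      : { name: hosts[i], ok: false, error: r.reason.message }
  );
}

const report = checkHosts(['wp-web-1', 'wp-web-2']);
report.then((rows) => console.log(rows));
```

The unreachable host shows up as a row with ok set to false instead of discarding the results from every other node.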

The Sandbox

Running model-generated code requires trust boundaries. We built a sandbox using Node.js’s vm module that starts with zero capabilities and explicitly grants only what is needed:

| Available | Not Available |
|---|---|
| bl.* (56 BinaryLane API methods) | require / import (no module loading) |
| ssh.* (SSH commands, file ops) | fetch (no arbitrary network access) |
| console.log (captured output) | fs (no filesystem access) |
| JSON, Promise, Array, Math, etc. | process (no environment access) |
| setTimeout (capped) | Any outbound network except through bl/ssh |
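
A minimal sketch of this allowlist approach with Node's vm module follows. The bl and ssh objects are hypothetical stubs and this is not the code from our repository; it only shows that a fresh context contains JavaScript built-ins plus whatever is passed in explicitly.

```javascript
const vm = require('node:vm');

// Hypothetical stand-ins for the real clients.
const bl = { listServers: () => ({ servers: [] }) };
const ssh = { run: () => ({ stdout: '' }) };

// Console output is captured instead of going to the host's stdout.
const logs = [];

// Only what is listed here exists as a global inside the sandbox.
// JavaScript built-ins (JSON, Promise, Array, Math) come with the new context.
const context = vm.createContext({
  bl,
  ssh,
  console: { log: (...args) => logs.push(args.join(' ')) },
});

const probe = vm.runInContext(
  `console.log('hello from the sandbox');
   ({
     require: typeof require,
     process: typeof process,
     fetch: typeof fetch,
     bl: typeof bl,
     json: typeof JSON,
   })`,
  context
);

console.log(probe.require, probe.process, probe.fetch); // undefined undefined undefined
console.log(probe.bl, probe.json);                      // object object
console.log(logs);                                      // [ 'hello from the sandbox' ]
```

Node's documentation states that the vm module is not a security mechanism on its own, so a sketch like this demonstrates capability scoping rather than hard isolation against hostile code.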

All destructive operations (deleting servers, removing load balancer members, deleting DNS records) are intercepted by a safety layer that writes an audit entry to stderr before the call executes. The response includes metadata about which destructive operations were performed, so the model can inform the user.
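
One way such a safety layer could look is a thin wrapper around the client. In this sketch the DESTRUCTIVE set, the withAudit helper, and the stub client are all hypothetical names chosen for illustration, and the log format is an assumption.

```javascript
// Hypothetical list of method names treated as destructive.
const DESTRUCTIVE = new Set(['deleteServer', 'deleteDnsRecord']);

// Wrap a client so destructive calls are logged to stderr and recorded.
function withAudit(client, performed) {
  const wrapped = {};
  for (const [name, fn] of Object.entries(client)) {
    wrapped[name] = (...args) => {
      if (DESTRUCTIVE.has(name)) {
        // Log before the call runs, so the entry exists even if it fails.
        console.error(`[audit] ${name} ${JSON.stringify(args)}`);
        performed.push({ method: name, args });
      }
      return fn(...args);
    };
  }
  return wrapped;
}

// Hypothetical stub client.
const performed = [];
const bl = withAudit(
  {
    listServers: async () => ({ servers: [] }),
    deleteServer: async (id) => ({ deleted: id }),
  },
  performed
);

// The list of destructive calls travels back alongside the result.
const done = bl.deleteServer(42).then((result) => ({ result, destructive: performed }));
done.then((response) => console.log(response));
```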

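Every execution has a timeout. Synchronous infinite loops are caught by the vm timeout option. Async operations (like a hanging SSH connection) are caught by racing them against a timer with Promise.race. The race only stops waiting on the slow operation rather than cancelling it, but either way an execute call cannot block forever.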
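
The two timeout mechanisms can be sketched as follows. runSync and withTimeout are hypothetical helper names, and the 50 ms limits are arbitrary values for the demo.

```javascript
const vm = require('node:vm');

// Synchronous runaway code: the vm `timeout` option interrupts it.
function runSync(code, ms) {
  try {
    const value = vm.runInContext(code, vm.createContext({}), { timeout: ms });
    return { ok: true, value };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Async work: race it against a timer. This stops waiting on the promise;
// it does not cancel the underlying operation.
function withTimeout(promise, ms) {
  let timer;
  const limit = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, limit]).finally(() => clearTimeout(timer));
}

const syncResult = runSync('while (true) {}', 50);
console.log(syncResult.ok); // false

// A promise that never settles stands in for a hanging SSH connection.
const asyncResult = withTimeout(new Promise(() => {}), 50).catch((err) => err.message);
asyncResult.then((msg) => console.log(msg)); // timed out after 50ms
```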

What We Learned Building This

The implementation was not without its challenges. Node.js’s vm.createContext creates a separate JavaScript realm, which means its own Promise constructor and its own Object prototype. Values created in the host realm are not instances of the sandbox’s built-ins, so a Promise returned by the BinaryLane API client fails an instanceof Promise check inside the sandbox. In our setup the symptom was that awaited API results did not come through intact: the objects inside those Promises looked empty to the sandbox’s Object.keys().

We solved this by bridging realms explicitly: extracting the sandbox’s Promise constructor and JSON object after context creation, then wrapping every API/SSH method to serialize results with the host’s JSON.stringify and deserialize with the sandbox’s JSON.parse. This creates plain objects that live in the sandbox realm. We did not find this documented anywhere; we figured it out by watching API responses come back as empty {} for three hours.
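
A minimal sketch of that kind of bridge, assuming a stub API method; toSandbox and bridge are hypothetical names, and the actual wrapper in the repository may differ.

```javascript
const vm = require('node:vm');

const context = vm.createContext({});
// Pull the sandbox realm's own built-ins out of the context.
const sandboxJSON = vm.runInContext('JSON', context);
const SandboxPromise = vm.runInContext('Promise', context);
const sandboxObjectProto = vm.runInContext('Object.prototype', context);

// Serialize in the host realm, parse in the sandbox realm.
function toSandbox(value) {
  return sandboxJSON.parse(JSON.stringify(value));
}

// Wrap a host method so it returns a sandbox Promise of a sandbox object.
function bridge(fn) {
  return (...args) =>
    new SandboxPromise((resolve, reject) => {
      Promise.resolve(fn(...args)).then((v) => resolve(toSandbox(v)), reject);
    });
}

// Hypothetical stub for one API method.
const listServers = bridge(async () => ({ servers: [{ id: 1, name: 'web-1' }] }));

const hostObject = { id: 1 };
const bridged = toSandbox(hostObject);
console.log(Object.getPrototypeOf(hostObject) === sandboxObjectProto); // false
console.log(Object.getPrototypeOf(bridged) === sandboxObjectProto);    // true
console.log(listServers() instanceof SandboxPromise);                  // true
```

The JSON round trip keeps only JSON-representable data: functions and undefined properties are dropped and Date values become strings, which is usually acceptable for API responses.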

The other lesson: loading the API token from a config file is fragile. Our config file used api-token (with a hyphen) but our regex matched api_token (with an underscore). The fix was one character. The debugging took longer than the entire sandbox implementation.
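
A more tolerant pattern would accept both spellings and fail loudly when neither is present. The sketch below assumes a simple "key: value" or "key = value" config line, which may not match the actual config format; TOKEN_PATTERN and readToken are hypothetical names.

```javascript
// Accepts both `api-token` and `api_token`, with `:` or `=` as the separator.
// The config format shown here is an assumption, not the tool's actual format.
const TOKEN_PATTERN = /^\s*api[-_]token\s*[:=]\s*["']?([^"'\s]+)["']?\s*$/m;

function readToken(configText) {
  const match = TOKEN_PATTERN.exec(configText);
  if (!match) {
    // Fail loudly rather than sending unauthenticated requests.
    throw new Error('No api-token or api_token entry found in config');
  }
  return match[1];
}

console.log(readToken('api-token: abc123'));    // abc123
console.log(readToken('api_token = "xyz789"')); // xyz789
```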

The Numbers

| Metric | MCP v1 | MCP v2 |
|---|---|---|
| Tool definitions | 86 (73 BinaryLane + 13 SSH) | 3 (search + execute + describe) |
| MCP servers | 2 (separate BinaryLane and SSH) | 1 (unified) |
| Token overhead (tool schemas) | ~15,000–20,000 tokens | ~1,000–2,000 tokens |
| Round trips for “list servers + metrics” | 23+ individual tool calls | 1 code execution |
| Time for 22-server health check | Multiple sequential calls | ~500ms (parallel) |
| Validation layer | Zod schemas on every input | None needed (model writes typed code) |

What This Means for MCP

We are not saying MCP tools are dead. For simple integrations with a handful of endpoints, individual tools work fine. The model picks the right tool, fills in the parameters, done.

But if you are building an MCP server for a real API surface — anything with more than 20 or 30 endpoints — you should be thinking about code mode. The token economics alone make it worth it. The reduction in round trips makes it dramatically faster. And the composability of code versus JSON schemas means the model can solve problems in one shot that would previously require multi-step orchestration.

Sunil put it well in his talk: for the longest time, programmers got code and infinite power to interact with systems. Everyone else got buttons and forms. LLMs are breaking that distinction. Every person — and every AI agent — now has access to something that can generate code and interact with any system you expose to it.

The question is not whether to let the code do the talking. The question is what capabilities you expose to it, and how you make them safe.

What is Next

BinaryLane MCP v2 is open source and available now. We are running it in production against our 22-server HA WordPress cluster — managing servers, checking metrics, SSH-ing into nodes, and running health checks all through code mode.

Next on the list:

  • Mid-session SSH refresh — re-discovering new BinaryLane servers without restarting the MCP
  • Token usage comparison — formal measurement of v1 vs v2 token consumption across real workflows
  • More detailed method docs — expanding the describe tool’s documentation for edge-case API methods

If you are building MCP servers and hitting the tool-count wall, have a look at what we did. The code is at github.com/termau/binarylane-mcp-v2. And if you want to see the Cloudflare talk that started this, search for “Code Mode — Sunil Pai, Cloudflare” on YouTube.