Sail: The Inference Platform for Long-Horizon Agents

June 25, 2026

Inference is already a $100B+ market and one of the fastest-growing areas of AI spend. As AI moves from demos into production, the compute required to run intelligence at scale is growing quickly, with agentic workloads driving much of that demand.

The next wave of AI isn’t a person typing into a box. It’s agents performing work end to end: working across codebases, data pipelines, compliance checks, and research tasks, and running asynchronously for minutes, hours, or days before returning an answer. These deep, branching workflows burn orders of magnitude more tokens per task. When token volume explodes like that, latency stops being the bottleneck; cost and scale become everything. We’re already seeing the tokenmaxxing era run into budget caps, leaving a great deal of demand for intelligence unmet. Businesses simply cannot afford their inference bills in the age of agents.

What if inference were rebuilt from the ground up for the way agents actually compute? 

Enter Sail Research.

Founded by Neil Movva and Samir Menon, Sail is the inference platform purpose-built for long-horizon, asynchronous agents. When we first dug in with the team, two things became clear:

  1. Serving agents efficiently at scale is a deep systems problem, not a faster-endpoint problem. It demands optimization across the entire stack, from CUDA kernels up to the API.
  2. The team’s backgrounds are exactly what that problem requires. Neil was the first kernels hire at Together AI, with deep roots in inference systems and AI accelerator design from his previous roles at Nvidia and Apple. Across our references and our own interactions, his technical ability and instinct for where the market is going stood out consistently. Neil was considered one of the best engineers at Together, yet he calls his co-founder Samir the strongest programmer he met at Stanford. It was clear to us that the two of them are an exceptional duo.

We believe inference is fragmenting into platforms optimized for the shape of a specific workload. The highest-growth workload of all is agents, and two tailwinds strengthen Sail’s position: compute is going heterogeneous, with no single stack optimal for every phase of an agent’s work, and cheaper open-source models are becoming good enough for enterprise adoption. In a fragmented compute world, the durable value accrues to the layer that decides where each workload runs while still delivering the performance the application needs. That routing and scheduling intelligence is core to Sail.

In time, Sail will be the agent cloud, providing both the optimized inference and the flexible compute (check out Sailboxes!) required to build and run long-horizon agents end to end.

We’re thrilled to partner with Neil, Samir, and the Sail team. In a world where agents are about to become a primary unit of software, the infrastructure that powers them will be foundational, and we believe Sail is building it.

They’re hiring. Join them in shaping what comes next in this era of agents!