Why Direct Inference
The endpoint that does the model market so you don’t.
The old way means choosing a model for every task, building retries and failover, and re-touching code each time the backend moves. Direct Inference does all of it behind one endpoint: the best model is served on every request and your existing code keeps working unchanged. That simplicity is the product — your integration stays trivial while the model market churns.
Three ways to get a model
Even the smart routers still hand you a picker.
Choosing a model used to mean wiring it yourself. The newest tools choose for you — but only after you turn on a router, write routing rules, and live in a model picker. Direct Inference is the step past that: there is nothing to choose, enable, or configure.
The old way
Wire it yourself
Pick a model for every task, build your own retries and failover, and re-touch code each time the market moves.
You own the model matrix, the plumbing, and every migration.
The current wave
Add a smart router
A router picks a model for you — once you enable it, configure routing rules, and select it from a model picker.
Choosing is faster, but it's still a router to turn on, rules to maintain, and a picker to live in.
Direct Inference
Stop choosing entirely
One endpoint covers your use cases the way a frontier lab does. No model to pick, no router to enable, no rules to write.
Nothing to configure. Change one line and one key — you're done.
The old way, in detail
Wire it yourself vs. one endpoint
Same outcome you’d hand-build — without building, configuring, or maintaining any of it.
The advantage
Not choosing is the feature, not a limitation.
Letting the model decision go isn't something you give up — it's what you gain: less to maintain, more we can optimize on your behalf, and one endpoint that doesn't drift out from under you.
An integration that can't drift
There are no model names in your code to go stale, so a rename or retirement upstream can't quietly break a branch you forgot you wrote.
We optimize so you don't have to
Because we choose per request, we can move traffic for quality, latency, price, and availability on your behalf — no slider to tune, no migration to run.
One endpoint, not a shopping list
You commit to one durable endpoint instead of to any single lab's release cycle. Keeping up with the model market stays our job, not yours.
Operate with confidence
You still see everything that's yours.
You never have to track which model served a request — and everything else is fully visible: usage, costs, request mix, and per-application breakdowns, with hard caps you control.
Usage by workload
See how your traffic splits across the kinds of work you send — chat, documents, vision, code, reasoning — so cost and volume break down by what you're actually doing.
Per-application attribution
Traffic segments by application automatically from your request headers, so one key can power many surfaces and still break down cleanly.
Request traces
Inspect individual requests — tokens, latency, cost, and the detected request type — for the debugging visibility production actually needs.
Hard spend caps
Per-key and account-level ceilings are enforced in the request path. Past the cap, spend fails closed instead of running up a bill.
See what each call needs
The playground shows, in real time, how each request is handled — so you can watch the endpoint do the work you no longer have to.
Pay-as-you-go balance
Top up with a card and draw it down per request, with a low-balance signal before anything stalls. No seats, no minimum, no contract.
Durability
A surface that outlasts the model market.
Improves without your involvement
Each new capable model can be folded in behind the endpoint. You inherit the upgrade without a migration, a model swap, or a release-note review.
Absorbs churn instead of forwarding it
Renames, retirements, price changes, and outages are ours to absorb — not new branches in your application code.
No lock-in to any one lab
One endpoint speaks the OpenAI, Anthropic, and Gemini SDK shapes, so your product never rides a single vendor's release cycle.
Stop integrating against the model market.
Point one client at one endpoint and let the backend stay our problem. Your existing code keeps working untouched; the churn stays on our side of the line.