Developers
Drop in one endpoint. Get inference that feels engineered.
The DI Model is a stable endpoint for OpenAI, Anthropic, and Gemini SDKs. Change the base URL, keep your existing model string, and each request is understood, classified, and fulfilled for you.
https://app.directinference.com/di/v1
Quickstart
Three steps to your first request
Create a key
Sign in to the portal and issue a Direct Inference API key.
Point at the endpoint
Set your SDK base URL to the DI Model endpoint. Nothing else in the call shape changes.
Keep your model string
Send the model id your app already sends. Direct Inference picks a capable model for the request and returns the response in the same shape.
OpenAI SDK (Python)
from openai import OpenAI

client = OpenAI(
    base_url="https://app.directinference.com/di/v1",
    api_key="YOUR_DIRECT_INFERENCE_KEY",
)

# Keep sending whatever model string your app already sends.
resp = client.chat.completions.create(
    model="gpt-5.5-mini",
    messages=[{"role": "user", "content": "Summarize this thread."}],
)
print(resp.model)  # -> "gpt-5.5-mini" (your id, echoed back)

Anthropic SDK (Python)
from anthropic import Anthropic

client = Anthropic(
    base_url="https://app.directinference.com/di/v1",
    api_key="YOUR_DIRECT_INFERENCE_KEY",
)

# Same endpoint speaks the Anthropic Messages shape too.
msg = client.messages.create(
    model="claude-haiku",
    max_tokens=512,
    messages=[{"role": "user", "content": "Extract the action items."}],
)

Gemini SDK (Python)
from google import genai

client = genai.Client(
    api_key="YOUR_DIRECT_INFERENCE_KEY",
    http_options={"base_url": "https://app.directinference.com/di/v1"},
)

# Keep your Gemini model id; Direct Inference handles the capability.
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this thread.",
)
print(resp.text)

The effort hint
One optional knob for cost vs. quality
Effort is a hint, not homework. The request shape still decides which capability is needed; effort tunes the serving choice within that capability and keeps reasoning controls consistent across providers. Send it as a header, a query parameter, or your SDK's native reasoning field. The default is medium.
Per-request effort
# Effort is an optional per-request hint. Medium is the default.
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Plan a database migration."}],
    extra_headers={"X-DI-Effort": "high"},
)

# Or per request via query string:
# POST https://app.directinference.com/di/v1/chat/completions?effort=minimal

Effort levels
- minimal: Lowest latency, minimal thinking budget, simple work.
- low: Light reasoning, concise answers.
- medium: Balanced default behavior.
- high: Deeper reasoning and more careful synthesis.
- xhigh: Maximum reasoning budget for the hardest requests.
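
If effort values come from configuration or user input, it may help to validate them before they reach the request. The sketch below builds the `X-DI-Effort` header from one of the five levels listed above. The helper name `effort_headers` and the choice to raise on unknown values are assumptions made for illustration; how the endpoint itself treats an unrecognized level is not stated here.

```python
# Levels copied from the list above; everything else here is a hypothetical helper.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def effort_headers(effort=None):
    """Return extra headers carrying the effort hint.

    Returns an empty dict when no effort is given, so the request
    falls back to the documented default (medium).
    """
    if effort is None:
        return {}
    level = effort.strip().lower()
    if level not in EFFORT_LEVELS:
        raise ValueError(
            f"unknown effort {effort!r}; expected one of {', '.join(EFFORT_LEVELS)}"
        )
    return {"X-DI-Effort": level}
```

With the OpenAI SDK, the result can be passed straight through, for example `client.chat.completions.create(..., extra_headers=effort_headers("high"))`.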
Request types
What gets detected, and what it triggers
Every call is classified by its shape. Capability always outranks the model name — so a document or image request gets a capable model regardless of the id or effort you send.
| Request type | Detected from | What it means |
|---|---|---|
| vision | Image content in the request | Handled by a vision-capable model — even if the model string says “mini”. |
| document | PDF or file input | Document-capable processing handles the file, regardless of the requested model. |
| long | Input beyond the standard context window | Uses a long-context path so nothing gets silently truncated. |
| code | Tool definitions, diffs, stack traces, repo paths | Code-shaped traffic gets tools-and-reasoning strength. |
| json | A response or output JSON schema is set | Structured-output requests use a model reliable at schema adherence. |
| reason | Multi-step reasoning in the prompt | Hard, multi-step problems are sent to a reasoning model. |
| flash | Simple request at low effort | Trivial traffic stays fast and cheap — where most of your margin hides. |
| pro | Everything else (default) | General requests land on a strong all-rounder. |
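
To make the table concrete, here is a toy classifier over OpenAI-style chat request dictionaries. It is not Direct Inference's classifier. The precedence (top of the table wins), the character thresholds, and the text markers used for code detection are all assumptions chosen for illustration, and the `reason` type is left out because the table gives no simple shape signal for it.

```python
# Illustrative only: a toy reading of the table, not the service's real logic.
CODE_MARKERS = ("diff --git", "Traceback (most recent call last)")  # assumed signals


def _parts(request):
    """Yield every content part, treating plain-string content as one text part."""
    for message in request.get("messages", []):
        content = message.get("content", "")
        if isinstance(content, str):
            yield {"type": "text", "text": content}
        else:
            yield from content


def classify(request, effort="medium",
             context_limit_chars=400_000,  # assumed stand-in for a context window
             simple_limit_chars=200):      # assumed cutoff for a "simple" request
    """Return a request type name from the table, checked in table order."""
    parts = list(_parts(request))
    kinds = {part.get("type") for part in parts}
    text = "\n".join(p.get("text", "") for p in parts if p.get("type") == "text")

    if "image_url" in kinds:
        return "vision"
    if "file" in kinds:
        return "document"
    if len(text) > context_limit_chars:
        return "long"
    if request.get("tools") or any(marker in text for marker in CODE_MARKERS):
        return "code"
    if (request.get("response_format") or {}).get("type") == "json_schema":
        return "json"
    if effort in ("minimal", "low") and len(text) <= simple_limit_chars:
        return "flash"
    return "pro"
```

Note that the model string never appears in the function: in this sketch, as in the table, an image request is classified as `vision` whatever id it carries.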
Built for coding agents
Drop Direct Inference into your agent.
Coding tools and agents are first-class clients. Point the base URL at Direct Inference and let machine-readable docs handle the rest — no plugin, no adapter, no bespoke client.
Any OpenAI-compatible tool
# Point any OpenAI-compatible coding tool at Direct Inference.
# (Same idea for Anthropic- or Gemini-compatible tools.)
export OPENAI_BASE_URL="https://app.directinference.com/di/v1"
export OPENAI_API_KEY="YOUR_DIRECT_INFERENCE_KEY"
# Machine-readable for agents:
# https://directinference.com/llms.txt (concise)
# https://directinference.com/llms-full.txt (full)

Machine-readable docs
A concise /llms.txt and a full /llms-full.txt let a coding agent read the whole integration — base URL, SDK shapes, request types, effort — and wire it up itself.
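
An agent or setup script could pull either file with nothing but the standard library. This is a minimal sketch: the two URLs come from the text above, while the function names, the timeout, and the assumption that the files are served as UTF-8 text without authentication are illustrative.

```python
from urllib.request import urlopen

DOCS_BASE = "https://directinference.com"


def llms_url(full=False):
    """Return the URL of the concise or the full machine-readable docs."""
    return f"{DOCS_BASE}/llms-full.txt" if full else f"{DOCS_BASE}/llms.txt"


def fetch_llms(full=False, timeout=10.0):
    """Download the docs as text (assumes UTF-8 and no auth; makes a network call)."""
    with urlopen(llms_url(full), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_llms()` returns the concise file; `fetch_llms(full=True)` returns the full one.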
Point the base URL and go
Any OpenAI-, Anthropic-, or Gemini-compatible coding tool works by changing one base URL. No plugin, no adapter, no bespoke client.
One key across your stack
The same Direct Inference key powers your editor, your agents, and your production app — with usage and caps visible across all of them.
Compatibility
Guarantees you can build on
One line, one key, no rewrite
Point your existing client at one base URL and set your key. Your SDK, your calls, and your logging keep working as-is — there's nothing to re-architect.
No more deprecation fire drills
When a model is renamed or retired upstream, nothing in your code breaks and there's no migration to run — the endpoint keeps serving the same use cases.
Three SDK shapes, one endpoint
Point an OpenAI-, Anthropic-, or Gemini-compatible client at the same base URL — streaming, tool use, and structured output all pass through.
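
As an illustration of streaming through the endpoint, the sketch below consumes a streamed chat completion using the OpenAI SDK's standard `stream=True` flag. It assumes `client` is an `OpenAI` instance already pointed at the Direct Inference base URL; the helper name `stream_text` is hypothetical.

```python
from typing import Any, Iterator


def stream_text(client: Any, model: str, prompt: str) -> Iterator[str]:
    """Yield text fragments from a streamed chat completion as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # standard OpenAI SDK flag; chunks arrive incrementally
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta, so skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

A caller might print fragments as they land: `for piece in stream_text(client, "gpt-5.5-mini", "Summarize this thread."): print(piece, end="")`.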
Capability outranks the name
A PDF sent to a “mini” model still gets a document-capable model. The request decides, not the string.
Nothing to configure
Capability, quality, cost, latency, and health are all weighed for you to serve the best available model on every request. There are no rules to write, no routing to tune, and no picker to maintain.
Failure handling is built in
Rate limits, transient provider errors, and unhealthy serving paths can be handled inside the endpoint so your app does not need bespoke retry trees for every model family.
Get a key and send your first request.
Open the portal