Developers

Drop in one endpoint. Get inference that feels engineered.

The DI Model is a stable endpoint for OpenAI, Anthropic, and Gemini SDKs. Change the base URL, keep your existing model string, and each request is understood, classified, and fulfilled for you.

Base URL: https://app.directinference.com/di/v1

Quickstart

Three steps to your first request

Step 1

Create a key

Sign in to the portal and issue a Direct Inference API key.

Step 2

Point at the endpoint

Set your SDK base URL to the DI Model endpoint. Nothing else in the call shape changes.

Step 3

Keep your model string

Send the model id your app already sends. Direct Inference picks a model with the capability the request needs and returns the result in the same response shape.

OpenAI SDK (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://app.directinference.com/di/v1",
    api_key="YOUR_DIRECT_INFERENCE_KEY",
)

# Keep sending whatever model string your app already sends.
resp = client.chat.completions.create(
    model="gpt-5.5-mini",
    messages=[{"role": "user", "content": "Summarize this thread."}],
)
print(resp.model)  # -> "gpt-5.5-mini" (your id, echoed back)

Anthropic SDK (Python)

from anthropic import Anthropic

client = Anthropic(
    base_url="https://app.directinference.com/di/v1",
    api_key="YOUR_DIRECT_INFERENCE_KEY",
)

# Same endpoint speaks the Anthropic Messages shape too.
msg = client.messages.create(
    model="claude-haiku",
    max_tokens=512,
    messages=[{"role": "user", "content": "Extract the action items."}],
)

Gemini SDK (Python)

from google import genai

client = genai.Client(
    api_key="YOUR_DIRECT_INFERENCE_KEY",
    http_options={"base_url": "https://app.directinference.com/di/v1"},
)

# Keep your Gemini model id — Direct Inference handles the capability.
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this thread.",
)
print(resp.text)

The effort hint

One optional knob for cost vs. quality

Effort is a hint, not homework. Request shape still decides the needed capability; effort tunes the serving choice within it and keeps cross-provider reasoning controls aligned. Send it as a header, a query param, or your SDK's native reasoning field — medium is the default.

Per-request effort

# Effort is an optional per-request hint. Medium is the default.
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Plan a database migration."}],
    extra_headers={"X-DI-Effort": "high"},
)

# Or per request via query string:
#   POST https://app.directinference.com/di/v1/chat/completions?effort=minimal

Effort levels

minimal: Lowest latency, minimal thinking budget, simple work.
low: Light reasoning, concise answers.
medium: Balanced default behavior.
high: Deeper reasoning and more careful synthesis.
xhigh: Maximum reasoning budget for the hardest requests.
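
A small helper can keep the effort hint consistent across call sites. The sketch below is an illustration, not part of any Direct Inference SDK: effort_options is a hypothetical name, and it only builds the extra_headers or extra_query keyword arguments that the OpenAI Python SDK accepts per request, using the X-DI-Effort header and effort query parameter shown above.

```python
# Hypothetical helper (not part of any Direct Inference SDK): builds the
# per-request keyword arguments that carry the effort hint.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def effort_options(level="medium", via="header"):
    """Return kwargs for an OpenAI SDK call that carry the effort hint."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    if via == "header":
        return {"extra_headers": {"X-DI-Effort": level}}
    if via == "query":
        return {"extra_query": {"effort": level}}
    raise ValueError(f"unknown transport: {via!r}")


print(effort_options("high"))
print(effort_options("minimal", via="query"))
```

With a client configured as in the quickstart, the result can be splatted into a call, for example client.chat.completions.create(model=..., messages=..., **effort_options("high")). Rejecting unknown levels locally is a design choice of this sketch; how the endpoint treats an unrecognized value is not stated here.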

Request types

What gets detected, and what it triggers

Every call is classified by its shape. Capability always outranks the model name — so a document or image request gets a capable model regardless of the id or effort you send.

Request type   Detected from
vision         Image content in the request
document       PDF or file input
long           Input beyond the standard context window
code           Tool definitions, diffs, stack traces, repo paths
json           A response or output JSON schema is set
reason         Multi-step reasoning in the prompt
flash          Simple request at low effort
pro            Everything else (default)
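
To make "classified by its shape" concrete, here is a rough local approximation for OpenAI-style chat requests. It is a guess for illustration, not the service's detection logic: guess_request_type is a hypothetical name, the precedence order is an assumption, and the long and reason types are left out because they would need token counting and prompt analysis. The file id is a placeholder.

```python
# Illustrative only: a local guess at how request shape could map to the
# request types listed above. The real detection rules and their precedence
# are not documented here; the ordering below is an assumption.

def guess_request_type(request, effort="medium"):
    """Guess a request type from an OpenAI-style chat request dict."""
    part_types = set()
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):  # multi-part content (text, image, file)
            part_types.update(part.get("type") for part in content)

    if "image_url" in part_types:
        return "vision"
    if "file" in part_types:
        return "document"
    if request.get("tools"):
        return "code"
    if request.get("response_format", {}).get("type") == "json_schema":
        return "json"
    if effort in ("minimal", "low"):
        return "flash"  # assumption: treats any remaining low-effort request as simple
    return "pro"


pdf_request = {
    "model": "gpt-5.5-mini",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this contract."},
            {"type": "file", "file": {"file_id": "file-123"}},  # placeholder id
        ],
    }],
}
print(guess_request_type(pdf_request, effort="low"))  # document
```

The example mirrors the rule stated above: the file part decides the type, even though the model id says "mini" and the effort is low.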

Built for coding agents

Drop Direct Inference into your agent.

Coding tools and agents are first-class clients. Point the base URL at Direct Inference and let machine-readable docs handle the rest — no plugin, no adapter, no bespoke client.

Any OpenAI-compatible tool

# Point any OpenAI-compatible coding tool at Direct Inference.
# (Same idea for Anthropic- or Gemini-compatible tools.)
export OPENAI_BASE_URL="https://app.directinference.com/di/v1"
export OPENAI_API_KEY="YOUR_DIRECT_INFERENCE_KEY"

# Machine-readable for agents:
#   https://directinference.com/llms.txt        (concise)
#   https://directinference.com/llms-full.txt   (full)

Machine-readable docs

A concise /llms.txt and a full /llms-full.txt let a coding agent read the whole integration — base URL, SDK shapes, request types, effort — and wire it up itself.
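
An agent or setup script can pull these files with the standard library alone. A minimal sketch, assuming the two URLs listed above and UTF-8 text; docs_url and fetch_docs are hypothetical names, and nothing is fetched until fetch_docs is called.

```python
from urllib.request import urlopen

DOCS_BASE = "https://directinference.com"


def docs_url(full=False):
    """Return the URL of the concise or the full machine-readable docs."""
    return f"{DOCS_BASE}/llms-full.txt" if full else f"{DOCS_BASE}/llms.txt"


def fetch_docs(full=False, timeout=10):
    """Download the docs as text (assumes the server returns UTF-8)."""
    with urlopen(docs_url(full), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(docs_url())
print(docs_url(full=True))
```

Calling fetch_docs() returns the concise file as a string that can be placed in an agent's context; fetch_docs(full=True) does the same for the full version.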

Point the base URL and go

Any OpenAI-, Anthropic-, or Gemini-compatible coding tool works by changing one base URL. No plugin, no adapter, no bespoke client.

One key across your stack

The same Direct Inference key powers your editor, your agents, and your production app — with usage and caps visible across all of them.

Machine-readable docs: /llms.txt, /llms-full.txt

Compatibility

Guarantees you can build on

One line, one key, no rewrite

Point your existing client at one base URL and set your key. Your SDK, your calls, and your logging keep working as-is — there's nothing to re-architect.

No more deprecation fire drills

When a model is renamed or retired upstream, nothing in your code breaks and there's no migration to run — the endpoint keeps serving the same use cases.

Three SDK shapes, one endpoint

Point an OpenAI-, Anthropic-, or Gemini-compatible client at the same base URL — streaming, tool use, and structured output all pass through.
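
For the OpenAI shape, the three features named above map to ordinary Chat Completions arguments. The sketch below only builds those keyword arguments and sends nothing; the helper names, the schema name "action_items", and the tool "open_ticket" are invented for illustration, and the model id is whatever your app already sends.

```python
# Sketch only: keyword arguments in the OpenAI Chat Completions shape.
# Helper names, the schema name, and the tool are made up for illustration.

def base_kwargs(model, prompt):
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def streaming_kwargs(model, prompt):
    """Ask for the response as a stream of chunks."""
    return {**base_kwargs(model, prompt), "stream": True}


def structured_kwargs(model, prompt):
    """Ask for output that conforms to a JSON schema."""
    schema = {
        "type": "object",
        "properties": {"items": {"type": "array", "items": {"type": "string"}}},
        "required": ["items"],
        "additionalProperties": False,
    }
    return {
        **base_kwargs(model, prompt),
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "action_items", "schema": schema, "strict": True},
        },
    }


def tool_kwargs(model, prompt):
    """Offer the model one function tool it may call."""
    tool = {
        "type": "function",
        "function": {
            "name": "open_ticket",
            "description": "Open a ticket in the issue tracker.",
            "parameters": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
            },
        },
    }
    return {**base_kwargs(model, prompt), "tools": [tool]}


print(sorted(structured_kwargs("gpt-5.5-mini", "Extract the action items.")))
```

With a client configured as in the quickstart, each result can be passed straight through, for example client.chat.completions.create(**streaming_kwargs("gpt-5.5-mini", "Summarize this thread.")), which yields chunks to iterate over.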

Capability outranks the name

A PDF sent to a “mini” model still gets a document-capable model. The request decides, not the string.

Nothing to configure

Capability, quality, cost, latency, and health are all weighed for you to serve the best available model on every request. There are no rules to write, no routing to tune, and no picker to maintain.

Failure handling is built in

Rate limits, transient provider errors, and unhealthy serving paths can be handled inside the endpoint so your app does not need bespoke retry trees for every model family.

Get a key and send your first request.

Open the portal