# Direct Inference

> Direct Inference is a zero-knowledge inference endpoint. Point an OpenAI-,
> Anthropic-, or Gemini-compatible client at one base URL, keep the model id your
> app already sends, and every request is classified by its shape and fulfilled on
> a capable path. The response echoes your model id back; which model, provider,
> or version served the request stays hidden. Only the request type is exposed.

- Base URL: https://app.directinference.com/di/v1
- Auth: send your Direct Inference API key as the SDK's API key / Bearer token.
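For clients that do not use an SDK, the Bearer-token form can be sketched with the standard library alone. This is a minimal sketch under assumptions: the `/chat/completions` path is what an OpenAI-compatible client would normally append to the base URL and is not confirmed here, and `build_request` is a hypothetical helper name. The code only constructs the request; it does not send it.

```python
import json
import urllib.request

BASE_URL = "https://app.directinference.com/di/v1"


def build_request(api_key, payload, path="/chat/completions"):
    """Build (but do not send) a POST request with Bearer auth.

    `path` is an assumption based on the OpenAI-compatible shape.
    """
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_DIRECT_INFERENCE_KEY",
    {
        "model": "gpt-5.5-mini",
        "messages": [{"role": "user", "content": "Summarize this thread."}],
    },
)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body would then be JSON in the shape the client expects.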

## Quickstart

OpenAI-compatible (Python):

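    from openai import OpenAI

    # Point the standard OpenAI client at the Direct Inference base URL.
    client = OpenAI(
        base_url="https://app.directinference.com/di/v1",
        api_key="YOUR_DIRECT_INFERENCE_KEY",
    )
    resp = client.chat.completions.create(
        model="gpt-5.5-mini",            # keep your own model id; it is echoed back
        messages=[{"role": "user", "content": "Summarize this thread."}],
    )
    print(resp.model)                        # the model id you sent
    print(resp.choices[0].message.content)   # the generated text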

The same base URL also accepts the Anthropic Messages shape and the Gemini
generateContent shape. Streaming, tool use, vision, PDFs, and structured output
all pass through.
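Since streaming passes through, a streamed call should look the same as it does against any OpenAI-compatible endpoint. The sketch below assumes the OpenAI Python SDK's `stream=True` interface; `stream_text` is a hypothetical helper name, and it accepts any client object exposing `chat.completions.create`.

```python
def stream_text(client, model, prompt):
    """Yield text fragments from a streamed chat completion.

    `client` is expected to be an OpenAI-compatible client, for example
    the one constructed in the quickstart.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text; skip those.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece
```

With the quickstart `client`, iterating `stream_text(client, "gpt-5.5-mini", "Summarize this thread.")` and printing each piece with `end=""` would render the reply as it arrives.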

## Request types (classified from the request shape)

- vision: image content in the request -> a vision-capable model.
- document: PDF or file input -> document-capable handling.
- long: input beyond the standard context window -> a long-context path.
- code: tool definitions, diffs, stack traces, repo paths -> a model strong at coding and tool use.
- json: a response/output JSON schema is set -> a schema-reliable model.
- reason: multi-step reasoning in the prompt -> a reasoning model.
- flash: simple request at low effort -> fast and cheap.
- pro: everything else (default) -> a strong all-rounder.

Capability outranks the model name: a PDF or image sent to a "mini" id still gets
a capable model. Unknown, legacy, and future ids resolve instead of erroring.
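To make "request shape" concrete, here are two OpenAI-style payloads that, going by the list above, would fall under vision and json even though both carry a "mini" model id. This is an illustrative sketch: the image URL and schema are placeholders, and `has_image` and `has_schema` are hypothetical client-side checks, not the service's actual classifier.

```python
MODEL = "gpt-5.5-mini"  # the id is echoed back; the shape decides capability

# Image content in the request: expected to classify as "vision".
vision_request = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
}

# A response JSON schema is set: expected to classify as "json".
json_request = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Extract the invoice total."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice_total",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
                "additionalProperties": False,
            },
        },
    },
}


def has_image(payload):
    """True if any message contains an image_url content part."""
    for message in payload.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):
            if any(part.get("type") == "image_url" for part in content):
                return True
    return False


def has_schema(payload):
    """True if the payload requests JSON-schema structured output."""
    fmt = payload.get("response_format") or {}
    return fmt.get("type") == "json_schema"
```

Either payload can be sent with the quickstart client as `client.chat.completions.create(**vision_request)`.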

## Effort (optional cost/quality hint)

Send the X-DI-Effort header or an ?effort= query param. Levels: minimal, low,
medium (default), high, xhigh. Effort tunes the serving choice; request shape
still decides the needed capability.

    resp = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": "Plan a database migration."}],
        extra_headers={"X-DI-Effort": "high"},
    )
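The two ways of sending the hint can be wrapped in small helpers that also reject levels outside the documented set. This is a sketch, not part of any SDK: `check_effort`, `effort_headers`, and `effort_url` are hypothetical names, and the `/chat/completions` path in the usage note is assumed from the OpenAI-compatible shape.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.directinference.com/di/v1"
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def check_effort(level):
    """Return `level` if it is a documented effort level, else raise."""
    if level not in EFFORT_LEVELS:
        raise ValueError(
            f"unknown effort {level!r}; expected one of {EFFORT_LEVELS}"
        )
    return level


def effort_headers(level="medium"):
    """Header form of the hint, suitable for `extra_headers`."""
    return {"X-DI-Effort": check_effort(level)}


def effort_url(path, level="medium"):
    """Query-parameter form of the hint for a raw HTTP call."""
    return f"{BASE_URL}{path}?{urlencode({'effort': check_effort(level)})}"
```

For example, `extra_headers=effort_headers("high")` matches the call above, and `effort_url("/chat/completions", "low")` builds the query-parameter variant. With the OpenAI Python SDK, the per-request `extra_query={"effort": "low"}` argument should have the same effect.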

## Links

- Product: https://directinference.com/
- Why Direct Inference (zero-knowledge vs. transparent routers): https://directinference.com/why
- Developers (quickstart, request types, compatibility): https://directinference.com/developers
- Pricing: https://directinference.com/pricing
- Security: https://directinference.com/security
- Portal (create an API key): https://app.directinference.com
- Full machine-readable docs: https://directinference.com/llms-full.txt