drop-in anthropic gateway

Anthropic's API.
Without the bill.

ethereal speaks the same wire format as api.anthropic.com — same headers, same SSE, same SDKs. Switch ANTHROPIC_BASE_URL and your existing Claude Code, Cursor, or Cline keeps working. Sonnet 4.5, Opus 4.7, prompt caching, batches and tool use included.

Get a key → · Read the docs · Calculate cost

● operational · 15+ integrations · 2 wire formats · ~80ms p50 overhead

```
# point any Anthropic SDK at ethereal
$ export ANTHROPIC_BASE_URL="https://api.ethereal.llc"
$ export ANTHROPIC_API_KEY="sk-ant-api03-..."
$ claude "refactor users.go"
→ connecting to api.ethereal.llc ✓
→ claude-sonnet-4-5
→ 247 in / 1,341 out / cached: 4,012
$ curl https://api.ethereal.llc/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d '{"model":"claude-sonnet-4-5","max_tokens":256,"stream":true,
         "messages":[{"role":"user","content":"hi"}]}'
← 200 OK · stream · 1.2s
```

why ethereal

Drop in. Don't rewrite.

Every endpoint, every header, every SSE event in the Anthropic protocol — verbatim. Plus the OpenAI Chat Completions shape on /v1/chat/completions for tools that don't speak Anthropic natively.

[01] compatibility

Same wire format

Identical request/response shape to Anthropic. Your existing SDK, retries, and error-handling code keep working unchanged.

[02] both protocols

Anthropic + OpenAI

/v1/messages for Claude SDKs. /v1/chat/completions for Codex CLI, LiteLLM, Cursor, Zed and friends.
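
The following Python sketch shows the kind of mapping an adapter on /v1/chat/completions has to perform. It assumes plain-text turns only (no tools, images, or multi-part content). The function name `chat_to_messages` and the 1024-token fallback are hypothetical and are not ethereal's code. The sketch relies on two structural facts about the Messages API: system text goes in a top-level `system` field rather than in a turn, and `max_tokens` is required.

```python
from typing import Any


def chat_to_messages(chat: dict[str, Any]) -> dict[str, Any]:
    """Map a minimal OpenAI Chat Completions body onto a Messages API body.

    Hypothetical sketch: plain-text system/user/assistant turns only.
    """
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in chat["messages"]:
        if msg["role"] == "system":
            # Messages API: system text is a top-level field, not a turn.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    body: dict[str, Any] = {
        "model": chat["model"],
        # max_tokens is mandatory on /v1/messages; 1024 is an arbitrary default.
        "max_tokens": chat.get("max_tokens", 1024),
        "messages": messages,
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


body = chat_to_messages({
    "model": "claude-sonnet-4-5",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "hi"},
    ],
})
print(body["system"])    # Be terse.
print(body["messages"])  # [{'role': 'user', 'content': 'hi'}]
```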

[03] streaming

Real SSE, real chunks

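Server-sent events in the upstream format, byte for byte: message_start, content_block_start, content_block_delta (text_delta, and input_json_delta for tool_use blocks), content_block_stop, message_delta, message_stop.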
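
Because the event stream matches upstream, any standard reader works against it. Below is a small Python reader, run against an abridged sample instead of a live connection. It assumes LF line endings and one `data:` line per event, and it only collects `text_delta` payloads. A real client would also accumulate `input_json_delta` fragments for tool_use blocks, or would simply use an official SDK.

```python
import json


def collect_text(sse: str) -> str:
    """Concatenate the text_delta payloads of a Messages API event stream.

    Sketch only: expects LF line endings and one `data:` line per event,
    and ignores every other event type (ping, input_json_delta, ...).
    """
    parts = []
    for block in sse.strip().split("\n\n"):  # events are separated by a blank line
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):])
        if event == "content_block_delta" and data["delta"]["type"] == "text_delta":
            parts.append(data["delta"]["text"])
    return "".join(parts)


# Abridged sample: real message_start events carry more fields.
SAMPLE = """event: message_start
data: {"type": "message_start", "message": {"id": "msg_1", "role": "assistant", "content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}

event: message_stop
data: {"type": "message_stop"}
"""

print(collect_text(SAMPLE))  # Hello
```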

[04] caching

Prompt cache, included

Set cache_control on a content block and everything up to that block is cached transparently. Cache reads bill at a fraction of the input-token price.
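
Here is a sketch of a request body that marks a long system prompt as cacheable with `cache_control`. The model id and prompt text are placeholders. Upstream documents a minimum cacheable prefix length, so a short prompt like this one would be accepted but not actually cached.

```python
import json


def cached_request(system_prompt: str, question: str) -> dict:
    """Build a /v1/messages body whose system prompt is marked cacheable."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 512,
        # system given as a list of content blocks so one block can carry
        # cache_control; the prefix up to and including it is cached.
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


body = cached_request("<long, stable style guide...>", "Review users.go")
print(json.dumps(body, indent=2))
```

Keep the cached block byte-identical across calls. Only the turns that follow it should change from request to request.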

[05] batches

Async batches

Submit hundreds of requests, poll for results. Same shape as the Anthropic Batches API. Cheap throughput for offline jobs.
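
One way to drive the batch flow from Python, assuming the upstream Message Batches shape: `requests` entries carry a `custom_id` and `params`, and the batch's `processing_status` eventually reaches `"ended"`. `get_batch` is a hypothetical callable that stands in for whatever HTTP call you use to fetch the batch by id. It is injected, along with `sleep`, so the polling logic can be tested without a network.

```python
import time
from typing import Callable


def build_batch(prompts: list[str], model: str = "claude-sonnet-4-5") -> dict:
    """Body for POST /v1/messages/batches: one entry per prompt, keyed by custom_id."""
    return {
        "requests": [
            {
                "custom_id": f"job-{i}",
                "params": {
                    "model": model,
                    "max_tokens": 256,
                    "messages": [{"role": "user", "content": p}],
                },
            }
            for i, p in enumerate(prompts)
        ]
    }


def wait_for_batch(get_batch: Callable[[], dict], interval: float = 30.0,
                   max_interval: float = 300.0, sleep=time.sleep) -> dict:
    """Poll until processing_status is "ended", doubling the wait up to max_interval."""
    while True:
        batch = get_batch()
        if batch["processing_status"] == "ended":
            return batch
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

The doubling backoff is a design choice of this sketch. Batches are for offline work, so polling every few minutes costs nothing in latency.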

[06] vision & tools

Multimodal in

Pass image blocks (base64 or URL), get tool_use blocks back. Native function-calling round-trips work without translation layers.
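
This Python sketch covers both halves of the round trip. It builds a base64 image block, then answers the assistant's tool_use blocks with tool_result blocks that reference each `tool_use_id`. The dispatcher `run_tool` and the sample `get_weather` tool are hypothetical.

```python
import base64


def image_block(png_bytes: bytes) -> dict:
    """A base64 image content block for a user message."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        },
    }


def tool_results(assistant_content: list[dict], run_tool) -> dict:
    """Answer every tool_use block in an assistant turn with a tool_result block.

    run_tool(name, input) is a hypothetical local dispatcher returning a string.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": run_tool(block["name"], block["input"]),
            }
            for block in assistant_content
            if block["type"] == "tool_use"
        ],
    }


assistant_content = [
    {"type": "text", "text": "Checking."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Oslo"}},
]
reply = tool_results(assistant_content, lambda name, args: f"{name}: 4°C in {args['city']}")
print(reply["content"][0]["tool_use_id"])  # toolu_01
```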

[07] keys

Per-key limits

Mint sk-ant-api03-... keys with token budgets and per-minute caps. Revoke individually. Mass-mint up to 500 in one call.
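
To make the two limits concrete, here is an illustrative Python model of one key with a token budget and a requests-per-minute cap, enforced with a 60-second sliding window. `KeyLimits` is hypothetical and does not describe how ethereal enforces limits server-side. The clock is injectable so the behaviour can be tested.

```python
import time
from collections import deque


class KeyLimits:
    """Hypothetical per-key limiter: a token budget plus a requests-per-minute cap."""

    def __init__(self, token_budget: int, rpm: int, clock=time.monotonic):
        self.remaining = token_budget
        self.rpm = rpm
        self.clock = clock
        self.recent: deque = deque()  # timestamps of admitted requests, oldest first

    def allow(self) -> bool:
        """Admit a request if budget remains and fewer than rpm requests ran in the last 60 s."""
        now = self.clock()
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()  # drop timestamps that left the window
        if self.remaining <= 0 or len(self.recent) >= self.rpm:
            return False
        self.recent.append(now)
        return True

    def charge(self, tokens: int) -> None:
        """Deduct the tokens reported in a response's usage counts."""
        self.remaining -= tokens
```

Charging after the response means the final request can overshoot the budget. A stricter design would reserve `max_tokens` up front and refund the unused part.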

[08] observability

You can see it

Live dashboard with request volume, error rate, token spend per key, account-pool health. No third-party SaaS.

[09] no lock-in

Switch anytime

Point ANTHROPIC_BASE_URL at us; point it back. Nothing in your code changes — zero friction either way.
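
Because only the base URL differs, client code can read it from the environment and fall back to the official endpoint. This stdlib-only Python sketch builds the request but does not send it. The headers are the ones the Messages API expects. The function name `messages_request` is illustrative.

```python
import json
import os
import urllib.request


def messages_request(prompt: str) -> urllib.request.Request:
    """Build (not send) a /v1/messages call against whichever base URL is configured."""
    # Unset variable -> official endpoint; set it to switch gateways.
    base = os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com").rstrip("/")
    body = {
        "model": "claude-sonnet-4-5",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)`. Switching back to Anthropic is `unset ANTHROPIC_BASE_URL`.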

models

The Claude family — every tier.

Pass any of these as model. Versioned ids resolve directly; aliases like claude-3-5-sonnet-latest route to the most recent compatible release.

Model	Context	Max output	Best for	Status
claude-opus-4-7 (opus-4)	200K	32K	hardest reasoning, long horizons	live
claude-sonnet-4-5 (sonnet-4)	200K	64K	everyday agent / coding default	live
claude-3-7-sonnet-latest	200K	8K	3.7 series, tool use	live
claude-3-5-sonnet-latest	200K	8K	cheaper bulk inference	live
claude-3-5-haiku-latest	200K	8K	low-latency / classifiers	live

→ Call GET /v1/models for the live, machine-readable list.
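
Here is a Python sketch that reads that list and picks the first preferred model actually being served. The sample body is illustrative, not real output, and is abridged to the `data[].id` field. Pagination (`has_more`) is ignored for brevity. `pick_model` is a hypothetical helper.

```python
import json

# Illustrative, abridged GET /v1/models response body.
SAMPLE = (
    '{"data": [{"type": "model", "id": "claude-sonnet-4-5"},'
    ' {"type": "model", "id": "claude-3-5-haiku-latest"}], "has_more": false}'
)


def model_ids(body: str) -> list[str]:
    """Extract the model ids from a /v1/models response body."""
    return [m["id"] for m in json.loads(body)["data"]]


def pick_model(available: list[str], preferred: list[str]) -> str:
    """Return the first preferred id that is available, in preference order."""
    for model in preferred:
        if model in available:
            return model
    raise LookupError("none of the preferred models is available")


print(pick_model(model_ids(SAMPLE), ["claude-opus-4-7", "claude-sonnet-4-5"]))  # claude-sonnet-4-5
```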

token calculator

Estimate before you spend.

Paste a prompt — see token counts and the per-call cost across the Claude family. Estimation is heuristic (Anthropic's exact tokenizer is server-side); typical accuracy is ±5–10% for English, ±15% for mixed scripts.

[calculator inputs: prompt text, expected output (tokens), "assume input is cached (cache_read)" toggle · outputs: input tokens, characters · words, and per model: input $ + output $ = total]

$ priced per 1M tokens · in / out / cache_read · opus 15 / 75 / 1.50 · sonnet 3 / 15 / 0.30 · haiku 0.80 / 4 / 0.08
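
The cost side of the calculator is plain arithmetic over the price line above: each token class is multiplied by its per-million rate. A Python sketch follows. The ~4 characters-per-token rule in `estimate_tokens` is a common rough heuristic for English text and is an assumption here, not the estimator the page uses. Cache-write pricing is not modeled.

```python
# $ per 1M tokens as (input, output, cache_read), copied from the price line above.
PRICES = {
    "opus": (15.00, 75.00, 1.50),
    "sonnet": (3.00, 15.00, 0.30),
    "haiku": (0.80, 4.00, 0.08),
}


def estimate_tokens(text: str) -> int:
    """Rough estimate at ~4 characters per token (assumed heuristic, English-biased)."""
    return max(1, round(len(text) / 4))


def call_cost(family: str, input_tokens: int, output_tokens: int,
              cache_read_tokens: int = 0) -> float:
    """Dollar cost of one call: each token class times its per-million rate."""
    p_in, p_out, p_cache = PRICES[family]
    return (input_tokens * p_in + output_tokens * p_out
            + cache_read_tokens * p_cache) / 1_000_000


print(f"{call_cost('sonnet', 10_000, 2_000, cache_read_tokens=50_000):.3f}")  # 0.075
```

Ticking "assume input is cached" corresponds to passing the whole input as `cache_read_tokens` and zero as `input_tokens`.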

plans

Pick a tier. Or just buy tokens.

Tiers mirror Anthropic's naming so you know exactly what you're getting. Need raw pay-as-you-go? Mint a key with a token budget and forget about plans.

Pro

$20/mo

Daily-driver agent budget. Same usage envelope as Claude Pro on the consumer side.

~225 Sonnet messages / 5 hrs
opus + sonnet + haiku
prompt caching included
tool use, vision, batches
1 active key

Start with Pro →

most picked

Max 5×

$100/mo

For real coding agent loops: parallel sessions, long contexts, mid-day reruns.

~1,100 Sonnet messages / 5 hrs
everything in Pro
5 keys, separate budgets
extended max_tokens ceiling
priority routing on the pool

Pick Max 5× →

Max 20×

$200/mo

Heaviest tier. Multiple agents, batch jobs and reasoning-heavy Opus work.

~4,500 Sonnet messages / 5 hrs
everything in Max 5×
20 keys, mass-mint API
Opus quota raised
direct support channel

Go Max 20× →

→ No plan? Mint a raw key with a budget of N million tokens in the admin panel. Plans are quality-of-service envelopes; the underlying API is the same for everyone.

questions

Things people ask first.

Is this really compatible with the official Anthropic SDK?

Yes. Every endpoint, every header, and every SSE event the official server emits, we emit too. The proxy validates request shapes against the upstream schema before forwarding, so a divergence shows up here as a 400, not as silent corruption upstream.

The OpenAI Chat Completions adapter on /v1/chat/completions serves clients that speak the OpenAI format (Codex CLI, Zed, LiteLLM), so you don't have to rewrite anything to use Claude through them.

What's the latency overhead?

About 80ms p50 on the gateway hop, mostly TLS, auth, and a token-budget read. Streaming is forwarded chunk by chunk, so time to first byte is dominated by upstream, not by us.

How is this priced?

Two paths: subscribe to a tier (Pro / Max 5× / Max 20×) for a fixed monthly envelope, or mint a key with a hard token budget and pay per million tokens. Both share the same underlying API; the only difference is who bears the metering burden.

Can I cap a key at N tokens or N requests/minute?

Yes. Per-key budgets and per-minute rate limits are first-class. The admin panel mass-mints keys with arbitrary budgets in one call — useful for issuing scoped credentials to a team or letting users self-serve under your master pool.

Does prompt caching work?

Yes — set cache_control: {"type":"ephemeral"} on a content block exactly like upstream. Cache reads are billed at the upstream cache-read rate (~10% of input price). Cache lifetime mirrors upstream; we don't re-implement the storage layer, we forward the directive.

Is my data used for training?

No. We don't store request/response bodies past the request lifetime, and we don't train on traffic. Headers, model id, token counts and timestamps go to the metering store — nothing else.

What if I'm already on Claude through Anthropic?

Switching is one line. Set ANTHROPIC_BASE_URL=https://api.ethereal.llc in your shell, the SDK or your client config. Switch back any time — we don't store anything that locks you in.
