# Model Proxy
`model.formulacode.org` is an OpenAI-compatible HTTPS endpoint that fronts the
local vLLM servers. A LiteLLM proxy multiplexes a single URL across multiple
vLLM backends — clients pick which model they want by setting the `model`
field in the request body, exactly like the OpenAI API.
| Hostname | Auth | Backend |
|---|---|---|
| `model.formulacode.org` | Bearer `LITELLM_MASTER_KEY` (or any virtual key minted from it) | LiteLLM proxy → local vLLM servers |
LiteLLM runs in DB-backed mode against the local Supabase Postgres
(database `litellm`, separate from the datasmith schema). This unlocks the
admin UI, virtual keys, spend logs, and live config edits — everything the
master key alone can't do.
## Architecture

```mermaid
flowchart LR
    Client["OpenAI SDK client<br/>(Bearer master_key or virtual key)"]
    CFE["model.formulacode.org"]
    subgraph Host["Host machine"]
        CFD["cloudflared<br/>(datasmith-model tunnel)"]
        LL["LiteLLM proxy :4100"]
        PG[("Postgres :54322<br/>litellm db")]
        VA["vllm serve :8123<br/>model A"]
        VB["vllm serve :8124<br/>model B"]
    end
    Client --> CFE --> CFD --> LL
    LL <--> PG
    LL -- "model: A" --> VA
    LL -- "model: B" --> VB
```
vLLM is bound to localhost only; LiteLLM is the single internet-facing
process. Auth is enforced by LiteLLM's `master_key` (or scoped virtual keys
issued from it); vLLM itself has none.
## For users

### Calling the proxy
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://model.formulacode.org/v1",
    api_key="sk-fc-<master-key-or-virtual-key>",
)
resp = client.chat.completions.create(
    model="gemma-4-31b",  # alias from infra/litellm.config.yaml
    messages=[{"role": "user", "content": "hello"}],
)
```
`/v1/models` lists the configured aliases.
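For instance, with any valid key (the alias shown in the response comment is illustrative; the actual list depends on the config):

```shell
curl -sS https://model.formulacode.org/v1/models \
  -H "Authorization: Bearer sk-fc-<master-key-or-virtual-key>"
# Standard OpenAI list-models shape: {"data": [{"id": "gemma-4-31b", ...}, ...]}
```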
### Admin UI

Visit https://model.formulacode.org/ui/ and log in with username `admin` and
the value of `LITELLM_MASTER_KEY` as the password. The UI is a convenience
layer over the same REST API; everything you can do there has a curl equivalent.
## For operators

### Prerequisites
- vLLM servers already running on `127.0.0.1:8123` and `127.0.0.1:8124`. Bind them to localhost — they have no auth and must not be exposed.
- Local Supabase Postgres reachable on `127.0.0.1:54322` (it already is for the datasmith pipeline; LiteLLM uses a separate `litellm` database in the same instance).
- `cloudflared` installed and authenticated (`cert.pem` present in `~/.cloudflared/`); see Remote Access for the install steps.
- `uv` available on PATH.
### tokens.env

```shell
# Bearer master key. Generate with: openssl rand -hex 32
LITELLM_MASTER_KEY=sk-fc-<random-hex>

# DB-backed mode (UI + virtual keys + spend logs).
DATABASE_URL=postgresql://postgres:postgres@127.0.0.1:54322/litellm
LITELLM_SALT_KEY=sk-salt-<random-hex>  # encrypts virtual keys at rest
STORE_MODEL_IN_DB=True
```
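Both secrets can be minted in one go; the `sk-fc-`/`sk-salt-` prefixes follow the examples above and are a local convention, not a LiteLLM requirement:

```shell
# Emit ready-to-paste lines for tokens.env (64 hex chars of entropy each).
echo "LITELLM_MASTER_KEY=sk-fc-$(openssl rand -hex 32)"
echo "LITELLM_SALT_KEY=sk-salt-$(openssl rand -hex 32)"
```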
The model list — aliases, vLLM ports, upstream model ids — lives in
`infra/litellm.config.yaml` and is checked in. The
`litellm_params.model` value must be prefixed `openai/` so LiteLLM speaks
the OpenAI-compatible protocol vLLM exposes; the suffix after the prefix is
forwarded to vLLM in the request body and should match what `vllm serve`
loaded.
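A sketch of one `model_list` entry under this scheme; the alias matches the client example above, while the upstream id is left as a placeholder:

```yaml
model_list:
  - model_name: gemma-4-31b                # alias clients send in `model`
    litellm_params:
      model: openai/<vllm-loaded-id>       # openai/ prefix is mandatory
      api_base: http://127.0.0.1:8123/v1   # local vLLM server for this alias
```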
### One-time setup

```shell
# 1. Create the tunnel and copy its UUID from the output
cloudflared tunnel create datasmith-model

# 2. Patch the placeholder in ~/.cloudflared/config-model.yml
TUNNEL_ID=<paste-uuid>
sed -i "s/REPLACE_WITH_MODEL_TUNNEL_ID/$TUNNEL_ID/g" ~/.cloudflared/config-model.yml
cloudflared --config ~/.cloudflared/config-model.yml tunnel ingress validate

# 3. Route DNS. --config + --overwrite-dns are required when multiple
#    tunnels coexist; without them cloudflared resolves the name against
#    ~/.cloudflared/config.yml and CNAMEs to the wrong tunnel.
cloudflared --config ~/.cloudflared/config-model.yml tunnel route dns \
  --overwrite-dns datasmith-model model.formulacode.org

# 4. Create the LiteLLM database in the local Supabase Postgres.
docker exec supabase_db_datasmith_new psql -U postgres -c "CREATE DATABASE litellm;"

# 5. Build the persistent venv used by `make model-tunnel` (this also runs
#    `prisma generate` to fetch the query-engine binary). LiteLLM applies
#    Prisma migrations into the `litellm` DB on first boot.
make model-proxy-install
```
### Day-to-day

`make model-tunnel` is the single day-to-day target. It:

- Loads `tokens.env` into the environment.
- Launches the LiteLLM binary from `.venv-litellm/` on `127.0.0.1:4100`, reading `infra/litellm.config.yaml`. The child runs from inside the venv directory to dodge Prisma's `pyproject.toml` lookup, which trips over the repo's `[[tool.mypy.overrides]]` array-of-tables.
- Polls `:4100/health/liveliness` until LiteLLM is ready.
- Starts `cloudflared tunnel run datasmith-model` in the foreground.
- On Ctrl-C, kills LiteLLM and exits.
Run it under tmux/screen or as a systemd service for persistence; the
target itself is foreground only.
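For the systemd route, a minimal user-level unit might look like the sketch below. The unit name and `WorkingDirectory` are assumptions; the target stays in the foreground, which suits `Type=simple`:

```ini
# ~/.config/systemd/user/model-tunnel.service  (hypothetical path)
[Unit]
Description=LiteLLM proxy + cloudflared model tunnel
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/path/to/repo       # repo root containing the Makefile
ExecStart=/usr/bin/make model-tunnel
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user enable --now model-tunnel`.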
### Adding or swapping a model

Edit `infra/litellm.config.yaml` and restart `make model-tunnel`.
Each entry under `model_list` is `model_name` (the alias clients pass) +
`litellm_params.model` (`openai/<vllm-loaded-id>`) + `api_base`
(`http://127.0.0.1:<port>/v1`). With `STORE_MODEL_IN_DB=True` you can also
add models live from the UI — they persist in `LiteLLM_ProxyModelTable` and
survive restarts without editing the YAML.
### Issuing virtual keys

Mint a scoped key (distinct from the master key, individually revocable):

```shell
curl -sS -X POST -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  https://model.formulacode.org/key/generate \
  -d '{"models":["gemma-4-31b"],"key_alias":"alice","duration":"30d","max_budget":50}'
```
The response includes `"key": "sk-..."` — that's what you hand to the user.
Revoke with `POST /key/delete`, body `{"keys":["sk-..."]}`. Browse and search
keys in the UI under `/ui/key-management`.
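Spelled out, the revoke call might look like this (the key value is a placeholder):

```shell
curl -sS -X POST -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  https://model.formulacode.org/key/delete \
  -d '{"keys":["sk-<virtual-key-to-revoke>"]}'
```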
## Troubleshooting

| Symptom | Likely cause |
|---|---|
| `401 Unauthorized` | Bearer token does not match the master key or any active virtual key |
| `400 model not found` | The `model` field does not match any `model_name` in `infra/litellm.config.yaml` (or any DB-stored model) |
| `502 Bad Gateway` | LiteLLM up, vLLM backend down — check `curl http://127.0.0.1:8123/v1/models` |
| UI shows `Authentication Error: Not connected to DB!` | `DATABASE_URL` not set, or the `litellm` database doesn't exist yet — re-run step 4 of one-time setup |
| `Unable to find Prisma binaries. Please run 'prisma generate' first.` | The `.venv-litellm/` venv is missing or stale — `rm -rf .venv-litellm && make model-proxy-install` |
| `tomlkit.exceptions.ParseError: Key "overrides" already exists` during `prisma generate` | Prisma is reading the repo's `pyproject.toml` from cwd. The Makefile already cd's into `.venv-litellm` to dodge this; if you're invoking `prisma` manually, do the same. |
| `litellm exited before becoming ready` | Read the LiteLLM stderr above the failure line — usually a missing env var or a Postgres connectivity issue |
| Responses look like a different LiteLLM (e.g. `no_db_connection` from the chat path) | Port 4100 is taken by another process; LiteLLM falls back to an ephemeral port and the tunnel/health-check ends up hitting the squatter. Check `ss -tlnp \| grep :4100` and pick an unused port in `Makefile`, `~/.cloudflared/config-model.yml`, and this doc. |
| Hangs on long generations | Increase `request_timeout` in `infra/litellm.config.yaml` |
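For the timeout case, a sketch of the config change (the 600-second value is an arbitrary example, not a recommendation):

```yaml
litellm_settings:
  request_timeout: 600   # seconds; applies proxy-wide
```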
## Security notes

- vLLM has no auth. Bind it to `127.0.0.1`, never `0.0.0.0`.
- The proxy hostname has no Cloudflare Access gate; auth is the LiteLLM master/virtual key only. Rotate the master key by changing `LITELLM_MASTER_KEY` in `tokens.env` and restarting `make model-tunnel`. Rotate a virtual key by deleting and re-issuing it — only that user is affected.
- `LITELLM_SALT_KEY` encrypts virtual key secrets at rest in Postgres. Changing it after keys are issued will make existing keys unreadable, so treat it as immutable post-bootstrap.
- The `litellm` database is in the same Postgres instance as Supabase but is a separate logical database; the LiteLLM tables cannot reach datasmith data and vice versa.