# Model Proxy
`model.formulacode.org` is an OpenAI-compatible HTTPS endpoint that fronts the
local vLLM servers. A LiteLLM proxy multiplexes a single URL across multiple
vLLM backends — clients pick which model they want by setting the `model`
field in the request body, exactly like the OpenAI API.
| Hostname | Auth | Backend |
|---|---|---|
| `model.formulacode.org` | Bearer `LITELLM_MASTER_KEY` (or any virtual key minted from it) | LiteLLM proxy → local vLLM servers |
LiteLLM runs in DB-backed mode against the local Supabase Postgres
(database `litellm`, separate from the datasmith schema). This unlocks the
admin UI, virtual keys, spend logs, and live config edits — everything the
master key alone can't do.
## Architecture

```mermaid
flowchart LR
    Client["OpenAI SDK client<br/>(Bearer master_key or virtual key)"]
    CFE["model.formulacode.org"]
    subgraph Host["Host machine"]
        CFD["cloudflared<br/>(datasmith-model tunnel)"]
        LL["LiteLLM proxy :4100"]
        PG[("Postgres :54322<br/>litellm db")]
        VA["vllm serve :8123<br/>model A"]
        VB["vllm serve :8124<br/>model B"]
    end
    Client --> CFE --> CFD --> LL
    LL <--> PG
    LL -- "model: A" --> VA
    LL -- "model: B" --> VB
```
vLLM is bound to localhost only; LiteLLM is the single internet-facing
process. Auth is enforced by LiteLLM's `master_key` (or scoped virtual keys
issued from it); vLLM itself has none.
## For users

### Calling the proxy
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://model.formulacode.org/v1",
    api_key="sk-fc-<master-key-or-virtual-key>",
)
resp = client.chat.completions.create(
    model="gemma-4-31b",  # alias from infra/litellm.config.yaml
    messages=[{"role": "user", "content": "hello"}],
)
```
`/v1/models` lists the configured aliases.
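For instance, with any valid key (the alias shown in the response comment is illustrative; the actual list depends on the config):

```shell
curl -sS https://model.formulacode.org/v1/models \
  -H "Authorization: Bearer sk-fc-<master-key-or-virtual-key>"
# Standard OpenAI list-models shape: {"data": [{"id": "gemma-4-31b", ...}, ...]}
```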
### Admin UI

Visit https://model.formulacode.org/ui/ and log in with username `admin` and
the value of `LITELLM_MASTER_KEY` as the password. The UI is a convenience
layer over the same REST API; everything you can do there has a curl equivalent.
## For operators

### Prerequisites
- vLLM servers already running on `127.0.0.1:8123` and `127.0.0.1:8124`. Bind them to localhost — they have no auth and must not be exposed.
- Local Supabase Postgres reachable on `127.0.0.1:54322` (it already is for the datasmith pipeline; LiteLLM uses a separate `litellm` database in the same instance).
- `cloudflared` installed and authenticated (`cert.pem` present in `~/.cloudflared/`); see Remote Access for the install steps.
- `uv` available on PATH.
### tokens.env

```shell
# Bearer master key. Generate with: openssl rand -hex 32
LITELLM_MASTER_KEY=sk-fc-<random-hex>

# DB-backed mode (UI + virtual keys + spend logs).
DATABASE_URL=postgresql://postgres:postgres@127.0.0.1:54322/litellm
LITELLM_SALT_KEY=sk-salt-<random-hex>  # encrypts virtual keys at rest
STORE_MODEL_IN_DB=True
```
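Both secrets can be minted in one go; the `sk-fc-`/`sk-salt-` prefixes follow the examples above and are a local convention, not a LiteLLM requirement:

```shell
# Emit ready-to-paste lines for tokens.env (64 hex chars of entropy each).
echo "LITELLM_MASTER_KEY=sk-fc-$(openssl rand -hex 32)"
echo "LITELLM_SALT_KEY=sk-salt-$(openssl rand -hex 32)"
```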
The model list — aliases, vLLM ports, upstream model ids — lives in
`infra/litellm.config.yaml` and is checked in. The
`litellm_params.model` value must be prefixed `openai/` so LiteLLM speaks
the OpenAI-compatible protocol vLLM exposes; the suffix after the prefix is
forwarded to vLLM in the request body and should match what `vllm serve`
loaded.
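A sketch of one `model_list` entry under this scheme; the alias matches the client example above, while the upstream id is left as a placeholder:

```yaml
model_list:
  - model_name: gemma-4-31b                # alias clients send in `model`
    litellm_params:
      model: openai/<vllm-loaded-id>       # openai/ prefix is mandatory
      api_base: http://127.0.0.1:8123/v1   # local vLLM server for this alias
```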
### One-time setup

```shell
# 1. Create the tunnel and copy its UUID from the output
cloudflared tunnel create datasmith-model

# 2. Patch the placeholder in ~/.cloudflared/config-model.yml
TUNNEL_ID=<paste-uuid>
sed -i "s/REPLACE_WITH_MODEL_TUNNEL_ID/$TUNNEL_ID/g" ~/.cloudflared/config-model.yml
cloudflared --config ~/.cloudflared/config-model.yml tunnel ingress validate

# 3. Route DNS. --config + --overwrite-dns are required when multiple
#    tunnels coexist; without them cloudflared resolves the name against
#    ~/.cloudflared/config.yml and CNAMEs to the wrong tunnel.
cloudflared --config ~/.cloudflared/config-model.yml tunnel route dns \
  --overwrite-dns datasmith-model model.formulacode.org

# 4. Create the LiteLLM database in the local Supabase Postgres.
docker exec supabase_db_datasmith_new psql -U postgres -c "CREATE DATABASE litellm;"

# 5. Build the persistent venv used by `make model-tunnel` (this also runs
#    `prisma generate` to fetch the query-engine binary). LiteLLM applies
#    Prisma migrations into the `litellm` DB on first boot.
make model-proxy-install
```
### Day-to-day

`make model-tunnel` is the single day-to-day target. It:

- Loads `tokens.env` into the environment.
- Launches the LiteLLM binary from `.venv-litellm/` on `127.0.0.1:4100`, reading `infra/litellm.config.yaml`. The child runs from inside the venv directory to dodge Prisma's `pyproject.toml` lookup, which trips over the repo's `[[tool.mypy.overrides]]` array-of-tables.
- Polls `:4100/health/liveliness` until LiteLLM is ready.
- Starts `cloudflared tunnel run datasmith-model` in the foreground.
- On Ctrl-C, kills LiteLLM and exits.
Run it under tmux/screen or as a systemd service for persistence; the
target itself is foreground only.
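For the systemd route, a minimal user-level unit might look like the sketch below. The unit name and `WorkingDirectory` are assumptions; the target stays in the foreground, which suits `Type=simple`:

```ini
# ~/.config/systemd/user/model-tunnel.service  (hypothetical path)
[Unit]
Description=LiteLLM proxy + cloudflared model tunnel
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/path/to/repo       # repo root containing the Makefile
ExecStart=/usr/bin/make model-tunnel
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user enable --now model-tunnel`.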
### Adding or swapping a model

Edit `infra/litellm.config.yaml` and restart `make model-tunnel`.
Each entry under `model_list` is `model_name` (the alias clients pass) +
`litellm_params.model` (`openai/<vllm-loaded-id>`) + `api_base`
(`http://127.0.0.1:<port>/v1`). With `STORE_MODEL_IN_DB=True` you can also
add models live from the UI — they persist in `LiteLLM_ProxyModelTable` and
survive restarts without editing the YAML.
### Issuing virtual keys

Mint a scoped key (distinct from the master key, individually revocable):

```shell
curl -sS -X POST -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  https://model.formulacode.org/key/generate \
  -d '{"models":["gemma-4-31b"],"key_alias":"alice","duration":"30d","max_budget":50}'
```
The response includes `"key": "sk-..."` — that's what you hand to the user.
Revoke with `POST /key/delete`, body `{"keys":["sk-..."]}`. Browse and search
keys in the UI under `/ui/key-management`.
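Spelled out, the revoke call might look like this (the key value is a placeholder):

```shell
curl -sS -X POST -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  https://model.formulacode.org/key/delete \
  -d '{"keys":["sk-<virtual-key-to-revoke>"]}'
```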
## Troubleshooting

| Symptom | Likely cause |
|---|---|
| `401 Unauthorized` | Bearer token does not match the master key or any active virtual key |
| `400 model not found` | The `model` field does not match any `model_name` in `infra/litellm.config.yaml` (or any DB-stored model) |
| `502 Bad Gateway` | LiteLLM up, vLLM backend down — check `curl http://127.0.0.1:8123/v1/models` |
| UI shows `Authentication Error: Not connected to DB!` | `DATABASE_URL` not set, or the `litellm` database doesn't exist yet — re-run step 4 of one-time setup |
| `Unable to find Prisma binaries. Please run 'prisma generate' first.` | The `.venv-litellm/` venv is missing or stale — `rm -rf .venv-litellm && make model-proxy-install` |
| `tomlkit.exceptions.ParseError: Key "overrides" already exists` during `prisma generate` | Prisma is reading the repo's `pyproject.toml` from cwd. The Makefile already cd's into `.venv-litellm` to dodge this; if you're invoking `prisma` manually, do the same. |
| `litellm exited before becoming ready` | Read the LiteLLM stderr above the failure line — usually a missing env var or a Postgres connectivity issue |
| Responses look like a different LiteLLM (e.g. `no_db_connection` from the chat path) | Port 4100 is taken by another process; LiteLLM falls back to an ephemeral port and the tunnel/health-check ends up hitting the squatter. Check `ss -tlnp \| grep :4100` and pick an unused port in `Makefile`, `~/.cloudflared/config-model.yml`, and this doc. |
| Hangs on long generations | Increase `request_timeout` in `infra/litellm.config.yaml` |
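For the timeout case, a sketch of the config change (the 600-second value is an arbitrary example, not a recommendation):

```yaml
litellm_settings:
  request_timeout: 600   # seconds; applies proxy-wide
```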
## Security notes

- vLLM has no auth. Bind it to `127.0.0.1`, never `0.0.0.0`.
- The proxy hostname has no Cloudflare Access gate; auth is the LiteLLM master/virtual key only. Rotate the master key by changing `LITELLM_MASTER_KEY` in `tokens.env` and restarting `make model-tunnel`. Rotate a virtual key by deleting and re-issuing it — only that user is affected.
- `LITELLM_SALT_KEY` encrypts virtual key secrets at rest in Postgres. Changing it after keys are issued will make existing keys unreadable, so treat it as immutable post-bootstrap.
- The `litellm` database is in the same Postgres instance as Supabase but is a separate logical database; the LiteLLM tables cannot reach datasmith data and vice versa.