Base URLs
| Purpose | URL |
|---|---|
| Inference (OpenAI-compatible) | http://localhost:8000/v1 |
| Management API | http://localhost:3000/api |
| Prometheus metrics | http://localhost:8000/metrics |
Authentication
Disabled by default. Enable for internet-exposed instances:Authorization: Bearer <key> on all /v1/* requests.
OpenAI-compatible endpoints
| Endpoint | Method | Description |
|---|---|---|
/v1/chat/completions | POST | Chat completions (streaming supported) |
/v1/completions | POST | Text completions |
/v1/embeddings | POST | Embeddings |
/v1/models | GET | List loaded models |
Management endpoints
| Endpoint | Description |
|---|---|
GET /api/status | Node status, GPU info, engine state |
GET /api/nodes | All discovered cluster nodes |
GET /api/cluster/resources | Aggregated VRAM across cluster |
GET /api/metrics | JSON metrics snapshot |
GET /metrics | Prometheus text exposition |
POST /api/engine/set-model | Switch active model |
POST /api/engine/update | Trigger docker pull + restart |
GET /api/version/check | Check for newer version on GHCR |
