Chat completions
POST http://localhost:8000/v1/chat/completions
Streaming
Set"stream": true — response is server-sent events, same as OpenAI.
Embeddings
POST http://localhost:8000/v1/embeddings
List models
GET http://localhost:8000/v1/models
Returns the currently loaded model(s).
Compatible clients
- Python:
openaiSDK,langchain,litellm - Node.js:
openainpm package - Go:
sashabaranov/go-openai - Open WebUI: set base URL to
http://localhost:8000/v1 - Anything else that speaks OpenAI
