Endpoints

| URL | Format | Description |
| --- | --- | --- |
| GET http://localhost:8000/metrics | Prometheus text | Scrape target for Prometheus / Grafana |
| GET http://localhost:3000/api/metrics | JSON | Full snapshot |
| GET http://localhost:3000/api/metrics/gpu | JSON | GPU stats only |
| GET http://localhost:3000/api/metrics/requests | JSON | Request stats with latency percentiles |

Prometheus scrape config

scrape_configs:
  - job_name: ainode
    static_configs:
      - targets: ["<node-ip>:8000"]
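
For context, a slightly fuller `prometheus.yml` sketch that scrapes more than one node. The scrape interval, the second target, and the `cluster` label are illustrative assumptions, not values AINode requires.

```yaml
# prometheus.yml (sketch; interval, targets, and labels are assumptions)
global:
  scrape_interval: 15s          # assumed value; Prometheus defaults to 1m if unset

scrape_configs:
  - job_name: ainode
    metrics_path: /metrics      # Prometheus default, spelled out for clarity
    static_configs:
      - targets:
          - "<node-ip-1>:8000"  # placeholder: one entry per AINode host
          - "<node-ip-2>:8000"
        labels:
          cluster: lab          # hypothetical label attached to every scraped series
```

Each target gets its own `instance` label, so per-node series remain distinguishable in queries and alerts.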

Key metrics

| Metric | Type | Description |
| --- | --- | --- |
| ainode_uptime_seconds | counter | Seconds since process start |
| ainode_requests_total | counter | Total inference requests |
| ainode_request_errors_total | counter | Failed requests |
| ainode_tokens_generated_total | counter | Total tokens generated |
| ainode_tokens_per_second | gauge | Average throughput |
| ainode_request_latency_milliseconds{quantile} | summary | P50/P95/P99 latency |
| ainode_requests_by_model_total{model} | counter | Per-model request counts |
| ainode_gpu_utilization_percent | gauge | GPU utilization 0–100 |
| ainode_gpu_memory_used_bytes | gauge | GPU memory in use |
| ainode_gpu_memory_total_bytes | gauge | Total GPU memory |
| ainode_gpu_temperature_celsius | gauge | GPU temperature |
| ainode_build_info{version} | gauge | Always 1; carries version label |
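
Counters such as `ainode_requests_total` are usually queried as rates or ratios rather than raw values. A minimal sketch of Prometheus recording rules that precompute a few of them; the group name, rule names, and 5m window are assumptions chosen for illustration.

```yaml
# ainode_recording.yml (hypothetical file name)
groups:
  - name: ainode_recording      # assumed group name
    rules:
      # Fraction of GPU memory in use, 0-1
      - record: ainode:gpu_memory_used:ratio
        expr: ainode_gpu_memory_used_bytes / ainode_gpu_memory_total_bytes

      # Share of requests that failed over the last 5 minutes
      # (yields NaN while no requests are being served)
      - record: ainode:request_errors:ratio_rate5m
        expr: rate(ainode_request_errors_total[5m]) / rate(ainode_requests_total[5m])

      # Tokens generated per second, averaged over 5 minutes
      - record: ainode:tokens_generated:rate5m
        expr: rate(ainode_tokens_generated_total[5m])
```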

Example alerts

# GPU memory > 95%
- alert: AINodeGPUMemoryHigh
  expr: ainode_gpu_memory_used_bytes / ainode_gpu_memory_total_bytes > 0.95

# Error rate > 5/min (rate() returns a per-second value, so scale by 60)
- alert: AINodeErrorRate
  expr: rate(ainode_request_errors_total[1m]) * 60 > 5

# Temperature > 85°C
- alert: AINodeGPUTemp
  expr: ainode_gpu_temperature_celsius > 85

# Node down
- alert: AINodeDown
  expr: up{job="ainode"} == 0
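
The entries above are bare rule fragments; Prometheus loads rules from a file in which they sit under `groups` and `rules`, and that file must be listed under `rule_files` in `prometheus.yml`. A sketch of such a file follows. The file and group names, `for` durations, `severity` labels, and annotation text are all assumptions to adapt, and the error-rate expression is scaled by 60 because `rate()` returns a per-second value.

```yaml
# ainode_alerts.yml (hypothetical file name)
groups:
  - name: ainode_alerts                 # assumed group name
    rules:
      - alert: AINodeGPUMemoryHigh
        expr: ainode_gpu_memory_used_bytes / ainode_gpu_memory_total_bytes > 0.95
        for: 5m                         # assumed: must hold 5 minutes before firing
        labels:
          severity: warning             # assumed label for Alertmanager routing
        annotations:
          summary: "GPU memory above 95% on {{ $labels.instance }}"

      - alert: AINodeErrorRate
        expr: rate(ainode_request_errors_total[1m]) * 60 > 5   # errors per minute
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "More than 5 failed requests per minute on {{ $labels.instance }}"

      - alert: AINodeGPUTemp
        expr: ainode_gpu_temperature_celsius > 85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GPU at {{ $value }} °C on {{ $labels.instance }}"

      - alert: AINodeDown
        expr: up{job="ainode"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "AINode target {{ $labels.instance }} is not responding to scrapes"
```

The `for` clause keeps a brief spike from paging anyone; shorten or remove it where immediate notification matters more than noise.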