Skip to main content
AINode AINode is a free, open-source platform that turns any NVIDIA GPU machine into a complete local AI stack — browser chat UI, OpenAI-compatible API, LoRA fine-tuning, and automatic multi-node clustering over NCCL. One command to install. No cloud. No monthly bill.
curl -fsSL https://ainode.dev/install | bash

Key capabilities

Chat UI

Browser-based streaming chat on port 3000. Works with any loaded model.

OpenAI-compatible API

Drop-in /v1 endpoints on port 8000. Works with LangChain, Open WebUI, and any OpenAI client.

Model downloads

50+ models from the built-in catalog. Paste any Hugging Face repo ID.

Fine-tuning

LoRA, QLoRA, and full fine-tune from the browser. No notebooks.

Quantization

Compress any model to AWQ or NVFP4 on your own GPU, then serve it or push it to Hugging Face.

Multi-node clustering

Automatic peer discovery. One model sharded across all GPUs in the cluster.

Federated serving

One endpoint, many models — the master routes each request to the node that holds the model.

Model stacking

Run several models per node. They persist across restarts and replay on boot.

Prometheus metrics

/metrics endpoint for Grafana, Prometheus, VictoriaMetrics.

Verified hardware

GB10 Grace Blackwell systems (unified memory, cluster-native)

HardwareManufacturerGPU memoryPriceStatus
DGX SparkNVIDIA128 GB unified$3,999✅ Verified — TP=4, 487 GB cluster
Ascent GX10ASUS128 GB unified$2,999✅ Verified
Pro Max with GB10 (FCM1253)Dell128 GB unifiedTBD✅ Supported
ZGX Nano AI StationHP128 GB unifiedTBD✅ Supported
All GB10 systems share the same core: NVIDIA Blackwell GPU + Arm Grace CPU on NVLink-C2C, 1 petaFLOP FP4, dual-port ConnectX-7 fabric. Connect two units with the NVLink Bridge for TP=2 (244 GB VRAM, ~$6–8K total).

Data center / AI accelerators

GPUVRAMArchitectureTier
B200192 GB HBM3eBlackwellData center
H200141 GB HBM3eHopperData center
H100 SXM5 / PCIe80 GB HBM3HopperData center
A100 80 GB80 GB HBM2eAmpereData center
A100 40 GB40 GB HBM2eAmpereData center
L40S48 GB GDDR6 ECCAda LovelaceInference / viz
L4048 GB GDDR6 ECCAda LovelaceInference / viz
A4048 GB GDDR6 ECCAmpereData center / viz
L424 GB GDDR6 ECCAda LovelaceEdge inference (72W)
A3024 GB HBM2AmpereData center
A1024 GB GDDR6 ECCAmpereInference
A164× 16 GB GDDR6AmpereVDI
A216 GB GDDR6AmpereEdge

Professional workstation

GPUVRAMArchitectureTier
RTX PRO 6000 Blackwell96 GB GDDR7 ECCBlackwellPro workstation
RTX 6000 Ada48 GB GDDR6 ECCAda LovelacePro workstation
RTX 5000 Ada32 GB GDDR6 ECCAda LovelacePro workstation
RTX 4500 Ada24 GB GDDR6 ECCAda LovelacePro workstation
RTX 4000 Ada20 GB GDDR6 ECCAda LovelacePro workstation
RTX A600048 GB GDDR6 ECCAmperePro workstation
RTX A500024 GB GDDR6 ECCAmperePro workstation
RTX A400016 GB GDDR6 ECCAmperePro workstation
RTX A2000 12 GB12 GB GDDR6 ECCAmperePro entry

Consumer — GeForce RTX 50 series (Blackwell, 2025)

GPUVRAMMSRP
RTX 509032 GB GDDR7$1,999
RTX 508016 GB GDDR7$999
RTX 5070 Ti16 GB GDDR7$749
RTX 507012 GB GDDR7$549
RTX 5060 Ti8 / 16 GB GDDR7~379379–499

Consumer — GeForce RTX 40 series (Ada Lovelace, 2022–2024)

GPUVRAMMSRP
RTX 409024 GB GDDR6X$1,599
RTX 4080 Super16 GB GDDR6X$999
RTX 4070 Ti Super16 GB GDDR6X$799
RTX 4070 Super / 407012 GB GDDR6X$599
RTX 4060 Ti 16 GB16 GB GDDR6$499
RTX 4060 Ti / 40608 GB GDDR6299299–399

Consumer — GeForce RTX 30 series (Ampere, 2020–2022)

GPUVRAM
RTX 3090 Ti / 309024 GB GDDR6X
RTX 3080 Ti12 GB GDDR6X
RTX 3080 12 GB12 GB GDDR6X
RTX 3070 Ti / 30708 GB
RTX 306012 GB GDDR6
RTX 3060 Ti8 GB GDDR6
Minimum for inference: 8 GB VRAM (runs Qwen 1.5B–3B).
Recommended for 7B–13B models: 16–24 GB VRAM (RTX 3090 / 4090 / A5000).
For 70B+ models: 48–80+ GB or multi-node cluster (GB10, A100, H100, L40S, A40).
vLLM works best on Ampere (sm_80) or newer. Turing (RTX 20-series) is supported with limitations.

Live cluster demo

AINode 4-node cluster — live topology with pulsing connections Four GB10 nodes (3× DGX Spark + 1× ASUS GX10), 487 GB aggregated VRAM, automatic UDP discovery, live pulsing connections.

Architecture

AINode ships as a single unified container image. Every node in the cluster runs the same image.
ghcr.io/getainode/ainode:latest   ← pulled by the installer

         ├── aiohttp web server (chat UI + API proxy, port 3000 / 8000)
         ├── federated master router (routes /v1/* by model name)
         ├── vLLM inference engine (one or more instances per node)
         ├── UDP discovery broadcaster (port 5679)
         └── training pipeline (LoRA / QLoRA / Full + quantization: AWQ / NVFP4)
No host Python venv. No source builds. Upgrade is ainode update.

License

Apache 2.0. Free forever. Powered by argentos.ai.