
# Ollama Models — ThinkCentre 1

CPU-only inference (Intel UHD 730, no dedicated GPU). 16 GB RAM.

## Installed

| Model      | Size    | RAM   | Speed (tok/s) | Best for                           |
|------------|---------|-------|---------------|------------------------------------|
| qwen2.5:7b | ~4.7 GB | ~6 GB | ~15-25        | Default — code, German, reasoning  |

Other models that fit in 16 GB RAM:

| Model            | Size   | RAM needed | Notes                              |
|------------------|--------|------------|------------------------------------|
| qwen2.5:7b       | 4.7 GB | 6 GB       | Best quality/speed ratio on CPU    |
| mistral:7b       | 4.1 GB | 5 GB       | Strong English reasoning           |
| llama3.2:3b      | 2.0 GB | 3 GB       | Fastest, lower quality             |
| qwen2.5:14b      | 9.0 GB | 11 GB      | Better quality, slower (~8 tok/s)  |
| deepseek-r1:7b   | 4.7 GB | 6 GB       | Strong at reasoning/math           |
| nomic-embed-text | 0.3 GB | 1 GB       | Embeddings (QMD alternative)       |

## API

```bash
# Generate a completion (native Ollama API)
curl http://192.168.0.91:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Your prompt here",
  "stream": false
}'

# Same model via the OpenAI-compatible endpoint
curl http://192.168.0.91:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5:7b","messages":[{"role":"user","content":"Hello"}]}'
```

## Management

```bash
ollama list                      # Installed models
ollama pull <model>              # Download model
ollama rm <model>                # Remove model
ollama run <model>               # Interactive chat
sudo systemctl status ollama     # Service health
sudo systemctl restart ollama    # Restart the service
```
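
To check the service from another machine on the LAN (without the `ollama` CLI), the documented `/api/tags` endpoint lists locally installed models, roughly the API counterpart of `ollama list`. A small sketch, again assuming `requests`:

```python
# Sketch: ping the Ollama server and list locally installed models.
# /api/tags is Ollama's documented "list local models" endpoint.
import requests

OLLAMA_URL = "http://192.168.0.91:11434"

def installed_models() -> list[str]:
    resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]

if __name__ == "__main__":
    names = installed_models()
    print("\n".join(names) if names else "no models installed")
```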

## OpenClaw Integration (future)

Add to `openclaw.json` as a fallback:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "fallbacks": [
          "openrouter/anthropic/claude-sonnet-4-6",
          "ollama/qwen2.5:7b@http://192.168.0.91:11434",
          "google/gemini-2.5-flash"
        ]
      }
    }
  }
}
```
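
For intuition only: the list expresses an order of preference, i.e. try OpenRouter first, fall back to the local Ollama model if that fails, then to Gemini. A hypothetical sketch of that semantics (not OpenClaw's actual code; the names here are made up):

```python
# Illustration of fallback semantics only; not OpenClaw internals.
# `backends` stands in for callables wrapping each configured provider.
from typing import Callable

def complete_with_fallbacks(prompt: str,
                            backends: list[Callable[[str], str]]) -> str:
    last_err: Exception | None = None
    for backend in backends:  # try providers in the configured order
        try:
            return backend(prompt)
        except Exception as err:  # rate limit, timeout, host down, ...
            last_err = err
    raise RuntimeError("all fallback models failed") from last_err
```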