# Ollama Models — ThinkCentre 1

CPU-only inference (Intel UHD 730, no dedicated GPU). 16 GB RAM.

## Installed

| Model | Size | RAM | Speed (tok/s) | Best for |
|-------|------|-----|---------------|----------|
| `qwen2.5:7b` | ~4.7 GB | ~6 GB | ~15-25 | Default — code, German, reasoning |

## Recommended Candidates

| Model | Size | RAM needed | Notes |
|-------|------|------------|-------|
| `qwen2.5:7b` ✅ | 4.7 GB | 6 GB | Best quality/speed ratio on CPU |
| `mistral:7b` | 4.1 GB | 5 GB | Strong English reasoning |
| `llama3.2:3b` | 2.0 GB | 3 GB | Fastest, lower quality |
| `qwen2.5:14b` | 9.0 GB | 11 GB | Better quality, slower (~8 tok/s) |
| `deepseek-r1:7b` | 4.7 GB | 6 GB | Strong at reasoning/math |
| `nomic-embed-text` | 0.3 GB | 1 GB | Embeddings (QMD alternative) |
## API

```bash
# Generate (single prompt)
curl http://192.168.0.91:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Your prompt here",
  "stream": false
}'

# Via OpenAI-compatible endpoint
curl http://192.168.0.91:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5:7b","messages":[{"role":"user","content":"Hello"}]}'
```
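The same server exposes a couple of other routes worth keeping handy. A minimal sketch, assuming the stock Ollama HTTP API: `/api/tags` lists installed models, and `/api/embeddings` only works once `nomic-embed-text` from the candidates table has been pulled.

```bash
# List models currently installed on the server
curl http://192.168.0.91:11434/api/tags

# Embeddings (requires `ollama pull nomic-embed-text` first)
curl http://192.168.0.91:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Text to embed"
}'
```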
## Management

```bash
ollama list            # Installed models
ollama pull <model>    # Download model
ollama rm <model>      # Remove model
ollama run <model>     # Interactive chat
sudo systemctl status ollama
sudo systemctl restart ollama
```
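Before promoting one of the recommended candidates to default, it is worth measuring its real speed on this CPU. A rough sketch, assuming the `--verbose` flag of `ollama run`, which prints timing stats after the response (look for the `eval rate` line, in tokens/s):

```bash
# Pull a candidate and check generation speed on this machine
ollama pull llama3.2:3b
ollama run llama3.2:3b --verbose "Explain in two sentences why the sky is blue."
# Compare the reported "eval rate" against the ~15-25 tok/s of qwen2.5:7b
```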
## OpenClaw Integration (future)

Add to `openclaw.json` as fallback:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "fallbacks": [
          "openrouter/anthropic/claude-sonnet-4-6",
          "ollama/qwen2.5:7b@http://192.168.0.91:11434",
          "google/gemini-2.5-flash"
        ]
      }
    }
  }
}
```
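Before enabling the fallback, it helps to confirm that the endpoint referenced in that entry is reachable from the machine running OpenClaw. A quick check, assuming Ollama's OpenAI-compatible model listing at `/v1/models`:

```bash
# Confirm the Ollama endpoint from the fallback entry answers
# and that qwen2.5:7b is among the served models
curl -s http://192.168.0.91:11434/v1/models | grep -o '"qwen2.5:7b"'
```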