# Ollama Models — ThinkCentre 1

CPU-only inference (Intel UHD 730, no dedicated GPU). 16 GB RAM.

## Installed

| Model | Size | RAM | Speed (tok/s) | Best for |
|-------|------|-----|---------------|----------|
| `qwen2.5:7b` | ~4.7 GB | ~6 GB | ~15-25 | Default — code, German, reasoning |

## Recommended Candidates

| Model | Size | RAM needed | Notes |
|-------|------|------------|-------|
| `qwen2.5:7b` ✅ | 4.7 GB | 6 GB | Best quality/speed ratio on CPU |
| `mistral:7b` | 4.1 GB | 5 GB | Strong English reasoning |
| `llama3.2:3b` | 2.0 GB | 3 GB | Fastest, lower quality |
| `qwen2.5:14b` | 9.0 GB | 11 GB | Better quality, slower (~8 tok/s) |
| `deepseek-r1:7b` | 4.7 GB | 6 GB | Strong at reasoning/math |
| `nomic-embed-text` | 0.3 GB | 1 GB | Embeddings (QMD alternative) |
## API

```bash
# Generate (single prompt)
curl http://192.168.0.91:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Your prompt here",
  "stream": false
}'

# Via OpenAI-compatible endpoint
curl http://192.168.0.91:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5:7b","messages":[{"role":"user","content":"Hello"}]}'
```
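The same server exposes a couple of other routes worth keeping handy. A minimal sketch, assuming the stock Ollama HTTP API: `/api/tags` lists installed models, and `/api/embeddings` only works once `nomic-embed-text` from the candidates table has been pulled.

```bash
# List models currently installed on the server
curl http://192.168.0.91:11434/api/tags

# Embeddings (requires `ollama pull nomic-embed-text` first)
curl http://192.168.0.91:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Text to embed"
}'
```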
## Management

```bash
ollama list            # Installed models
ollama pull <model>    # Download model
ollama rm <model>      # Remove model
ollama run <model>     # Interactive chat
sudo systemctl status ollama
sudo systemctl restart ollama
```
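Before promoting one of the recommended candidates to default, it is worth measuring its real speed on this CPU. A rough sketch, assuming the `--verbose` flag of `ollama run`, which prints timing stats after the response (look for the `eval rate` line, in tokens/s):

```bash
# Pull a candidate and check generation speed on this machine
ollama pull llama3.2:3b
ollama run llama3.2:3b --verbose "Explain in two sentences why the sky is blue."
# Compare the reported "eval rate" against the ~15-25 tok/s of qwen2.5:7b
```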
## OpenClaw Integration (future)

Add to `openclaw.json` as fallback:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "fallbacks": [
          "openrouter/anthropic/claude-sonnet-4-6",
          "ollama/qwen2.5:7b@http://192.168.0.91:11434",
          "google/gemini-2.5-flash"
        ]
      }
    }
  }
}
```
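Before enabling the fallback, it helps to confirm that the endpoint referenced in that entry is reachable from the machine running OpenClaw. A quick check, assuming Ollama's OpenAI-compatible model listing at `/v1/models`:

```bash
# Confirm the Ollama endpoint from the fallback entry answers
# and that qwen2.5:7b is among the served models
curl -s http://192.168.0.91:11434/v1/models | grep -o '"qwen2.5:7b"'
```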