Ollama
Add local AI models as tools for the container agent. Offload summarization, translation, and queries to Ollama.
What it does
- Exposes local Ollama models as MCP tools inside the container
- Claude stays the orchestrator — delegates specific tasks to local models
- Two tools added: ollama_list_models and ollama_generate
- No API key needed — runs against your local Ollama instance
- Optional macOS notification watcher for monitoring Ollama usage
What you'll need
- NanoClaw installed and running
- Ollama installed and running on the host
- At least one Ollama model pulled
Install
/add-ollama-tool
How it works
The /add-ollama-tool skill adds a stdio-based MCP server that connects the container agent to your local Ollama instance. Claude remains the orchestrator — it decides when to use a local model and what to send it. This is useful for offloading cheaper or faster tasks like summarization, translation, or general knowledge queries to a local model while keeping Claude for complex reasoning.
The skill adds two tools to the agent:
- ollama_list_models — lists all models installed in your Ollama instance.
- ollama_generate — sends a prompt to a specified model and returns the response.
When a user sends a message like “use ollama to summarize this article,” Claude calls ollama_list_models to see what’s available, picks an appropriate model, and calls ollama_generate with the prompt. The response from the local model is incorporated into Claude’s reply.
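Under the hood, the call to the local model is a plain HTTP request. A minimal sketch of how `ollama_generate` might build that request — the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields are Ollama's real REST API, but the helper name is a hypothetical illustration, not the skill's actual code:

```typescript
// Hypothetical helper: build the request ollama_generate would send.
// Ollama's /api/generate accepts { model, prompt, stream }.
function buildGenerateRequest(
  host: string,
  model: string,
  prompt: string
): { url: string; body: string } {
  return {
    url: `${host}/api/generate`,
    // stream: false asks Ollama for one JSON response instead of chunks
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

const req = buildGenerateRequest(
  "http://host.docker.internal:11434",
  "gemma3:1b",
  "Summarize this article in two sentences."
);
console.log(req.url); // http://host.docker.internal:11434/api/generate
console.log(JSON.parse(req.body).model); // gemma3:1b
```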
Setup
The skill checks that Ollama is installed and running, then applies code changes through the skills engine. It adds the MCP server source file, registers it in the container configuration, and copies the updated source to any existing per-group session directories.
If you don’t have any Ollama models yet, the skill suggests starting with one of these:
- ollama pull gemma3:1b — small and fast (1 GB)
- ollama pull llama3.2 — good general purpose (2 GB)
- ollama pull qwen3-coder:30b — strong at code tasks (18 GB)
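For reference, the entry the skill registers in container/agent-runner/src/index.ts has roughly this shape. The `mcpServers` key is the real config location named in Troubleshooting below; the `command`, `args`, and file path here are illustrative assumptions, not the exact generated code:

```typescript
// Illustrative shape only — the skill generates the real entry.
// A stdio MCP server is registered as a command the agent runner spawns
// at the start of each session.
const mcpServers = {
  ollama: {
    command: "node",
    args: ["./mcp/ollama-server.js"], // hypothetical path to the server
    env: {
      // Forwarded so the server knows where to reach the host's Ollama
      OLLAMA_HOST: process.env.OLLAMA_HOST ?? "",
    },
  },
};

console.log(Object.keys(mcpServers)); // [ 'ollama' ]
```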
Network routing
The MCP server inside the container connects to Ollama on the host machine. With Docker, it reaches the host via host.docker.internal:11434. With Apple Container, it uses localhost:11434. If you’re running Ollama on a different machine or port, set OLLAMA_HOST in your .env file.
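The routing rules above can be sketched as a small resolver. The two addresses and the OLLAMA_HOST override come from the paragraph above; the function name and the scheme-prefixing fallback are assumptions:

```typescript
// Hypothetical resolver for the Ollama base URL, mirroring the routing
// rules: an explicit OLLAMA_HOST wins; otherwise pick per runtime.
function resolveOllamaHost(
  runtime: "docker" | "apple-container",
  envHost?: string
): string {
  if (envHost && envHost.length > 0) {
    // Accept values like "192.168.1.5:11434" without a scheme
    return envHost.startsWith("http") ? envHost : `http://${envHost}`;
  }
  return runtime === "docker"
    ? "http://host.docker.internal:11434"
    : "http://localhost:11434";
}

console.log(resolveOllamaHost("docker")); // http://host.docker.internal:11434
console.log(resolveOllamaHost("apple-container")); // http://localhost:11434
```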
Monitoring
The skill includes an optional watcher script (scripts/ollama-watch.sh) that sends macOS notifications when Ollama is used. Run it in a terminal to see real-time notifications as the agent generates with local models.
You can also monitor via logs:
tail -f logs/nanoclaw.log | grep -i ollama
Look for [OLLAMA] >>> Generating and [OLLAMA] <<< Done entries.
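If you want to post-process the log rather than watch it live, the same filter the grep applies can be expressed in code. The `[OLLAMA]` markers come from the log entries above; the function name is hypothetical:

```typescript
// Hypothetical filter equivalent to the grep above, narrowed to the
// generation markers the MCP server writes.
function ollamaEntries(lines: string[]): string[] {
  return lines.filter((l) => /\[OLLAMA\] (>>> Generating|<<< Done)/.test(l));
}

const sample = [
  "[OLLAMA] >>> Generating with gemma3:1b",
  "[AGENT] message received",
  "[OLLAMA] <<< Done",
];
console.log(ollamaEntries(sample).length); // 2
```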
Troubleshooting
Agent says “Ollama is not installed.” The agent is trying to run the ollama CLI inside the container instead of using the MCP tools. This means the MCP server wasn’t registered correctly. Re-run the skill or verify that container/agent-runner/src/index.ts has the ollama entry in its mcpServers config.
“Failed to connect to Ollama.” Verify Ollama is running on the host (ollama list). If using Docker, confirm the container can reach the host network. If using a custom host, check the OLLAMA_HOST value in .env.
Agent doesn’t use Ollama tools. The agent may not know the tools exist. Be explicit in your message: “use the ollama_generate tool with gemma3:1b to answer this.” After using the tools once, the agent tends to remember them in later messages.
Tips
- Claude decides when to use Ollama based on the task. You can nudge it by mentioning Ollama in your message, or you can let Claude figure out when a local model is sufficient.
- Smaller models respond faster but produce lower-quality output. For simple tasks like translation or factual lookups, a 1-2 GB model is fine. For anything requiring reasoning, let Claude handle it directly.
- The Ollama MCP server runs as a stdio process inside the container — it starts with each agent session and stops when the session ends. There’s no persistent background process.