What it is
Ollama is a command-line tool for downloading and running open-source LLMs locally. It exposes a simple HTTP API on port 11434 and a model registry that's roughly the Hugging Face of GGUF-quantized models, with a friendlier UX.
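To make the HTTP API concrete, here is a minimal sketch of a non-streaming call to /api/generate using only the Python standard library. The function names are hypothetical helpers, and it assumes the daemon is on the default port and that whatever model name you pass has already been pulled. The tokens_per_second helper relies on the eval_count and eval_duration (nanoseconds) fields that Ollama includes in a non-streaming reply.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port; adjust if the daemon is configured differently


def build_generate_request(model: str, prompt: str) -> bytes:
    """Encode the JSON body for a non-streaming /api/generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def tokens_per_second(reply: dict) -> float:
    """Derive generation speed from a reply's timing fields.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    Returns 0.0 when the timing fields are missing.
    """
    duration_ns = reply.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return reply.get("eval_count", 0) / duration_ns * 1e9


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> dict:
    """POST a prompt to the local daemon and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_request(model, prompt),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

Calling generate with an installed model name and a prompt returns a dict whose "response" key holds the generated text; passing that same dict to tokens_per_second gives a rough tok/s figure for comparing runtimes.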
Why I run it (still)
Ollama was the original local-LLM runtime on the Zenbook/Jarvis host before I figured out how to get the Radeon 890M iGPU involved. Its CPU-only fallback worked, but at ~5–10 tok/s on a 7B model the experience was just slow enough to be annoying. Then I ran a bake-off across every option that claimed AMD iGPU support, and LM Studio with the Vulkan backend won: it auto-detected the iGPU, needed no driver hacks, and ran at ~16 tok/s on the same models.
Ollama still runs because:
- It's still useful as a fallback if LM Studio is misbehaving.
- Open WebUI's model picker shows both Ollama and LM Studio side by side, so the older model collection stays browseable.
- The Home Assistant Ollama integration is still configured (just not called by any active automation). One less migration step if Ollama ever comes back into rotation.
No production workflow currently routes through Ollama. It is a legacy fallback and model-browser lane now; retirement is pending an explicit final check rather than more daily-driver time.
The Zenbook itself is now reachable over SSH for agent checks, so diagnosing "is the laptop up?" no longer requires opening a remote desktop session first.
How I use it
Indirectly, through Open WebUI. The Ollama daemon runs on the Zenbook on port 11434, Open WebUI lists its models under the "Local" provider tab, and I can swap between an Ollama model and an LM Studio model in the same chat. For anything I actually want fast answers from, I pick the LM Studio model.
Setup notes
- Host: the Zenbook (separate Windows laptop with the AMD Ryzen AI 9 HX 370 + Radeon 890M).
- Reverse proxy: none for Ollama itself. Only Open WebUI is proxied, and it talks to Ollama as a model backend.
- Why it can't use the iGPU: ROCm doesn't support the 890M (RDNA 3.5; gfx1150 isn't on Ollama's supported-GPU list). The HSA_OVERRIDE_GFX_VERSION trick that works on some other AMD iGPUs is unreliable on Strix Point. Ollama silently falls back to CPU with size_vram=0. Not Ollama's fault; it's a ROCm scope issue.
- Update cadence: manual.
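Because the CPU fallback is silent, it can help to check it explicitly. The sketch below reads Ollama's /api/ps endpoint, which lists currently loaded models along with a size_vram field, and reports any model with nothing resident in VRAM. The function names are hypothetical, and a model only appears in this list while it is loaded, so run a prompt first.

```python
import json
import urllib.request


def cpu_only_models(ps_payload: dict) -> list[str]:
    """Return names of loaded models that report size_vram == 0 (running on CPU)."""
    return [
        m.get("name", "?")
        for m in ps_payload.get("models", [])
        if m.get("size_vram", 0) == 0
    ]


def fetch_running(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/ps from the local daemon and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)
```

Passing the result of fetch_running to cpu_only_models on the Zenbook should list every loaded model if the fallback described above is in effect.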
Runbook
- Healthy looks like: curl localhost:11434/api/tags from the Zenbook returns the installed model list; Open WebUI shows Ollama models in the picker.
- Service randomly unavailable at odd hours: Windows Update reboots the Zenbook. Ollama starts on login, not at boot, so the service stays down until someone signs in. Workarounds: register Ollama as a Windows service via NSSM, or enable auto-login. I've accepted the limitation instead: Ollama is fallback-only at this point and best-effort availability is fine.
- Where logs live: Ollama's own log files on the Zenbook; Open WebUI's logs for the request side.
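The "healthy" check above can be scripted for an agent to run over SSH. This is a minimal sketch under the same assumptions as the curl command (daemon on localhost:11434); check_ollama and model_names are hypothetical helper names. It treats any connection failure, timeout, or malformed reply as "down" rather than trying to distinguish causes.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def check_ollama(base_url: str = "http://localhost:11434",
                 timeout: float = 3.0) -> tuple[bool, list[str]]:
    """Return (is_up, installed_models); (False, []) if the daemon is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return True, model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as unhealthy.
        return False, []
```

A result of (False, []) after a Windows Update reboot is the expected symptom of the login-only startup described above, not necessarily a broken install.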