jhinx.dev

What it is

Ollama is a command-line tool for downloading and running open-source LLMs locally. It exposes a simple HTTP API on port 11434 and comes with a model registry that's roughly the Hugging Face of GGUF-quantized models, with a friendlier UX.
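As a rough illustration of how simple that API is, here is a minimal sketch that asks the daemon which models it has pulled, using the /api/tags endpoint. It assumes the daemon is reachable on localhost at the default port. The helper names (model_names, list_models) are made up for this example and are not part of Ollama.

```python
import json
from urllib.request import urlopen

# Assumption: the Ollama daemon is on the same machine, default port.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> list[str]:
    """Pull the model names out of a decoded /api/tags response."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama daemon which models it has downloaded."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_models() against a running daemon should return the same names that `ollama list` prints. With no daemon listening, urlopen raises URLError instead.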

Why I run it (still)

Ollama was the original local-LLM runtime on the Zenbook/Jarvis host before I figured out how to get the Radeon 890M iGPU involved. Its CPU-only fallback worked, but at ~5–10 tok/s on a 7B model the experience was just slow enough to be annoying. Then I ran a bake-off of every option that claimed AMD iGPU support, and LM Studio with the Vulkan backend won: it auto-detected the iGPU, needed no driver hacks, and ran at ~16 tok/s on the same models.
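For the Ollama side of a comparison like this, a tok/s figure can be derived from the timing fields that a non-streaming /api/generate call returns: eval_count (tokens generated) and eval_duration (nanoseconds). The sketch below is one way to do that and is not necessarily how the numbers above were measured. The default model name is a placeholder, and both function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def tokens_per_second(generate_response: dict) -> float:
    """Generation speed from a decoded, non-streaming /api/generate response."""
    eval_count = generate_response["eval_count"]          # tokens generated
    eval_duration_ns = generate_response["eval_duration"]  # nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def generate(prompt: str, model: str = "llama3:8b",
             base_url: str = "http://localhost:11434") -> dict:
    """Run one non-streaming generation. The default model name is a placeholder."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = Request(f"{base_url}/api/generate", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

A response reporting 160 tokens over 10 seconds works out to 16 tok/s. Prompt processing is reported separately (prompt_eval_count, prompt_eval_duration), so this covers generation speed only.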

Ollama still runs because:

It's a useful fallback if LM Studio is misbehaving.
Open WebUI's model picker shows both Ollama and LM Studio side by side, so the older model collection stays browseable.
The Home Assistant Ollama integration is still configured (just not called by any active automation), which means one less migration step if Ollama ever comes back into rotation.

No production workflow currently routes through Ollama. It is now a legacy fallback and a model browser; retirement is waiting on an explicit final check, not on more daily-driver time.

The Zenbook itself is now reachable over SSH for agent checks, so diagnosing "is the laptop up?" no longer requires opening a remote desktop session first.
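An "is the laptop up?" check of this kind can be as small as a TCP connect to the SSH port, and optionally to the Ollama port as well. The following is a minimal sketch, not the actual agent check. The hostname zenbook.local is a placeholder, and port_open and host_status are names invented for the example.

```python
import socket

# Assumption: placeholder hostname. Substitute the laptop's real address.
ZENBOOK_HOST = "zenbook.local"


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def host_status(host: str = ZENBOOK_HOST) -> dict[str, bool]:
    """Check the SSH port and the default Ollama port on one host."""
    return {"ssh": port_open(host, 22), "ollama": port_open(host, 11434)}
```

A result with "ssh" true and "ollama" false would suggest the laptop is up but the daemon is not listening, which narrows things down before anyone logs in.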

How I use it

Indirectly, through Open WebUI. The Ollama daemon runs on the Zenbook on port 11434, Open WebUI lists its models under the "Local" provider tab, and I can swap between an Ollama model and an LM Studio model in the same chat. For anything I actually want fast answers from, I pick the LM Studio model.