#blog #ai

My local LLM setup - v1

First attempt at setting up a local LLM 🤖

👋

In my last post I mentioned I was experimenting with running LLMs locally, and I now have a working setup!

I'm still using Ollama to run the actual models (the tool is pretty nice to use), but now I'm using their official Docker image instead of building my own:

docker pull ollama/ollama
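If you just want to try the image on its own, it can also be run directly. This is a minimal sketch, assuming you want to persist downloaded models under a host directory (the path here is just an example) and expose Ollama's default port:

# run the official image, keeping downloaded models on the host (path is an example)
docker run -d \
  --name ollama \
  -v /data/apps/ai/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama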

But the real game-changer was deploying an instance of open-webui. This is an incredible open-source project that integrates with Ollama and wraps a familiar and friendly UI around it.

(demo gif from the open-webui GitHub repository)


So how did I run this?

Since I use Portainer to manage all the Docker containers I run on my home server, I added a simple stack to run both together:

version: "2"
services:
  ui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    depends_on:
    - ollama
    volumes:
    - /data/apps/ai/ui/:/app/backend/data
    environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    ports:
    - 11300:8080
    labels:
    - traefik.enable=true
    - traefik.http.services.ollama-ui.loadbalancer.server.port=8080
    - traefik.docker.network=traefik_public
    - traefik.http.routers.ollama-ui-web.rule=Host(`ai.domain.tld`)
    - traefik.http.routers.ollama-ui-web.entrypoints=web
    - traefik.http.routers.ollama-ui-web.middlewares=ollama-ui-redirect-web-secure
    - traefik.http.routers.ollama-ui.tls=true
    - traefik.http.routers.ollama-ui.rule=Host(`ai.domain.tld`)
    - traefik.http.routers.ollama-ui.entrypoints=websecure
    - traefik.http.middlewares.ollama-ui-redirect-web-secure.redirectscheme.scheme=https
    networks:
    - default
    - traefik

  ollama:
    image: ollama/ollama
    restart: unless-stopped
    volumes:
    - /data/apps/ai/ollama/:/root/.ollama
    ports:
    - 11434:11434
    environment:
    # make ollama listen on all interfaces inside the container
    - OLLAMA_HOST=0.0.0.0:11434

networks:
  traefik:
    external:
      name: "traefik_public"

(I use Traefik as a reverse proxy for my containers, so the labels and network configuration are related to that)
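Once the stack is up, a quick sanity check is to hit the Ollama API from the host (assuming the 11434 port mapping above). The /api/tags endpoint lists the models Ollama has stored locally:

# should return a JSON object with a "models" array once Ollama is up
curl http://localhost:11434/api/tags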

From there I can open open-webui in the browser and create an admin account. The UI talks to the backing Ollama instance to actually answer prompts 🙌
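The chat only becomes useful once at least one model has been pulled. open-webui can do that from its admin settings, but it can also be done from the command line inside the Ollama container. A quick sketch, assuming the container ended up named ollama (Portainer/Compose may prefix it with the stack name):

# pull a model inside the ollama container (the container name may differ in your stack)
docker exec -it ollama ollama pull llama3.1:8b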


Like I said before, I'm not too worried about performance - I don't have the machine power for that. But it actually performs a lot better than I expected.

I'm running this on a Minisforum NAB7 with an Intel Core i7-12700H and 32GB of RAM. Since it doesn't have a dedicated GPU, everything is running on CPU only.

But still, once the models are loaded into memory, the responses are pretty quick. It probably isn't enough for multiple users, and it certainly isn't enough for the larger models, but for a single user it's definitely usable.

For reference, the models I've been using are llama3.1:8b and gemma2:9b.
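If you're curious about raw throughput, ollama run has a --verbose flag that prints timing stats (prompt eval and eval rate in tokens per second) after each response. A rough way to check, again assuming the container is named ollama:

# chat with a model and print token/s stats after each response
docker exec -it ollama ollama run llama3.1:8b --verbose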


It's a cool little project for experimenting with AI and LLMs. I'm not quite sure where I'm going to take it from here; maybe I'll share that in a future post!

See you tomorrow-ish 👋