Self-GPT: Open WebUI + Ollama = Self Hosted ChatGPT

spiritedpause@sh.itjust.works · 1 month ago

camilobotero@feddit.dk · 1 month ago

What are your PC specs for running llama3.1:70B smoothly in Ollama?

The Hobbyist@lemmy.zip · 1 month ago

I wish I could. I have an RTX 3060 12GB and mostly run llama3.1 8B variants in fp8, at 30-35 tokens/s.

camilobotero@feddit.dk · 1 month ago

I can confirm that it does not run (at least not smoothly) on an Nvidia 4080 12GB. However, gemma2:27B runs pretty well. Do you think llama3.1:70B could run if we added a second, more modest graphics card?

brucethemoose@lemmy.world · edit-2 1 month ago

No, but you can run Qwen 2.5 32B with 24GB total.

Host it in TabbyAPI instead of Ollama, too. With its native tensor parallelism and Q4 cache, it will fly.
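
For anyone wondering why 70B is out of reach at 24GB, here is a rough back-of-the-envelope sketch in Python. It only estimates the size of the weights; the 4.5 bits per weight (a typical figure for 4-bit quants once scales are included) and the flat 2 GiB allowance for cache and runtime overhead are my assumptions, not measured numbers, and the function names are made up for illustration.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed flat overhead fit in the given VRAM."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # (label, parameters in billions, assumed bits per weight, total VRAM in GiB)
    cases = [
        ("llama3.1 8B, 8-bit, 12GB", 8, 8.0, 12),
        ("Qwen 2.5 32B, ~4-bit, 24GB", 32, 4.5, 24),
        ("llama3.1 70B, ~4-bit, 24GB", 70, 4.5, 24),
    ]
    for label, params, bits, vram in cases:
        size = weight_gib(params, bits)
        verdict = "fits" if fits(params, bits, vram) else "does not fit"
        print(f"{label}: ~{size:.1f} GiB of weights, {verdict}")
```

Under these assumptions the 70B weights alone come to well over 30 GiB, so a second modest card that brings the total to 24GB still falls short, while a 32B model leaves a few GiB spare for context.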