
My Self-Hosted AI Stack Runs on a 15A Circuit

May 2026

~$106/mo, zero API costs, all inference local.

Two years ago I sat down to figure out what it would actually cost to run a useful AI stack at home. Not a toy. Not a single model on a laptop. A real stack — agents, knowledge bases, automation, a public-facing website — all running locally, on my own equipment.

The answer surprised me: about $106 a month. That covers internet, electricity, hardware amortization, and software. For context, that is less than what a single seat of Copilot or a mid-tier ChatGPT subscription can end up costing once you factor in what you actually use beyond the base plan.

---

Why Build?

So why build this yourself when the current market price is ~$20/month or $240/year for most plans? There is no shortage of options if you simply want to buy into an existing model with its ecosystem and attendant access plan.

If you are looking to set up for vibe coding at home, you may want to take a pass through the online options before you undertake rolling your own. I canvassed the usual suspects, from Cursor to Antigravity to Visual Studio with Copilot and a plethora of plugins. You can buy in and get started for about 20 bucks a month.

If you have started reading this, you likely already have your own answer. For me, running out of budget two-thirds of the way through any given project, combined with shifting access and plan premiums, drove me to find something more sustainable. It also turned into a nice privacy and security exercise, but the overall driver, as it will be for ALL business, is long-term cost of ownership and operation.

AI is not a luxury item. It is a necessity. As with any shared service, the initial cost of buying into an existing model is going to be lower than the price of standing it up on your own. The marketplace has shifted the targets a bit, though: progress in LLM memory and processing requirements has made rolling your own feasible on a mid-range server or higher-end desktop system. Setup depends somewhat on the environment and OS, but most people with minimal technical acumen can stand up an alternative stack on a desktop that supports home development, learning, and many common knowledge and reasoning-based tasks. Deploying a multi-host solution is a bit more complex, but it is rapidly becoming easier as AI tooling improves deployment patterns. As an example, the Hermes Agent component was about a week old when I first stood it up; today you can grab the Hermes desktop application for Mac, Windows or Linux.

So while this is not technically difficult, some basic admin and development skills are needed, and this article assumes you have some hardware on hand that you can leverage. We had all accepted the relatively high cost of a GPU, but the recent spikes in RAM and storage prices have bled into the second-hand market, making hardware a higher hurdle than it was just 6-12 months back, with little relief in sight.

---

The Build vs Buy Math

Like any small company or startup, I had to look at:

  • The up-front costs
  • The long-term operational costs
  • The places I wanted to use AI to make it an effective toolchain with a supportable operating model
  • A desire to control my IP and protect my customers' privacy

When you add those elements to the balance sheet it becomes a little easier to do the math. Can I handle customer IP, internal documents, planning and roadmaps on a platform that regularly harvests data for its own internal development?

I'm not sure the industry has caught up with some of these fundamental problems, but I do know I want to be able to provide answers when those questions are asked. Privacy safeguards for the people we serve should not be a marginal consideration in deciding where you want to go in this space.

While there are many good reasons to build rather than buy, the reason I wanted to build something repeatable and useful for multiple use cases is my belief that, for many, AI is going to be driven by its availability at the edge. The goal for this project is the capability to deliver a cost-effective, secure AI stack at scale, anywhere, anytime.

My answer to build or buy was a whole-hearted build, with an eye to FOSS and community-centered AI work to help with the rough edges early adopters have to overcome to deploy these complex workloads.

While the strategic direction was set, there were still architectural choices to make. To level set, I was able to take advantage of some existing work and had some sunk cost in equipment, but I also had to adapt and acquire after a house fire cleared out about 90% of my old lab. I reused anything I could lay my hands on, so many of my choices were tailored to the equipment I had access to. Most if not all of this could be done on newer laptops, mini PCs, and low-power options that you may already have.

I took a hard look at the assets I had, and knew going in that I wanted a VM I could safely sandbox as a starting point, so I stood up a single Linux-based VM as a host for my agents. I have a follow-up article in which I go through the stack in more technical detail, but the early choices were intended to provide the best chance of producing a working MVP quickly.

---

The Three-Machine Model

The architecture breaks down into three core machines:

Dev Hub (.16): A dual-Xeon workstation running Garuda Linux. This is the control plane: code development, the OB1 knowledge base, Ollama for embeddings, and containerized services.

Hypervisor (.113): A second dual-Xeon workstation running KVM/libvirt. This hosts three VMs: TrueNAS (storage), Hermes (public-facing website), and Worker4 (AnythingLLM + QA server). VMs boot in a staggered sequence with dependencies.

GPU Inference Host (.154): A separate Windows box with an NVIDIA RTX 3070 Ti running Ollama. All models run here, freeing the hypervisor from GPU passthrough complexity.
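
The staggered, dependency-aware boot on the hypervisor can be sketched in a few lines of Python. This is a minimal illustration, not my actual boot script: the VM names are the real ones, but the dependency edges (everything waits on TrueNAS) and the 60-second gap are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# VM names come from the post. The dependency edges and the delay are
# ASSUMPTIONS for illustration, not the author's actual configuration.
DEPENDS_ON = {
    "TrueNAS": set(),        # storage comes up first (assumed)
    "Hermes": {"TrueNAS"},   # assumed to need storage
    "Worker4": {"TrueNAS"},  # assumed to need storage
}
STAGGER_SECONDS = 60  # hypothetical gap between VM starts


def boot_plan(depends_on, stagger_seconds):
    """Return (vm, start_offset_seconds) pairs in dependency order."""
    order = list(TopologicalSorter(depends_on).static_order())
    return [(vm, i * stagger_seconds) for i, vm in enumerate(order)]


if __name__ == "__main__":
    for vm, offset in boot_plan(DEPENDS_ON, STAGGER_SECONDS):
        # On a libvirt host each step would be something like `virsh start <vm>`.
        print(f"t+{offset:>3}s  start {vm}")
```

A topological sort guarantees that no VM starts before the things it depends on, and it raises an error if the dependency map ever contains a cycle, which is a cheap sanity check before touching a real host.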

Cost breakdown per month:

  • Internet: ~$50
  • Electricity: ~$30
  • Domain + DNS: ~$10
  • Misc (drives, cables): ~$20
  • Total: ~$106/mo

The cloud equivalent for this compute, storage, and inference capacity would run $500+ per month on AWS or GCP.
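
To make the comparison concrete, here is a small arithmetic sketch using only the figures quoted in this post. Note that the line items are rounded, so they sum to $110, slightly above the ~$106 headline number. The variable and function names are just illustrative.

```python
# Line items as listed above (USD per month, all approximate).
MONTHLY_COSTS = {
    "internet": 50,
    "electricity": 30,
    "domain_dns": 10,
    "misc": 20,
}

SUBSCRIPTION_PER_SEAT = 20  # the ~$20/month plan price quoted earlier
CLOUD_ESTIMATE = 500        # the "$500+" cloud figure, treated as a floor


def monthly_total(costs):
    """Sum the rounded monthly line items."""
    return sum(costs.values())


def seats_to_match(total, per_seat):
    """Number of subscription seats that cost about the same as the stack."""
    return total / per_seat


if __name__ == "__main__":
    total = monthly_total(MONTHLY_COSTS)
    print(f"sum of line items: ${total}/mo")
    print(f"equivalent $20 seats: {seats_to_match(total, SUBSCRIPTION_PER_SEAT):.1f}")
    print(f"share of cloud estimate: {total / CLOUD_ESTIMATE:.0%}")
```

Read this way, the stack costs about as much as five or six base-tier subscriptions, and roughly a fifth of the cloud estimate.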

---

What This Gets You

  • 6 local models, including Gemma4, Qwen3, Hermes3, DeepSeek, and nomic-embed-text (an embedding model)
  • A production RAG pipeline answering career and technical questions at ask.atkatana.com
  • A multi-agent job search system (Hermes agent framework)
  • A published blog with 10+ articles
  • Full Zero Trust network: Pi-hole, Unbound, Cloudflare Tunnel, VPN, firewall segmentation
  • 3 KVM VMs orchestrated with staggered boot and dependency chains
  • A CI/CD pipeline for blog deployment and security scanning

All on a single 15A circuit.
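
For a rough sense of what a single 15A circuit allows, here is a back-of-the-envelope sketch. Only the 15A rating and the ~$30/month electricity figure come from this post. The 120 V line voltage, the 80% continuous-load rule of thumb, and the $0.15/kWh rate are assumptions, so substitute your own values.

```python
CIRCUIT_AMPS = 15        # from the post
LINE_VOLTS = 120         # ASSUMPTION: standard North American branch circuit
CONTINUOUS_DERATE = 0.8  # ASSUMPTION: common 80% rule of thumb for continuous loads

ELECTRICITY_USD_PER_MONTH = 30  # from the cost breakdown
USD_PER_KWH = 0.15              # ASSUMPTION: hypothetical utility rate
HOURS_PER_MONTH = 730           # average month (8760 hours / 12)


def continuous_budget_watts(amps, volts, derate):
    """Power the circuit can carry continuously under the derating rule."""
    return amps * volts * derate


def implied_average_watts(usd_per_month, usd_per_kwh, hours):
    """Average draw implied by a monthly bill at a given rate."""
    kwh = usd_per_month / usd_per_kwh
    return kwh * 1000 / hours


if __name__ == "__main__":
    budget = continuous_budget_watts(CIRCUIT_AMPS, LINE_VOLTS, CONTINUOUS_DERATE)
    average = implied_average_watts(ELECTRICITY_USD_PER_MONTH, USD_PER_KWH, HOURS_PER_MONTH)
    print(f"continuous budget: {budget:.0f} W")
    print(f"implied average draw: {average:.0f} W")
```

Under those assumptions the circuit has a continuous budget of about 1,440 W, while a $30 monthly bill implies an average draw of under 300 W. Peak draw with the GPU under load will be higher than the average, which is why the headroom matters.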

---

Things That Broke

During the migration we discovered that one of the primary hosts was crashing with out-of-memory errors. When we looked at the errors, we determined that rolling back the NVIDIA drivers would get us through the next couple of weeks without any real impact.

While shifting the drivers back we found multiple dependencies that made the switch a challenge. At one point the box simply went to a big blank screen, unable to load a desktop environment at all. I was forced to reboot, and afterward the model would not load on the host at all, leaving me mid-migration with only a single prepared command as a backout.

When that command also failed, I had to roll back on my own, reverse-engineering the correct package list from another machine over SSH. The lesson: even in a sandbox, have a verified break-glass option tested from outside the host before you pull the trigger.
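
One way to avoid reverse-engineering a package list under pressure is to snapshot it before the change, store it off the host, and diff it afterward. The sketch below is an illustration under assumptions, not the procedure I actually ran: it assumes snapshots are plain text with one "name version" pair per line (the shape `pacman -Q` prints on Arch-based systems), and the package names in the usage example are hypothetical.

```python
def parse_snapshot(text):
    """Parse 'name version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            packages[parts[0]] = parts[1]
    return packages


def rollback_plan(before, after):
    """List what must change to get from `after` back to `before`."""
    return {
        # present before the change, missing now
        "reinstall": sorted(set(before) - set(after)),
        # pulled in by the change, absent before
        "remove": sorted(set(after) - set(before)),
        # present in both, but at a different version
        "restore_version": sorted(
            name for name in set(before) & set(after)
            if before[name] != after[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical package names and versions, for illustration only.
    before = parse_snapshot("gpu-driver 1.0\ngpu-utils 1.0\ndesktop-shell 5.2\n")
    after = parse_snapshot("gpu-driver 2.0\ndesktop-shell 5.2\ngpu-compat 0.1\n")
    for action, names in rollback_plan(before, after).items():
        print(f"{action}: {names}")
```

The point is less the diff itself than where it runs: if the snapshot and the script live on another machine, the plan is still readable when the broken host cannot load a desktop.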

---

The Bottom Line

Self-hosting AI is not about saving money in the first month. It's about owning your inference pipeline, controlling your data, and building capability that scales with near-zero marginal cost per query. The cloud will always be cheaper at low volume. But once you cross the threshold of regular, sustained use, and once you factor in data privacy and IP control, the math flips.

And you learn more in a week of debugging boot order than in a year of clicking around the AWS console.

Built on a home lab, powered by local models, and owned by Andrew Katana.
