You know how when someone's coming over, you do that frantic cleaning spree where you start shoving everything into closets and try to convince everyone that you're always this neat? This article is like that, but for my homelab. The actual setup is a stratigraphic column of different ideas and experiments from different time periods, forced to work together into something resembling a cohesive unit. What I'm about to talk about are the wins: the things that have solved real problems for me and made my life easier.
Let me give you the tour.
The Kubernetes Cluster
I got into Kubernetes the hard way. I had a handful of services running on various VMs, each with their own quirks and manual deployment steps. Updating anything meant SSH-ing into the right machine, remembering which directory the config lived in, and hoping I didn't fat-finger something in production. After the third time I had to delete a VM because I couldn't remember a password, I decided there had to be a better way.
Right now, I'm running a single-node cluster. Both the control plane and worker live on the same machine, which is a bit like saying you ride a motorcycle when it's actually a moped. The plan is to expand this into a proper multi-node setup eventually. For now, it's handling everything I need: automated deployments, rolling updates, and the kind of infrastructure-as-code setup that lets me sleep at night knowing I can rebuild everything from YAML if I need to.
The nice thing about Kubernetes is that once you get past the initial learning cliff, deploying new services becomes almost trivial. Write a manifest, apply it, and watch it spin up. No more manual configuration files scattered across different machines. Services get their own networks, and routing comes included. Replicating even the basics of what Kubernetes does would take a vastly more complex homegrown solution.
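To make "write a manifest" concrete, here's roughly the shape of one. This is a minimal sketch, not one of my actual services; the names and image are placeholders.

```yaml
# Stand-in Deployment plus Service; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: nginx:1.27        # whatever the service actually runs
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example-app
  ports:
    - port: 80
      targetPort: 80
```

One `kubectl apply -f` later and it's running, with its own in-cluster DNS name and no hand-edited config file left behind on some VM.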
Monitoring with LGTM
If Kubernetes is the engine, then monitoring is the dashboard. I'm running the full LGTM stack: Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics. It sounds like overkill for a homelab, and maybe it is, but I got tired of playing detective every time something went sideways.
Every time a container failed, it would restart, taking its logs to the grave with it, and I'd be none the wiser about what actually happened. Kubernetes is excellent at keeping uptime, well, up, but that masks the problem. It's why some tech-centric people prefer hard failures over soft failures: a hard failure forces you to address the problem, while soft failures let problems pile up under the rug.
Now I can pull up Grafana, see exactly when CPU spiked, cross-reference it with logs in Loki, and actually understand what my problem is. Also, there's something satisfying about having charts and graphs that show your systems are healthy. Or unhealthy. At least you know. Imagine if your own body had such charts. People would be checking them every day.
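To give a flavor of what "see exactly when CPU spiked" means in practice, here's the kind of Prometheus-style alerting rule the Mimir ruler can load. The group name, threshold, and labels are illustrative, not my exact config.

```yaml
groups:
  - name: homelab-cpu            # hypothetical rule group
    rules:
      - alert: PodCpuSpike
        expr: |
          sum by (namespace, pod) (
            rate(container_cpu_usage_seconds_total{container!=""}[5m])
          ) > 1.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.pod }} has been burning more than 1.5 cores for 10 minutes"
```

Pair an alert like that with a Loki panel filtered to the same pod, and the post-mortem mostly writes itself.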
DNS Redundancy with Pi-hole
Here's a fun scenario: you're rebooting your homelab for updates or it shuts down after a power outage. You wake up and your phone doesn't work. You try to connect to your Pi-hole instance to see the problem, but your computer doesn't have an IP. So now you can't change your DNS/DHCP configuration, because your computer can't talk to the device that should be managing that. To solve the problem you would have had to have solved the problem already. A Catch-22.
This happened to me several times before I set up a Raspberry Pi running Pi-hole as a fallback. It sits outside the main homelab infrastructure, always on, always ready. When the primary DNS is up, it's just a redundant backup. When I need to restart things, it seamlessly takes over. I rsync the config to the backup Pi once a day to keep them consistent, and to have a known-good copy should I really screw something up.
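The sync itself is nothing fancy; it boils down to a nightly cron entry on the primary, something like the line below. The hostname and paths are stand-ins for mine.

```
# 03:00 nightly: push Pi-hole config to the fallback Pi (example hostname/paths)
0 3 * * *  rsync -a --delete /etc/pihole/ backup-pi:/etc/pihole/
```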
As a bonus, Pi-hole blocks ads at the network level, which is its main benefit, but I almost forget about it until I get off my network. It's such a robust product that I only remember it when I want to mess with DNS/DHCP (read: break things).
TrueNAS for Storage
Storage is one of those problems that sneaks up on you. First it's just a few documents. Then some photos. Then you're ripping your DVD collection and suddenly you need 10TB and redundancy because losing years of data to a single drive failure sounds like a nightmare.
TrueNAS handles all of this with ZFS under the hood. I've got it set up with mirrored drives, which means I can lose a disk without losing data. It serves media to devices around the house, backs up important files, and generally acts as the single source of truth for anything I don't want to lose.
The web interface makes it approachable too. I'm not particularly interested in becoming a storage expert, but TrueNAS lets me configure RAID levels, set up snapshots, and monitor drive health without needing a PhD in filesystems. It just works, which is exactly what storage should do.
Network Gear and Routing
A good homelab needs a solid network foundation. I'm running a managed switch that handles VLANs, which lets me segment traffic appropriately. IoT devices get their own VLAN, which means even if some random smart bulb gets compromised, it can't reach the rest of the network. It also brings me some sense of organization: my Raspberry Pi cluster can stay on one VLAN without mucking up my devices list.
The router is running custom firmware that gives me way more control than any consumer router ever would. I can shape traffic, set up VPNs, and actually see what's happening on my network. It's the kind of thing where once you start looking at the data, you realize how much garbage is constantly flying around your home network. Smart TVs phoning home. IoT devices pinging random servers. It's enlightening in a slightly creepy way.
Backup Strategy
Here's something nobody thinks about until it's too late: backups. The homelab itself is somewhat ephemeral. I can rebuild it from configs. But the data? That needs to be protected.
I'm following the 3-2-1 rule: three copies of data, on two different types of media, with one copy offsite (well, sorta). TrueNAS handles the local copies with snapshots and redundancy, with the pool in a RAIDZ configuration for drive parity. I've also got an external drive plugged into the homelab through a docking station that I periodically power on. Most of my configs are backed up to my desktop or in the cloud, so in the event of a failure I can get back online quickly.
It's not exciting. It's not fun to set up. I've actually caught myself misconfiguring it several times, and it's sheer luck that that hasn't been a problem yet. But it's the difference between a minor inconvenience and a catastrophic loss. I learned this lesson on my desktop when I was dual-booting Windows and Linux, and Windows introduced a bug that deleted non-Windows boot drives. Now everything is backed up, automated, and I can actually sleep at night.
Automation and Services
The real power of a homelab comes from what you do with it. I've got Home Assistant running for home automation, which ties together all the random smart devices I've accumulated over the years. Lights, sensors, you name it, all talking to each other in ways they were never designed to. At some point in the future, I'd like to replace my light switches with dimmers that can be run by Home Assistant, so they will dim or brighten depending on the time of day.
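The nice part is that the automation side of that plan is already cheap to express; Home Assistant describes this sort of thing in a few lines of YAML. A rough sketch of the sunset-dimming idea, with made-up entity names:

```yaml
automation:
  - alias: "Dim the lights after sunset"
    trigger:
      - platform: sun
        event: sunset
    action:
      - service: light.turn_on
        target:
          entity_id: light.living_room   # hypothetical dimmer entity
        data:
          brightness_pct: 40
          transition: 30
```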
A Kubernetes ingress controller acts as the reverse proxy and handles all the web service routing, complete with automatic SSL certificates. I don't have to think about certificate renewal or manually configuring HTTPS anymore. Cert-manager does all of that for me.
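Concretely, that means each service just gets a standard Ingress resource with a cert-manager annotation on it. A sketch, reusing the placeholder service from earlier; the hostname and ClusterIssuer name depend entirely on your own setup:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # must match an existing ClusterIssuer
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com                # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
  tls:
    - hosts:
        - app.example.com
      secretName: example-app-tls          # cert-manager fills this secret in
```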
I've got a Gitea server for personal projects, a password manager to sync across devices, an Obsidian-livesync instance for my note taking, and a handful of other services that make daily life a little easier. Each one solves a specific problem, and together they form an ecosystem that's genuinely useful.
My last-ditch tool for automation is n8n. It's what I'd call "code glue." Any services that need to talk to each other but don't have a preconfigured way to do so, I can stitch together with n8n. It lets me create workflows, use logic, create webhooks, and modify data before sending it somewhere else. If I need something done and there's no right way to do it, you bet I'm gonna make an n8n workflow.
Power Management
Something people don't talk about enough: power. Not just whether things are plugged in, but power management during outages. I've got a UPS (Uninterruptible Power Supply) handling the critical infrastructure. It's not designed to keep things running for hours, but it gives me enough time to gracefully shut everything down if the power goes out.
More importantly, it protects against surges and brown-outs. Electronics are expensive, and replacing a failed server because of dirty power is both costly and annoying. The UPS communicates with the servers too, so if I'm not home and the power goes out, everything shuts down safely on its own.
Constant Evolution
One morning I woke up to find my phone disconnected from the network. To my surprise, it wasn't Pi-hole's fault; it was a drive failure. Not one of the RAIDZ'd TrueNAS drives, but the main one. I had misconfigured my backups at this point, and upon restart, my homelab wouldn't boot. I was in a pickle, but I didn't despair. I figured it was a chance to start anew, and I thought about the changes I would make.
First of all, I would get rid of Proxmox. I had it installed as a Type II hypervisor on top of the OS. I was conned (memed, really) into installing it, only to find it overly complex for doing simple things. Worst of all, I had absolutely no use for a VM. I was already comfortable with containers. If I needed a GPU I could theoretically pass it through to a VM, but the convoluted abstractions made it impossible to actually configure applications with it. I would rather run an OS and spin up containers or VMs as needed than abstract the functionality away.
Argo CD. A deified tool among some; a waste of resources to me. Any Argo CD install brings with it a whole host of supporting service instances that can be quite beefy on a small system. And for what? To do what Kubernetes does automatically? You would be better off building a pipeline orchestrator from scratch out of cron, webhook scripts, and Terraform.
There's a happy ending to the story. Eventually, after rebooting the homelab a couple dozen times trying and failing to hit the boot menu key fast enough, I discovered that my drive hadn't failed at all. It works even now, and for no reason at all. Amazing.
I still have future plans for my homelab. If I ever come into some more hardware, I could finally set up that second node for my Kubernetes cluster. I might even leave it bare-metal so I can keep the GPU available for future projects where I might want serious compute available from anywhere in the world. It would also be nice to move TrueNAS to its own dedicated hardware. There's always more to learn and experiment with. I wonder what sort of homelab I could have a year from now.