I've watched organizations spend $40,000 a month on AWS trying to solve a problem that a 15-minute code review would have caught.
More instances. Bigger nodes. Wider load balancers. Auto-scaling groups that kick in at 60% CPU and spin up six more servers to handle the same broken query — just faster.
This is one of the most expensive habits in enterprise technology. And nobody talks about it because the bill keeps getting paid, the system keeps limping along, and everyone quietly agrees that "we need more capacity."
You don't need more capacity. You need to fix what you built.
The N+1 Problem: Death by a Thousand Queries
Here's a scenario I've seen in production more times than I can count.
A system needs to display a list of 100 users and their associated roles. Simple enough. The developer writes a loop: fetch the users, then for each user, fetch their roles.
That's 101 database queries to display one page.
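The shape of the anti-pattern can be sketched with plain `sqlite3` — the `users`/`roles` schema and data here are hypothetical stand-ins, but the query count is the point:

```python
import sqlite3

# Throwaway in-memory database with hypothetical users and roles tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roles (user_id INTEGER, role TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])
conn.executemany("INSERT INTO roles VALUES (?, ?)",
                 [(i, "viewer") for i in range(100)])

# The anti-pattern: one query for the users, then one query per user.
query_count = 0
users = conn.execute("SELECT id, name FROM users").fetchall()
query_count += 1
for user_id, name in users:
    conn.execute(
        "SELECT role FROM roles WHERE user_id = ?", (user_id,)
    ).fetchall()
    query_count += 1

print(query_count)  # 101 queries to render one page
```

In an ORM, this usually hides inside lazy-loaded relationship access in a template or loop, which is why it rarely shows up in code review unless someone counts the queries.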
Now put that behind an API endpoint that gets hit 500 times a minute. You've just turned a data model problem into an infrastructure crisis. The database starts buckling. Response times spike. Someone opens a ticket: "we need to scale the DB."
So you scale the database. Bigger instance, read replicas, connection pooling. The bill goes up $3,000 a month. The page loads slightly faster. The 101 queries are still happening. You've just made the bad pattern more expensive.
The fix? One query with a proper JOIN. Or an ORM call that eager-loads relationships instead of lazy-loading them in a loop. It takes 20 minutes. It eliminates the problem entirely.
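A minimal sketch of the JOIN version, using the same hypothetical `users`/`roles` tables: one round trip, with the grouping done in application code.

```python
import sqlite3
from collections import defaultdict

# Same hypothetical schema and data as the N+1 example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roles (user_id INTEGER, role TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])
conn.executemany("INSERT INTO roles VALUES (?, ?)",
                 [(i, "viewer") for i in range(100)])

# One query: join users to their roles, then group in memory.
rows = conn.execute("""
    SELECT u.id, u.name, r.role
    FROM users u
    LEFT JOIN roles r ON r.user_id = u.id
""").fetchall()

roles_by_user = defaultdict(list)
for user_id, name, role in rows:
    if role is not None:
        roles_by_user[user_id].append(role)

print(len(roles_by_user))  # all 100 users, 1 query instead of 101
```

In an ORM such as SQLAlchemy, the equivalent fix is an eager-load option on the relationship (for example `joinedload` or `selectinload`) rather than a hand-written JOIN — same idea, one or two queries instead of N+1.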
N+1 is just the most obvious example of a broader truth: you cannot scale your way out of a logic problem.
Scaling Doesn't Mean What You Think It Means
When most people say "we need to scale," they mean: add more of the same thing.
More servers. More pods. More threads. More memory.
That's horizontal scaling, and it works — but only if your workload is actually horizontally scalable. And most systems, in their default state, are not.
Here's what horizontal scaling actually requires:
- Stateless services. If your application holds session state in memory, adding more instances means half your users hit a server that doesn't know who they are.
- Efficient data access patterns. Scaling compute doesn't help if every instance is hammering a single database that can't keep up. You've just created more threads waiting on the same bottleneck.
- Clean service boundaries. If your "microservices" are tightly coupled — sharing databases, calling each other synchronously in chains — you haven't distributed your system, you've distributed your failure modes.
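The statelessness requirement is easiest to see in miniature. This sketch uses a toy `AppInstance` class and a plain dict standing in for an external session store (Redis or a database in practice) — both are hypothetical names, not a real framework:

```python
class AppInstance:
    """One replica of a service behind a load balancer."""

    def __init__(self, session_store=None):
        # Local in-memory dict by default; pass a shared store to
        # make the instance stateless with respect to sessions.
        self.sessions = session_store if session_store is not None else {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


# Stateful: each instance keeps its own sessions. After a round-robin
# load balancer sends the second request elsewhere, the user is lost.
a, b = AppInstance(), AppInstance()
a.login("s1", "alice")
stateful_result = b.whoami("s1")   # None: instance b has never seen s1

# Stateless: both instances share an external store, so any replica
# can serve any request. Now adding instances actually adds capacity.
shared = {}
c, d = AppInstance(shared), AppInstance(shared)
c.login("s1", "alice")
stateless_result = d.whoami("s1")  # "alice"
```

Sticky sessions can paper over the stateful version for a while, but they defeat the purpose: your "horizontal" scale is really N vertical silos.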
Horizontal scaling is a multiplier. It multiplies the throughput of a well-architected system. It also multiplies the cost and complexity of a broken one.
Before you add capacity, ask: what exactly are we scaling? If the answer is "the thing that's slow," dig one level deeper and ask why it's slow. Nine times out of ten, the answer is in the code — not the infrastructure.
Kubernetes Isn't Always the Answer
I have to be honest here, because this one is nuanced.
Kubernetes gets deployed as a solution to problems it was never designed to solve. I've seen two-person startups running a 12-node EKS cluster to host a CRUD application that gets 200 requests a day. I've seen teams spend three months "migrating to Kubernetes" only to end up with the same monolith running in pods, deployed in lockstep, failing together — a distributed monolith with a Helm chart.
That's not Kubernetes solving a problem. That's Kubernetes becoming the problem.
Here's when Kubernetes is the wrong call:
- Your team has fewer than 5 engineers and nobody owns the platform
- Your services don't need independent scaling or deployment
- Your workload is predictable and steady — Lambda or ECS will do fine
- You don't have the operational maturity to handle node failures, pod evictions, and networking complexity
- You're using it because it looks good on a resume or a pitch deck
I'm not being harsh. I'm being honest. Kubernetes has real operational overhead, and that overhead has to be justified by real operational requirements.
But here's the other side — and this is where it gets magical.
When Kubernetes is right, it is genuinely transformative.
I've seen it firsthand. A system running on manually provisioned VMs with two-day deployment cycles became self-healing, auto-scaling infrastructure that recovered from node failures without a single page. Processing volume that used to require weekend maintenance windows became elastic. Spikes got absorbed in real time and scaled back down automatically when load dropped. The cost efficiency was real. The reliability improvement was measurable.
The difference wasn't the technology. It was the context.
The workloads were genuinely variable. The team had the expertise to operate it. The services had clean boundaries and independent deployment requirements. Kubernetes didn't fix the architecture — the architecture was already sound, and Kubernetes gave it room to breathe.
That's when it's magic. When the conditions are right and the foundation is solid, there is no better tool for running distributed workloads at scale. The self-healing alone is worth the complexity — when you're ready for it.
The Pattern Under All of This
Every example above — the N+1 query, the stateful horizontal scale, the Kubernetes-wrapped monolith — has the same root cause:
Reaching for an infrastructure solution to an architecture problem.
It's understandable. Infrastructure fixes are visible, billable, and feel like action. You can point to a bigger instance or a new cluster and say "we did something." Architecture changes require admitting that something was built wrong, which is harder politically and harder to explain to leadership.
But the math is unforgiving. Infrastructure is rented. A bad pattern runs on every server you add, in every request you process, at every scale you reach. You are paying, every month, for a decision that should have been revisited years ago.
The organizations I've worked with that run the leanest, most reliable systems are not the ones with the biggest budgets. They're the ones that asked hard questions early — and didn't let "just scale it" become a substitute for "let's fix it."
What To Do Instead
Before your next infrastructure escalation, run through this checklist:
- Profile before you provision. What is actually slow? Measure it. A flame graph or a query analyzer will tell you more than a CloudWatch dashboard.
- Check your data access patterns. Are you making more calls than you need to? Are you loading more data than you use? N+1 issues, over-fetching, and missing indexes are free wins hiding in plain sight.
- Ask if the problem is stateful. If you can't add a second instance without breaking sessions or consistency, fix that before you scale. Stateless services scale horizontally. Stateful services scale vertically — expensively.
- Earn your Kubernetes. If you're considering K8s, ask: do my services have independent scaling requirements? Do I have variable, unpredictable load? Do I have the team to operate this? If the answer to all three isn't yes, start with something simpler.
- Separate compute problems from logic problems. Compute problems: genuine throughput limits, memory pressure on well-optimized workloads, legitimate spikes in demand. Logic problems: bad queries, synchronous chains, tight coupling, state mismanagement. One requires infrastructure. One requires a code review.
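"Profile before you provision" can be as small as ten lines. This sketch uses the standard library's `cProfile` and `pstats`; `slow_endpoint` is a hypothetical stand-in for whatever code path your dashboard says is slow:

```python
import cProfile
import io
import pstats


def slow_endpoint():
    # Stand-in for the real handler you suspect is the bottleneck.
    total = 0
    for i in range(100_000):
        total += i * i
    return total


# Profile one call and print the top functions by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
slow_endpoint()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Ten minutes with output like this tells you whether you have a compute problem or a logic problem — before you've committed to a bigger instance either way.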
Throwing compute at bad architecture is one of the most common and expensive habits in the industry. I've seen it in federal systems, financial platforms, and scaling startups. The bill always looks like a capacity problem. The fix is almost never more capacity.
The good news? The real problems are usually smaller than they look — and cheaper to fix than the infrastructure you're buying to avoid fixing them.