Hundreds of parameters in Linux, various pieces of middleware and even your cloud platform affect the performance of your applications. Unfortunately, if you’re like most people, you’ve either forgotten or never knew about most of them. As a result, they’re left at default, and they’re robbing you of both performance and money.
I first encountered the problem a few years back while working on one of the first cloud platforms. We needed a way to pass parameters to instances during boot. What caught my attention was how many parameters we’d never used. In a room full of experts in application architecture we could only identify about 30% of the parameters.
Recently, working with folks running well-known online apps, it’s become clear this problem is more common than I understood and seems to be getting worse. Cloud and now containers have added more layers to the stack, each with its own settings. Engineers who entered school after cloud became common may never have been trained on some layers in the stack.
So, what are these settings and what impact can they really have?
Start with the obvious: CPU and memory. In the era of cloud and, more recently, containers, resource allocations are actually just parameters passed to the cloud platform. While we’re talking about cloud options, do you routinely test the latest instance types? Have you tested high-IOPS volumes? Is your application sensitive to stolen CPU, meaning time the hypervisor hands to other tenants while your instance is waiting to run? We frequently find just switching to a new container type can save 20% or more.
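If you want a quick read on stolen CPU, here is a minimal sketch. It assumes a Linux guest where the aggregate `cpu` line of /proc/stat carries the usual counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names `steal_percent` and `measure` are mine, purely for illustration.

```python
import time


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen between two /proc/stat snapshots."""

    def parse(snapshot: str) -> list:
        for line in snapshot.splitlines():
            fields = line.split()
            if fields and fields[0] == "cpu":  # aggregate line, not cpu0, cpu1...
                return [int(value) for value in fields[1:]]
        raise ValueError("no aggregate 'cpu' line found")

    deltas = [b - a for a, b in zip(parse(before), parse(after))]
    if len(deltas) < 8:
        raise ValueError("this kernel does not report a steal counter")
    # Sum only the first eight counters: guest time is already
    # included in user and nice, so adding it would double count.
    total = sum(deltas[:8])
    return 100.0 * deltas[7] / total if total > 0 else 0.0


def measure(interval: float = 1.0) -> float:
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return steal_percent(before, after)
```

Calling `measure(5.0)` on a busy instance a few times a day gives a rough picture. A number that stays well above zero is a hint that a different instance type is worth testing.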
Moving on, if you’re using a JVM, you should test different garbage collection settings. Testing different garbage collection options with customers, we’ve found performance can more than double. With benefits like this, you’d think everyone would test, but we most often find no testing has been done.
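Setting up that kind of test can be as simple as launching the same workload once per collector. The sketch below only builds the command lines. The four flags are standard HotSpot options, but which ones your JVM accepts depends on its version and vendor, and `app.jar` and the 2g heap are placeholders I made up.

```python
# Standard HotSpot collector flags; availability varies by JDK version.
GC_FLAGS = {
    "serial": "-XX:+UseSerialGC",
    "parallel": "-XX:+UseParallelGC",
    "g1": "-XX:+UseG1GC",
    "zgc": "-XX:+UseZGC",
}


def gc_commands(jar: str = "app.jar", heap: str = "2g") -> dict:
    """Build one java command line per collector, with a fixed heap size.

    Pinning -Xms and -Xmx to the same value keeps heap resizing from
    muddying the comparison between collectors.
    """
    return {
        name: ["java", f"-Xms{heap}", f"-Xmx{heap}", flag, "-jar", jar]
        for name, flag in GC_FLAGS.items()
    }


for name, command in gc_commands().items():
    print(name, " ".join(command))
```

Run each command against the same load test and compare throughput and pause times before changing anything else.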
Search online for “setting thread count” and you’ll find some vibrant (read: very opinionated) discussions. In one thread (pun unavoidable) I found recommendations for the same question ranging from 2 to 15,000. How to choose? The lower suggestions tended to be philosophical arguments about not being able to chop up a processor and somehow magically get more work done. I’m guessing they don’t use virtualization. The higher numbers seemed more empirical, so I’d definitely start there.
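Better than either camp is to measure. Here is a minimal sketch of an empirical sweep, assuming the work is I/O-bound. The `fake_request` function and its 10 ms sleep are stand-ins I invented, so swap in a call that resembles your real workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_):
    """Stand-in for I/O-bound work such as a database or HTTP call."""
    time.sleep(0.01)


def time_pool(workers: int, tasks: int = 40) -> float:
    """Seconds needed to finish `tasks` requests with `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_request, range(tasks)))
    return time.perf_counter() - start


def sweep(counts=(1, 4, 16), tasks: int = 40) -> dict:
    """Time the same batch of work at several thread counts."""
    return {n: time_pool(n, tasks) for n in counts}


for workers, seconds in sweep().items():
    print(f"{workers:>3} threads: {seconds:.3f}s")
```

With a workload that mostly waits, more threads finish the batch sooner. With CPU-bound work the curve flattens or reverses, which is exactly why the answer has to come from a test and not a forum.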
One of my favorite oddball settings is Linux’s noatime. This mount option turns off access-time updates, so reading a file no longer triggers a write to record when it was last read. In some cases this option has doubled performance. In the era of cloud and containers, when no one ever logs into most systems, this makes perfect sense. Oddly enough, though, the access timestamp behind this setting is ancient, dating back to Unix in the 1970s, when file operations were painfully slow.
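Before changing anything, it helps to know what a filesystem is doing today. This is a minimal sketch that reads mount options in the /proc/mounts layout (device, mount point, type, options). The function name `atime_mode` is my own. Note that many current kernels default to relatime, which already skips most of these updates.

```python
def atime_mode(mounts_text: str, mount_point: str = "/"):
    """Return 'noatime', 'relatime', or 'atime' for a mount point.

    Returns None if the mount point is not listed. If the same mount
    point appears more than once, the last entry wins, since later
    mounts sit on top of earlier ones.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
            if "noatime" in options:
                mode = "noatime"
            elif "relatime" in options:
                mode = "relatime"
            else:
                mode = "atime"
    return mode


def current_mode(mount_point: str = "/"):
    """Check the live system (Linux only)."""
    with open("/proc/mounts") as f:
        return atime_mode(f.read(), mount_point)
```

If `current_mode("/data")` comes back as anything other than noatime on a read-heavy volume, that volume is a candidate for a test.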
If you’re using PHP with PHP-FPM, have you experimented with different values for pm.min_spare_servers and pm.start_servers? Depending on workloads, these can waste money and affect performance. While we’re on the subject of PHP, have you tried using the ondemand process manager instead of dynamic?
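To compare the two process managers, you need two pool configurations that differ only in those settings. The sketch below renders both as text. The directive names are real PHP-FPM pool directives, but every value is an illustrative guess, and a working pool file also needs entries such as `user` and `listen` that I have left out.

```python
def render_pool(pm: str, **settings) -> str:
    """Render the process-manager part of a PHP-FPM pool file."""
    lines = ["[www]", f"pm = {pm}"]
    lines += [f"pm.{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines)


# Keeps a set of idle workers warm at all times.
dynamic = render_pool(
    "dynamic",
    max_children=20,
    start_servers=4,
    min_spare_servers=2,
    max_spare_servers=6,
)

# Starts workers only when requests arrive and reaps idle ones.
ondemand = render_pool(
    "ondemand",
    max_children=20,
    process_idle_timeout="10s",
)

print(dynamic)
print()
print(ondemand)
```

The dynamic pool pays for idle workers in memory, while the ondemand pool pays in start-up latency on the first requests after a quiet spell. Which cost matters more depends on your traffic.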
I’m just scratching the surface of the hundreds of parameters in your applications. You should also be testing buffer sizes, API traffic limiters, file descriptor limits, cache size, and many, many more. The sheer number of parameters creates a new problem, and it’s a biggie. Just 5 VMs or containers with 5 parameters each and 10 possible settings each gives you 25 parameters and 10^25 possible combinations. That’s ten septillion. Go ahead and check my math if you need to.
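Checking the math takes a few lines. This sketch assumes every parameter is tuned independently, so the choices multiply: 25 parameters with 10 values each is 10 raised to the 25th power. The one-test-per-second rate is my own (wildly optimistic) assumption.

```python
def search_space(instances: int, params_per_instance: int, values_per_param: int) -> int:
    """Number of distinct configurations when every parameter is independent."""
    return values_per_param ** (instances * params_per_instance)


total = search_space(5, 5, 10)
print(f"{total:.0e} configurations")

# Assumed rate: one complete test every second, around the clock.
seconds_per_year = 60 * 60 * 24 * 365
print(f"{total / seconds_per_year:.1e} years to try them all")
```

Even at that rate, trying everything would take far longer than the age of the universe, so brute force is off the table.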
So, what’s my point? Simply that there are more settings that affect the performance of your app than you can possibly test. Even if you try, you’ll spend months researching settings and the testing itself will take so long that results will be moot. Your app and the platform it runs on will likely have changed. Does that mean you should just wing it and buy the most resources you can afford? Amazon, Microsoft and Google would applaud you for doing so of course and, amazingly, we’ve actually run across a few companies with so much money they’ve expressly adopted that strategy.
Ok, so I’ve probably depressed you, but before we can look at how to address all this, I need to throw one more fastball high and inside just to scare you. This problem isn’t static. Events both in your control and outside your control will complicate your efforts. More on that in a future post.