What is Continuous Optimization?

The rise of DevOps has ushered in an era of high-velocity delivery and daily releases of new code. But despite the ever-growing complexity of cloud applications, the post-delivery portion of the Continuous Integration & Continuous Deployment pipeline has been woefully neglected. 

Most enterprises leave their apps running hot, and only attempt to tune them for performance after a crash or a stretch of downtime. Most application parameters are left untouched altogether, and enterprises massively overprovision in order to buy peace of mind.

This oversight is costing large enterprises tens of millions of dollars, and hampering their performance. Fortunately, a concept and practice has emerged that can both cut the overspend and boost app performance: Continuous Optimization.

Why DevOps Demands Continuous Optimization

Research reveals that enterprises have serious problems optimizing their cloud performance and spend.

  • 80% of finance and IT leaders report that poor cloud financial management has had a negative impact on their business.
  • 69% regularly overspend their cloud budget by 25%.
  • 57% worry daily about cloud cost management.

Why?

Because IT has changed. Before DevOps, the process looked like this:

Your developers wrote the code, which went through a build phase. After that came manual tests on the app and, upon the discovery of bugs or areas for improvement, manual tuning. When the team was sure the app was 100% ready, they deployed it manually. This cycle was repeated for every new release, which came every month or two.

But with DevOps, it now looks like this:

Once the code is written, the CI/CD pipeline picks it up and carries out automated builds, tests, and deployments. These processes happen rapidly, in short cycles, again and again. Within this agile setting, Continuous Integration and Continuous Deployment are the norm.

Most enterprises are already using a robust and effective CI/CD toolchain. They are operating a delivery pipeline where developers merge their work into a single repository, and new code reaches users quickly and safely, generating maximum value.

But did you notice something missing from the DevOps paradigm? That’s right: there is no effective post-release optimization and tuning. The post-delivery portion of the CI/CD toolchain is totally neglected.

Organizations will object: “No, that can’t be right. We do tune our applications.” Well, yes, and no.

In the DevOps paradigm, performance tuning and optimization does happen. But unfortunately, it only happens when things are running hot, or in response to downtime or failure to meet an SLO or SLA. When something in the app breaks, when a team notices over-provisioning in the infrastructure, or when the application performs poorly and customers start complaining: these are the only times that people think to optimize.

And on the rare occasion that they do tune their apps, teams bring in a set of siloed and only partially effective tools. Why don’t these tools get the job done properly? Because they focus on the code and the app layer (UI, database schema, etc.). APM systems monitor basic app transactions and only trigger alarms if something goes really wrong. At most, they might offer some broad recommendations about how to reduce bottlenecks. This isn’t true optimization at all. 
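To illustrate the gap, here is a minimal sketch of the difference between APM-style alerting and actual optimization (the SLO value, function names, and numbers are hypothetical, not drawn from any particular APM product):

```python
# Hypothetical contrast between APM-style alerting and optimization.
# The SLO value and function names are illustrative only.

LATENCY_SLO_MS = 250  # assumed service-level objective

def apm_alert(p95_latency_ms: float) -> bool:
    """APM-style check: fires only once something has already gone wrong."""
    return p95_latency_ms > LATENCY_SLO_MS

def is_better(candidate: dict, current: dict) -> bool:
    """The optimization question: does another configuration meet the SLO
    at lower cost, even while nothing is 'broken'?"""
    return (candidate["p95_latency_ms"] <= LATENCY_SLO_MS
            and candidate["cost_per_hr"] < current["cost_per_hr"])

current = {"p95_latency_ms": 180, "cost_per_hr": 4.00}
candidate = {"p95_latency_ms": 210, "cost_per_hr": 2.60}

print(apm_alert(current["p95_latency_ms"]))  # False: the APM stays silent...
print(is_better(candidate, current))         # True: ...yet a 35% saving exists
```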

Continuous Optimization is a Fine Art

True optimization for cloud apps and systems needs to go deeper than surface-level recommendations that only focus on the resource level. But why are current optimization tools not equipped to do this?

Look at it this way:

Even a simple five-container application can have more than 255 trillion resource and basic parameter permutations.
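A back-of-the-envelope calculation shows how a figure like that arises (the per-container option counts below are illustrative assumptions, not Opsani's exact model):

```python
# Back-of-the-envelope combinatorics. The option counts are illustrative
# assumptions chosen only to show how quickly the space explodes.

cpu_settings = 19      # e.g. 0.25 to 4.75 vCPU in 0.25-vCPU steps
memory_settings = 40   # e.g. 256 MiB to 10 GiB in 256 MiB steps

per_container = cpu_settings * memory_settings  # 760 basic settings
containers = 5
total = per_container ** containers             # 760 ** 5

print(f"{total:,}")  # 253,552,537,600,000 -- roughly 254 trillion, before
                     # middleware, kernel, or application parameters even enter
```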

This makes a vast number of configuration tweaks available at any given moment. To really engage with this system, you would need comprehensive and flawless knowledge of the entire infrastructure, covering every layer of the application, data, and cloud infrastructure stack. On top of this, you would need deep familiarity with the application workload itself.

It is highly unlikely that any human staff member will possess this knowledge or visibility. The developer who wrote the code is rarely savvy enough when it comes to infrastructure. Even the rare person who is comfortable with both kinds of knowledge, infrastructure and application workload, is almost certain to be something of a generalist, lacking the deep expertise needed to carry out real optimization.

And even if they had the knowledge, they couldn't move fast enough, because the measuring and tweaking needed to continuously optimize an app has to happen at lightning speed.

This is because modern app workloads are undergoing constant change. Round the clock, developers are releasing new features, middleware is getting updated, behaviour patterns are shifting, and cloud vendors are releasing new resource options. 

Choosing the right instance type, the right number of instances, and the right settings for each instance involves so many interdependencies that it is the cognitive equivalent of playing a thousand chess games at once. And because these changes come so fast, even if you did take the time to understand your infrastructure deeply, that understanding would be outdated by the time you finished.

This is why cloud and mobile apps chronically run with lower performance and higher cost than is ideal and achievable for their workload: manual optimization is impossible to carry out on every layer of your stack.

Continuous Optimization is Built on AI

Real continuous optimization is beyond the reach of human cognition. 

The solution? Leverage artificial intelligence (AI).

Achieving maximum efficiency for apps operating in the cloud means making judgements and decisions that are too numerous and fast-moving for human minds. But these judgements and decisions aren’t too numerous or too fast-moving for an AI. 

This is the basic Continuous Optimization model (a toy sketch of the loop follows the list):

  • After the app code has passed through the CI/CD pipeline, the Continuous Optimization tool begins to measure the performance of that code.
  • The CO tool formulates predictions about which set of configurations can further improve the performance of the application or reduce the cost incurred.
  • Then, it tweaks the settings and configuration parameters, implements the changes, and runs tests.
  • While this is going on, the CO tool measures data from the testing process and analyzes it to learn how the changes affected performance and/or cost.
  • The CO tool takes these learnings, compares them to previous data, and makes another set of predictions, which lead to a new set of configurations.
  • This cycle runs repeatedly and non-stop. The system keeps on finding new ways to achieve the highest possible performance at the lowest possible cost.
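
Expressed as code, that loop might look something like the following minimal sketch (the function names, metrics, and naive random-tweak "predictor" are illustrative placeholders, not Opsani's actual API or algorithm):

```python
import random

# Minimal sketch of the Continuous Optimization loop described above.
# The function names, metrics, and naive random-tweak "predictor" are
# illustrative placeholders, not Opsani's actual API or algorithm.

def measure(config):
    """Deploy the config to a test target and return observed metrics."""
    # Stand-in for a real measurement: pretend latency and cost depend on config.
    latency = 1000 / config["cpu"] + random.uniform(-5, 5)
    cost = config["cpu"] * config["replicas"] * 0.04
    return {"latency_ms": latency, "cost_per_hr": cost}

def score(metrics):
    """Single objective balancing performance and cost (weights are arbitrary)."""
    return -(metrics["latency_ms"] + 100 * metrics["cost_per_hr"])

def predict_next(base):
    """Propose a nearby configuration to try next."""
    candidate = dict(base)
    candidate["cpu"] = max(0.25, candidate["cpu"] + random.choice([-0.25, 0.25]))
    candidate["replicas"] = max(1, candidate["replicas"] + random.choice([-1, 0, 1]))
    return candidate

best = {"cpu": 2.0, "replicas": 4}
best_score = score(measure(best))

for _ in range(100):                 # in practice, this loop never stops
    candidate = predict_next(best)   # predict a promising configuration
    metrics = measure(candidate)     # implement the change and run tests
    new_score = score(metrics)       # learn how the change affected perf/cost
    if new_score > best_score:       # compare to previous data...
        best, best_score = candidate, new_score  # ...and promote improvements

print("best configuration found:", best)
```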

Opsani: Real Continuous Optimization

Opsani uses deep reinforcement learning (Deep RL) to optimize cloud infrastructure. Deep RL utilizes neural networks that are inspired by the connectivity and activation of neurons in the brain. When properly trained, these neural networks can model the hidden patterns in your data, which allows Optune to build a knowledge base of optimal and sub-optimal configurations, similar to how the brain develops effective patterns of behaviour.
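
For a flavour of the reinforcement-learning idea, here is a toy epsilon-greedy learner. It is a deliberately simplified stand-in, not Opsani's implementation: Deep RL replaces the value table below with a neural network.

```python
import random

# Toy epsilon-greedy learner, to give a flavour of reinforcement learning.
# Deliberately simplified: Deep RL replaces the value table below with a
# neural network. This is NOT Opsani's implementation.

configs = ["small", "medium", "large"]   # candidate configurations
value = {c: 0.0 for c in configs}        # learned value estimates
counts = {c: 0 for c in configs}
EPSILON = 0.1                            # exploration rate

def reward(config):
    """Stand-in for a real measurement of performance-per-dollar."""
    true_value = {"small": 0.4, "medium": 0.9, "large": 0.6}[config]
    return true_value + random.gauss(0, 0.1)   # noisy observation

for _ in range(1000):
    if random.random() < EPSILON:            # sometimes explore...
        choice = random.choice(configs)
    else:                                    # ...otherwise exploit best estimate
        choice = max(configs, key=value.get)
    r = reward(choice)
    counts[choice] += 1
    value[choice] += (r - value[choice]) / counts[choice]  # running average

print(max(configs, key=value.get))  # converges to "medium"
```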

Following an implementation process that is as straightforward as a simple Docker run command, Opsani integrates with your existing CI/CD automated deployment pipeline and goes right to work. Right away, Opsani begins monitoring your entire system, paying close and granular attention to how shifts in every sort of setting and parameter affect performance. This information is fed back into the neural network, which processes and learns from everything it sees, so that its insights compound.

This compounding means that the Opsani engine becomes exponentially better at tuning performance and improving efficiency. 

Deep reinforcement learning enables Opsani to continuously examine millions of configuration combinations and identify the optimal mix of resources and parameter settings. Opsani takes in metadata about an application, makes small tweaks to resource assignments and configuration settings to enhance performance or reduce cost, and then continuously remeasures.

And Opsani is built specifically to perfect those settings that are usually judged too complex to touch, such as the following (a sketch of this parameter space appears after the list):

  • Resources
    • CPU
    • Memory
  • Middleware configuration variables
    • JVM GC type
    • Pool sizes
  • Kernel parameters
    • Page sizes
    • Jumbo packet sizes
  • Application parameters
    • Thread pools
    • Cache timeouts
    • Write delays
  • And many, many more.
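
To make that concrete, here is what a small slice of such a parameter space might look like (every name and value is a hypothetical illustration, not a recommended or Opsani-specific setting):

```python
from math import prod

# Hypothetical slice of the tunable-parameter space listed above.
# Every name and value is an illustrative example, not a recommendation.

parameter_space = {
    "resources": {
        "cpu_vcpus": [0.5, 1.0, 2.0, 4.0],
        "memory_mib": [512, 1024, 2048, 4096],
    },
    "middleware": {
        "jvm_gc_type": ["G1GC", "ParallelGC", "ZGC"],
        "db_pool_size": list(range(5, 101, 5)),
    },
    "kernel": {
        "page_size": ["4K", "2M"],            # standard vs. huge pages
        "jumbo_frames": [False, True],        # 1500- vs. 9000-byte MTU
    },
    "application": {
        "thread_pool_size": list(range(8, 257, 8)),
        "cache_timeout_s": [30, 60, 300, 900],
        "write_delay_ms": [0, 10, 50, 100],
    },
}

# Even this tiny slice multiplies out to nearly two million combinations.
total = prod(len(v) for group in parameter_space.values() for v in group.values())
print(f"{total:,} combinations")  # 1,966,080
```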

Because it is constantly gathering new and more powerful data, the Opsani intelligence is able to constantly uncover new solutions, often including counterintuitive ones that are not apparent to a human user. Opsani reacts continuously to new traffic patterns, new code, new instance types, and all other relevant factors. With each new iteration, the system's predictions home in on the optimal solution, and as improvements are found they can be automatically promoted.

Infrastructure is tuned precisely to the workload and goals of the application – whether those goals relate to cost, performance, or some balance of the two.

Conclusion

In the DevOps era, Continuous Optimization is a must for any enterprise with cloud-based, medium-to-large applications that need to retain reliability and performance while reducing cost. If you have a significant cloud spend (or internal chargeback) and frequent rollouts and updates, then you need CO.

Our internal data reveals that, on average, Opsani users who implement Continuous Optimization experience a 2.5x increase in efficiency, or a 40-70% decrease in cost.