Monitoring Before Autoscaling
Autoscaling sounds like an infrastructure feature, but it starts as an observability problem. A system cannot scale intelligently until it can describe its own pressure.
Before reinforcement learning, predictive scaling, or custom policies, a distributed system needs boring measurement.
Define What Pressure Means
CPU usage alone is not enough. Service pressure can show up in any of these signals:
- request latency
- queue depth
- database connection usage
- error rate
- memory pressure
- worker backlog
- external API latency
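Different services need different signals. A queue worker is usually best described by backlog and throughput, while a request-serving API is better described by latency and error rate.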
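One way to make "pressure" concrete is to express each signal as a fraction of the level at which the service counts as saturated. The sketch below assumes that framing. The `Signal` class, the signal names, and every number in it are illustrative and do not come from a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    value: float  # current reading
    limit: float  # assumed level at which the service counts as saturated

    @property
    def pressure(self) -> float:
        """Reading as a fraction of its limit (1.0 means at the limit)."""
        return self.value / self.limit


def dominant_signal(signals: list[Signal]) -> Signal:
    """Return the signal that is closest to, or furthest past, its limit."""
    return max(signals, key=lambda s: s.pressure)


# Hypothetical readings for two different kinds of service.
api_signals = [
    Signal("cpu_utilization", 0.35, 0.80),
    Signal("p95_latency_seconds", 0.45, 0.50),
    Signal("db_connections_in_use", 18, 40),
]
worker_signals = [
    Signal("cpu_utilization", 0.30, 0.80),
    Signal("queue_depth", 1200, 1000),
]

print(dominant_signal(api_signals).name)     # p95_latency_seconds
print(dominant_signal(worker_signals).name)  # queue_depth
```

In both made-up cases CPU looks healthy while another signal is near or past its limit, which is the situation a CPU-only view would miss.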
Build A Baseline With Load Testing
Load testing shows how a system behaves before it fails. Tools like Locust are useful because they let you model user behavior, not just raw requests.
```python
from locust import HttpUser, task, between


class ApiUser(HttpUser):
    # Each simulated user pauses 1 to 3 seconds between tasks.
    wait_time = between(1, 3)

    @task
    def read_dashboard(self):
        # Read path: fetch the dashboard.
        self.client.get("/api/dashboard")

    @task
    def create_event(self):
        # Write path: submit a new event.
        self.client.post("/api/events", json={"type": "check"})
```

The goal is not to produce a huge request count. The goal is to understand where latency, errors, and saturation begin.
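As a rough illustration of finding where degradation begins, the sketch below scans the results of a stepped load test and returns the first step that breaches a latency or error limit. The `LoadStep` type, the limits, and the result figures are all hypothetical. Real numbers would come from your own test runs.

```python
from typing import NamedTuple, Optional


class LoadStep(NamedTuple):
    users: int         # concurrent simulated users in this step
    p95_ms: float      # 95th percentile latency observed in this step
    error_rate: float  # failed requests / total requests


def first_degraded_step(
    steps: list[LoadStep], p95_limit_ms: float, error_limit: float
) -> Optional[LoadStep]:
    """Return the lowest-load step that breaches either limit, or None."""
    for step in sorted(steps, key=lambda s: s.users):
        if step.p95_ms > p95_limit_ms or step.error_rate > error_limit:
            return step
    return None


# Invented results from a stepped test, for illustration only.
results = [
    LoadStep(50, 120.0, 0.000),
    LoadStep(100, 140.0, 0.000),
    LoadStep(200, 210.0, 0.001),
    LoadStep(400, 650.0, 0.004),
    LoadStep(800, 2400.0, 0.070),
]

knee = first_degraded_step(results, p95_limit_ms=500.0, error_limit=0.01)
print(knee.users if knee else "no degradation observed")  # 400
```

With these made-up figures, latency breaches its limit at 400 users, well before the error rate does at 800. That ordering is the kind of detail a single peak-throughput number hides.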
Metrics Need Context
Prometheus can collect metrics, but the metrics must answer real questions:
- Which endpoint is slow?
- Which service is saturated?
- Is the queue growing?
- Are workers keeping up?
- Did error rate increase after a deployment?
Dashboards should support decisions, not just look technical.
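To show what answering one of these questions can look like, here is a minimal sketch for the last one: comparing error rate before and after a deployment. It assumes per-interval request and error counts are already available. In Prometheus these would typically be derived from counters, but the `Sample` type, the timestamps, and the counts below are made up for illustration.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: float  # seconds, start of the interval
    requests: int     # requests served in the interval
    errors: int       # failed requests in the interval


def error_rate(samples: list[Sample]) -> float:
    """Errors divided by requests over all samples (0.0 if no traffic)."""
    total = sum(s.requests for s in samples)
    return sum(s.errors for s in samples) / total if total else 0.0


def error_rate_around(samples: list[Sample], deployed_at: float) -> tuple[float, float]:
    """Return (error rate before the deployment, error rate after it)."""
    before = [s for s in samples if s.timestamp < deployed_at]
    after = [s for s in samples if s.timestamp >= deployed_at]
    return error_rate(before), error_rate(after)


# Hypothetical one-minute samples with a deployment at t=120.
samples = [
    Sample(0, 1000, 2),
    Sample(60, 1000, 3),
    Sample(120, 1000, 20),
    Sample(180, 1000, 30),
]

before, after = error_rate_around(samples, deployed_at=120)
print(f"before={before:.4f} after={after:.4f}")  # before=0.0025 after=0.0250
```

The function returns two numbers rather than a verdict, because deciding whether the change matters is the decision the dashboard is meant to support.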
Scaling Policy Comes Later
Once the system has clear telemetry, scaling policies become easier to reason about.
A basic policy might scale when:
- p95 latency stays high for several minutes
- queue depth crosses a threshold
- worker throughput falls behind incoming jobs
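The three conditions above can be sketched as a single check. This is only an illustration under stated assumptions: the `Telemetry` fields, the 500 ms limit, the five-minute window, and the queue threshold are placeholder values, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    p95_latency_ms_by_minute: list[float]  # one reading per minute, newest last
    queue_depth: int
    jobs_in_per_minute: float
    jobs_done_per_minute: float


def scale_out_reasons(
    t: Telemetry,
    p95_limit_ms: float = 500.0,  # assumed latency limit
    sustained_minutes: int = 5,   # assumed meaning of "several minutes"
    queue_limit: int = 1000,      # assumed queue threshold
) -> list[str]:
    """Return the reasons to scale out; an empty list means do nothing."""
    reasons = []
    recent = t.p95_latency_ms_by_minute[-sustained_minutes:]
    # Require a full window so a single spike does not trigger scaling.
    if len(recent) == sustained_minutes and all(v > p95_limit_ms for v in recent):
        reasons.append("p95 latency sustained above limit")
    if t.queue_depth > queue_limit:
        reasons.append("queue depth above threshold")
    if t.jobs_done_per_minute < t.jobs_in_per_minute:
        reasons.append("worker throughput below arrival rate")
    return reasons


# One-minute spike: no action.
spike = Telemetry([200, 220, 900, 210, 230], 100, 50, 60)
# Sustained pressure on every signal: three reasons.
hot = Telemetry([620, 700, 680, 710, 750], 1500, 80, 60)

print(scale_out_reasons(spike))  # []
print(len(scale_out_reasons(hot)))  # 3
```

Returning the reasons instead of a bare boolean keeps the policy explainable: every scaling event can be traced to the signal that caused it.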
Advanced policies, including ML or RL-based experiments, should be built on top of these baseline signals.
Avoid Scaling The Wrong Bottleneck
More containers do not fix every problem. If the database is the bottleneck, scaling app servers may make things worse, because every new instance adds connections and queries to a database that is already saturated. If external APIs are slow, more workers may increase retries and failures.
Monitoring prevents blind scaling.
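A minimal sketch of that idea is a guard that checks downstream dependencies before adding app servers. The function name, its parameters, and the thresholds are hypothetical, and the order of the checks is itself an assumption that downstream saturation should be ruled out first.

```python
def diagnose_bottleneck(
    app_cpu_utilization: float,        # 0.0 to 1.0
    db_connection_utilization: float,  # connections in use / pool size
    external_api_p95_ms: float,
    *,
    cpu_limit: float = 0.80,            # assumed thresholds, tune per system
    db_limit: float = 0.90,
    external_limit_ms: float = 1000.0,
) -> str:
    """Name the most likely bottleneck, checking downstream systems first."""
    if db_connection_utilization >= db_limit:
        return "database"
    if external_api_p95_ms >= external_limit_ms:
        return "external_api"
    if app_cpu_utilization >= cpu_limit:
        return "app_servers"
    return "none"


def should_add_app_servers(bottleneck: str) -> bool:
    """Only scale the app tier when the app tier is what is saturated."""
    return bottleneck == "app_servers"


# Busy app servers, but the database pool is nearly exhausted.
print(diagnose_bottleneck(0.85, 0.95, 200.0))  # database
# Busy app servers with healthy dependencies.
print(diagnose_bottleneck(0.85, 0.40, 200.0))  # app_servers
```

In the first case the app tier looks busy, yet adding instances would only push more load onto the database, so the guard declines to scale it.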
Terminal Byte Approach
Autoscaling research should start with instrumentation, dashboards, and repeatable load tests. Intelligent scaling is only useful when the system already has reliable signals.