Honest post. I’m a senior QA engineer in Argentina and I’m trying to fix our performance testing setup, which currently is… not great.
On paper: we run load tests in CI on every merge to main. In reality: we run a 60-second test with 10 virtual users against a staging environment that has 1/8th the resources of production, with no meaningful thresholds, and we look at the results maybe once a month.
It’s performance testing theater. It gives stakeholders a green checkmark and gives us nothing.
What I actually want to build:
- Tiered testing — smoke-level perf check on every PR, real load test weekly or pre-release
- Thresholds tied to production baseline, not arbitrary numbers
- A staging environment that’s at least proportionally representative
- Automatic comparison to previous runs to catch regressions
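To make the first two bullets concrete, here's a minimal sketch of what a PR-level smoke check could look like: a nearest-rank p95 over a handful of request timings, compared against a stored production baseline with some headroom. The baseline value, tolerance, and function names are all illustrative assumptions, not recommendations.

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples (seconds)."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]

def within_budget(observed_p95: float, baseline_p95: float,
                  tolerance: float = 1.5) -> bool:
    """Pass if observed p95 stays under baseline * tolerance.

    `baseline_p95` would come from a recorded production measurement,
    not a hand-picked number; 1.5x headroom is an arbitrary example.
    """
    return observed_p95 <= baseline_p95 * tolerance

# In CI: collect ~30 request timings against staging, then
#   sys.exit(0 if within_budget(p95(samples), baseline) else 1)
```

The point is only that the threshold is *derived* from a recorded baseline rather than hardcoded, and a failing check exits non-zero so the PR actually blocks.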
Has anyone successfully built something like this? The staging environment problem feels like the hardest one — how do you make the results meaningful when you can’t run against production?

The “performance theater” description is painfully accurate and way more common than people admit.
On the staging environment problem: proportional scaling is the key concept that helped us. You don't need a full production replica; you need a consistent environment, and you need to interpret results relatively, not absolutely.
If staging is consistently 2x slower than production under equivalent load, that ratio becomes your calibration factor. You care less about the raw p95 number and more about whether it changed compared to last run.
The regression detection piece is more valuable than the absolute number anyway.
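A sketch of that relative interpretation, assuming you measure one staging p95 and one production p95 under equivalent load to calibrate (the numbers, field names, and 10% regression tolerance are illustrative assumptions):

```python
def calibration_factor(staging_p95: float, prod_p95: float) -> float:
    """How much slower staging is than production under equivalent load."""
    return staging_p95 / prod_p95

def estimated_prod_p95(staging_p95: float, factor: float) -> float:
    """Project a staging measurement back onto the production scale."""
    return staging_p95 / factor

def is_regression(current_p95: float, previous_p95: float,
                  tolerance: float = 0.10) -> bool:
    """Flag a run whose p95 grew more than `tolerance` vs the last run."""
    return current_p95 > previous_p95 * (1 + tolerance)

# Example: staging measures 240ms where prod measures 120ms -> factor 2.0.
# A later staging run of 300ms projects to ~150ms on prod, and is flagged
# as a regression if the previous run was 250ms (300 > 250 * 1.10).
```

Persist each run's p95 (e.g. as a CI artifact) so `is_regression` always compares against the previous run, not against a number someone picked once.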
I'd disagree: the raw p95 number is still extremely useful in some scenarios, for example when a hard SLA caps what's acceptable in absolute terms.
interesting