Honest post. I’m a senior QA engineer in Argentina and I’m trying to fix our performance testing setup, which currently is… not great.
On paper: we run load tests in CI on every merge to main. In reality: we run a 60-second test with 10 virtual users against a staging environment that has 1/8th the resources of production, with no meaningful thresholds, and we look at the results maybe once a month.
It’s performance testing theater. It gives stakeholders a green checkmark and gives us nothing.
What I actually want to build:
- Tiered testing — smoke-level perf check on every PR, real load test weekly or pre-release
- Thresholds tied to production baseline, not arbitrary numbers
- A staging environment that’s at least proportionally representative
- Automatic comparison to previous runs to catch regressions
Has anyone successfully built something like this? The staging environment problem feels like the hardest one — how do you make the results meaningful when you can’t run against production?

For reference, here's how we solved the tiered testing problem (k6 + GitHub Actions):
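A trimmed sketch of the workflow — the cron schedule, script path, and action version are illustrative placeholders, not our exact config. The k6 script reads `__ENV.SCENARIO` to pick VU count, duration, and thresholds for each tier:

```yaml
name: perf
on:
  pull_request:            # tier 1: smoke check on every PR
  schedule:
    - cron: "0 3 * * *"    # tier 2: full load test nightly (placeholder schedule)

jobs:
  perf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: grafana/setup-k6-action@v1
      - name: Smoke test (PRs only)
        if: github.event_name == 'pull_request'
        run: k6 run -e SCENARIO=smoke perf/test.js   # few VUs, short duration, loose thresholds
      - name: Full load test (nightly)
        if: github.event_name == 'schedule'
        run: k6 run -e SCENARIO=full perf/test.js    # real VU count, strict p95 thresholds
```

Keeping both tiers in one script with scenario-specific options means the test logic can't drift between the PR check and the nightly run.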
The nightly run is where we actually catch regressions. The PR check is just a safety net, not a real performance signal.
It is. We partially solved it by tagging every load-test run with its git SHA and keeping a historical baseline database. An alert fires when p95 latency rises more than 20% above the rolling 30-day average. Not perfect, but it surfaces real problems.
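The comparison logic itself is simple enough to sketch. This is a minimal illustration, not their actual implementation — the function name is hypothetical, and in practice `history` would come from the baseline database:

```javascript
// Flag a regression when the latest p95 exceeds the rolling baseline
// average by more than a tolerance (20% here, matching the rule above).
// `history` is an array of prior p95 samples in ms (e.g. the last 30 days).
function isRegression(history, latestP95, tolerance = 0.2) {
  if (history.length === 0) return false; // no baseline yet, nothing to compare
  const baseline = history.reduce((sum, v) => sum + v, 0) / history.length;
  return latestP95 > baseline * (1 + tolerance);
}

// Example: the baseline average is 200 ms, so anything above 240 ms is flagged.
console.log(isRegression([190, 200, 210], 250)); // true
console.log(isRegression([190, 200, 210], 230)); // false
```

A percentage threshold against a rolling average tolerates gradual drift; pairing it with an absolute ceiling catches slow creep as well.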
Brilliant response, thanks!