Honest post. I’m a senior QA engineer in Argentina and I’m trying to fix our performance testing setup, which currently is… not great.

On paper: we run load tests in CI on every merge to main. In reality: we run a 60-second test with 10 virtual users against a staging environment that has 1/8th the resources of production, with no meaningful thresholds, and we look at the results maybe once a month.

It’s performance testing theater. It gives stakeholders a green checkmark and gives us nothing.

What I actually want to build:

  • Tiered testing — smoke-level perf check on every PR, real load test weekly or pre-release
  • Thresholds tied to production baseline, not arbitrary numbers (rough sketch after this list)
  • A staging environment that’s at least proportionally representative
  • Automatic comparison to previous runs to catch regressions
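
For the thresholds item, here's roughly what I have in mind. A sketch, not something we run yet; the 250ms p95 and the 2x factor are placeholders for whatever production monitoring and staging calibration actually show:

```javascript
// Sketch: k6 thresholds derived from a production baseline instead of
// arbitrary numbers. PROD_P95_MS would come from production monitoring;
// STAGING_FACTOR from calibrating staging against production.
// Both values are placeholders here.
import http from 'k6/http';
import { sleep } from 'k6';

const PROD_P95_MS = 250;    // placeholder: measured production p95
const STAGING_FACTOR = 2.0; // placeholder: staging-vs-production calibration

export const options = {
  vus: 10,
  duration: '60s',
  thresholds: {
    // Fail the run if staging p95 exceeds the production-derived budget.
    http_req_duration: [`p(95)<${PROD_P95_MS * STAGING_FACTOR}`],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  http.get('https://staging.example.com/healthz'); // placeholder endpoint
  sleep(1);
}
```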

Has anyone successfully built something like this? The staging environment problem feels like the hardest one — how do you make the results meaningful when you can’t run against production?

  • pakistani_tester1947 · 2 months ago

    The “performance theater” description is painfully accurate and way more common than people admit.

    On the staging environment problem: proportional scaling is the concept that helped us most. You don't need a full production replica; you need a consistent environment, and you need to interpret results relatively rather than absolutely.

    If staging is consistently 2x slower than production under equivalent load, that ratio becomes your calibration factor. You care less about the raw p95 number and more about whether it changed compared to last run.
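
    The arithmetic is trivial, but making it explicit helped us. Something like this (all numbers hypothetical):

    ```javascript
    // Calibration sketch, numbers hypothetical: replay the same scenario
    // against production and staging, derive the ratio, then translate
    // production SLOs into staging thresholds.
    const prodP95Ms = 180;    // measured on production under a known load
    const stagingP95Ms = 360; // same scenario against staging

    const calibrationFactor = stagingP95Ms / prodP95Ms; // 2.0 here

    const prodSloMs = 250; // the SLO you actually care about
    const stagingThresholdMs = prodSloMs * calibrationFactor; // 500ms
    console.log(`staging p95 budget: ${stagingThresholdMs}ms`);
    ```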

    The regression detection piece is more valuable than the absolute number anyway.

  • cool_developer · 2 months ago

    For reference, here's how we solved the tiered testing problem with k6 + GitHub Actions (sketch of the shared script after the list):

    • Every PR: 30-second smoke test, 5 VUs, just checks nothing is catastrophically broken
    • Nightly: 10-minute average load test against main, results posted to Slack
    • Pre-release: Full soak test, stress test, spike test — run manually with sign-off required
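
    The smoke and nightly tiers share one script and switch on an env var. A trimmed-down sketch; the endpoint, VU counts, and threshold here are placeholders rather than our real config:

    ```javascript
    // loadtest.js -- run as `k6 run -e TIER=smoke loadtest.js` (PR check)
    // or `k6 run -e TIER=nightly loadtest.js` (the run that catches
    // regressions). Placeholder values throughout.
    import http from 'k6/http';
    import { check, sleep } from 'k6';

    const TIERS = {
      smoke:   { vus: 5,  duration: '30s' }, // every PR: breakage check only
      nightly: { vus: 50, duration: '10m' }, // nightly against main
    };
    const tier = TIERS[__ENV.TIER || 'smoke'];

    export const options = {
      vus: tier.vus,
      duration: tier.duration,
      thresholds: {
        http_req_failed: ['rate<0.01'], // placeholder threshold
      },
    };

    export default function () {
      const res = http.get('https://staging.example.com/api/orders'); // placeholder
      check(res, { 'status is 200': (r) => r.status === 200 });
      sleep(1);
    }
    ```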

    The nightly run is where we actually catch regressions. The PR check is just a safety net, not a real performance signal.

    "The staging environment problem feels like the hardest one"

    It is. We partially solved it by tagging every load test run with the git SHA and keeping a historical baseline database. Alerting triggers when p95 increases by more than 20% vs the rolling 30-day average. Not perfect, but it surfaces real problems.
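
    The comparison logic itself is small. A sketch in plain JS; the run records and storage are stubbed out, ours is just a table keyed by git SHA:

    ```javascript
    // regression-check.js -- flag a run whose p95 exceeds the rolling 30-day
    // average by more than 20%. Runs are assumed to look like
    // { sha: 'abc123', timestamp: 1700000000000, p95Ms: 412 }; storage stubbed.
    const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
    const ALERT_RATIO = 1.2; // alert above +20% vs baseline

    function rollingBaseline(runs, now = Date.now()) {
      const recent = runs.filter((r) => now - r.timestamp <= THIRTY_DAYS_MS);
      if (recent.length === 0) return null; // no baseline yet
      return recent.reduce((sum, r) => sum + r.p95Ms, 0) / recent.length;
    }

    function checkRegression(latest, history) {
      const baseline = rollingBaseline(history);
      if (baseline === null) return { regression: false, detail: 'no baseline yet' };
      const ratio = latest.p95Ms / baseline;
      return {
        regression: ratio > ALERT_RATIO,
        detail: `${latest.sha}: p95 ${latest.p95Ms}ms vs 30-day avg ` +
                `${baseline.toFixed(0)}ms (x${ratio.toFixed(2)})`,
      };
    }
    ```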