Honest post. I’m a senior QA engineer in Argentina trying to fix our performance testing setup, which is currently… not great.

On paper: we run load tests in CI on every merge to main. In reality: we run a 60-second test with 10 virtual users against a staging environment that has 1/8th the resources of production, with no meaningful thresholds, and we look at the results maybe once a month.

It’s performance testing theater. It gives stakeholders a green checkmark and gives us nothing.

What I actually want to build:

  • Tiered testing — smoke-level perf check on every PR, real load test weekly or pre-release
  • Thresholds tied to the production baseline, not arbitrary numbers (see the sketch after this list)
  • A staging environment that’s at least proportionally representative
  • Automatic comparison to previous runs to catch regressions
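
For the thresholds point, this is roughly what I have in mind, sketched in k6 (PROD_P95_MS, TARGET_URL, and the 1.25 tolerance factor are placeholders I made up; the baseline would come from whatever metrics system tracks production latency):

    import http from 'k6/http';

    // Sketch only: pull the production p95 baseline in via an env var
    // instead of hardcoding an arbitrary number.
    const baselineP95 = Number(__ENV.PROD_P95_MS);

    export const options = {
      thresholds: {
        // Fail the run when p95 drifts more than 25% above the production baseline
        http_req_duration: [`p(95)<${baselineP95 * 1.25}`],
      },
    };

    export default function () {
      http.get(__ENV.TARGET_URL); // TARGET_URL is a placeholder too
    }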

Has anyone successfully built something like this? The staging environment problem feels like the hardest one — how do you make the results meaningful when you can’t run against production?

  • cool_developer · 2 months ago

    For reference, here’s how we solved the tiered testing problem (k6 + GitHub Actions); a sketch of the PR-tier script follows the list:

    • Every PR: 30-second smoke test, 5 VUs, just checks nothing is catastrophically broken
    • Nightly: 10-minute average load test against main, results posted to Slack
    • Pre-release: Full soak test, stress test, spike test — run manually with sign-off required
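
    The PR-tier script is deliberately tiny; stripped down it looks something like this (the /health endpoint and TARGET_URL env var are stand-ins for whatever your service exposes, and the threshold numbers are illustrative):

        import http from 'k6/http';
        import { check } from 'k6';

        export const options = {
          vus: 5,
          duration: '30s',
          thresholds: {
            // Loose ceilings: only catch catastrophic breakage. This tier
            // is a safety net, not a performance signal.
            http_req_failed: ['rate<0.01'],
            http_req_duration: ['p(95)<2000'],
          },
        };

        export default function () {
          const res = http.get(`${__ENV.TARGET_URL}/health`);
          check(res, { 'status is 200': (r) => r.status === 200 });
        }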

    The nightly run is where we actually catch regressions. The PR check is just a safety net, not a real performance signal.

    > The staging environment problem feels like the hardest one

    It is. We partially solved it by tagging every load-test run with its git SHA and keeping a historical baseline database. Alerting fires when p95 rises more than 20% above the rolling 30-day average. Not perfect, but it surfaces real problems.
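
    The comparison itself is the easy part; as a sketch, assuming each run is stored as { sha, timestamp, p95 } (that schema is illustrative, not our actual one):

        const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

        // history: array of { sha, timestamp, p95 } rows from the baseline database
        function isRegression(currentP95, history, now = Date.now()) {
          const window = history.filter((r) => now - r.timestamp <= THIRTY_DAYS_MS);
          if (window.length === 0) return false; // no baseline yet, nothing to compare
          const rollingAvg = window.reduce((sum, r) => sum + r.p95, 0) / window.length;
          // Flag when p95 is more than 20% above the rolling 30-day average
          return currentP95 > rollingAvg * 1.2;
        }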