  • We solved the tiered testing problem like this, for reference (k6 + GitHub Actions):

    • Every PR: 30-second smoke test, 5 VUs, just checks nothing is catastrophically broken
    • Nightly: 10-minute average load test against main, results posted to Slack
    • Pre-release: Full soak test, stress test, spike test — run manually with sign-off required

    The nightly run is where we actually catch regressions. The PR check is just a safety net, not a real performance signal.
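
    For reference, the per-PR smoke tier amounts to a very small k6 script. This is a minimal sketch, not our exact config; the endpoint and threshold values are placeholders.

    ```typescript
    // Minimal sketch of the per-PR smoke tier (k6). Endpoint and threshold
    // values are placeholders, not our exact configuration.
    import http from 'k6/http';
    import { check, sleep } from 'k6';

    export const options = {
      vus: 5,            // 5 virtual users, as in the PR tier above
      duration: '30s',   // 30-second smoke run
      thresholds: {
        http_req_failed: ['rate<0.01'],   // >1% request errors fails the check
        http_req_duration: ['p(95)<800'], // loose bound: catch catastrophic breakage only
      },
    };

    export default function () {
      const res = http.get('https://staging.example.com/health'); // hypothetical endpoint
      check(res, { 'status is 200': (r) => r.status === 200 });
      sleep(1);
    }
    ```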

    > The staging environment problem feels like the hardest one

    It is. We partially solved it by tagging every load test run with the git SHA and keeping a historical baseline database. Alerting triggers when p95 increases by more than 20% vs the rolling 30-day average. Not perfect but it surfaces real problems.
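
    The comparison itself is simple once the history exists. A hypothetical sketch, assuming each run is stored with its git SHA and p95:

    ```typescript
    // Hypothetical sketch of the baseline check described above. The record
    // shape is assumed; the point is: flag a >20% p95 increase vs the rolling
    // 30-day average.
    interface RunResult {
      gitSha: string;
      p95Ms: number;
      timestamp: Date;
    }

    function p95Regressed(current: RunResult, last30Days: RunResult[]): boolean {
      if (last30Days.length === 0) return false; // no baseline yet, nothing to compare
      const rollingAvg =
        last30Days.reduce((sum, r) => sum + r.p95Ms, 0) / last30Days.length;
      return current.p95Ms > rollingAvg * 1.2; // alert on >20% regression
    }
    ```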




  • Very similar journey here; we moved from JMeter about two years ago.

    The threshold-as-code approach is the feature I’d never give up. Having performance acceptance criteria living in the same repo as the tests completely changed our relationship with stakeholders. Now we have actual conversations about what p95 latency should be before a feature ships, rather than after it’s already slow in production.
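
    For anyone who hasn't seen it, this is roughly what threshold-as-code looks like in a k6 script; the endpoint tag and numbers here are illustrative, not our actual criteria:

    ```typescript
    // Illustrative k6 script: the agreed p95 for an endpoint lives in the repo
    // as a threshold, next to the test itself. Endpoint name and numbers are examples.
    import http from 'k6/http';

    export const options = {
      thresholds: {
        // acceptance criterion agreed before the feature shipped
        'http_req_duration{endpoint:checkout}': ['p(95)<400'],
        http_req_failed: ['rate<0.005'], // error budget: <0.5% failed requests
      },
    };

    export default function () {
      http.get('https://staging.example.com/checkout', {
        tags: { endpoint: 'checkout' }, // tag so the sub-metric threshold applies
      });
    }
    ```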




  • I’m a QA consultant based in Canada specialising in a11y. Honest answer to your question:

    Most of it needs to stay manual. Automated tools (Axe, Lighthouse, etc.) catch maybe 30–40% of real accessibility issues — the rest requires human judgment and actual assistive technology.

    What you can automate:

    • Axe-core integrated into your Playwright suite catches low-hanging fruit on every PR
    • Custom linting rules for missing alt text, empty labels, bad heading structure
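
    For the axe-core + Playwright piece, a minimal sketch using @axe-core/playwright (URL and WCAG rule tags are placeholders):

    ```typescript
    // Minimal sketch of axe-core in a Playwright suite via @axe-core/playwright.
    // URL and tag filter are placeholders; adjust to your own suite.
    import { test, expect } from '@playwright/test';
    import AxeBuilder from '@axe-core/playwright';

    test('home page has no detectable a11y violations', async ({ page }) => {
      await page.goto('https://example.com/');
      const results = await new AxeBuilder({ page })
        .withTags(['wcag2a', 'wcag2aa']) // limit to WCAG 2.0 A/AA rules
        .analyze();
      expect(results.violations).toEqual([]); // fail the PR check on any violation
    });
    ```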

    What you cannot automate:

    • Whether a screen reader actually conveys the right meaning
    • Logical focus order
    • Whether error messages are actually helpful when announced

    Your exhaustion is valid. This work is skilled and time-intensive. Push back on anyone who tells you an overlay or an automated scanner “handles” accessibility.


  • We did a formal test audit at my last company (fintech startup, based in the UK). Here’s roughly what we did:

    1. Pulled test run history from the last 90 days out of our CI system
    2. Tagged anything with >15% flake rate as “quarantine candidates”
    3. Ran a coverage diff to see what was actually protected by the remaining tests
    4. Deleted ruthlessly — killed about 600 tests

    The hardest part was getting stakeholder buy-in to delete tests. People see test count as a metric of quality. You have to reframe it: fewer, trustworthy tests beat many unreliable ones.
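
    For step 2, the flake-rate tagging was essentially this shape of calculation (the record format here is assumed, not our actual CI export):

    ```typescript
    // Sketch of the flake-rate tagging in step 2. The record shape is assumed;
    // "flaky" here means something like failed then passed on retry in one pipeline.
    interface TestRun {
      testName: string;
      flaky: boolean;
    }

    function quarantineCandidates(runs: TestRun[], flakeThreshold = 0.15): string[] {
      const byTest = new Map<string, { total: number; flaky: number }>();
      for (const run of runs) {
        const stats = byTest.get(run.testName) ?? { total: 0, flaky: 0 };
        stats.total += 1;
        if (run.flaky) stats.flaky += 1;
        byTest.set(run.testName, stats);
      }
      return [...byTest.entries()]
        .filter(([, s]) => s.flaky / s.total > flakeThreshold)
        .map(([name]) => name);
    }
    ```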




