Warning
This job advert has expired and applications have closed.
Senior Software Quality Engineer (Automation, Performance & Resilience)
| Posting date: | 07 October 2025 |
|---|---|
| Hours: | Full time |
| Closing date: | 06 November 2025 |
| Location: | Coventry, West Midlands |
| Remote working: | Hybrid - remote working up to 4 days per week |
| Company: | Vestir Sourcing Ltd |
| Job type: | Permanent |
| Job reference: | VSLJD-20251091 |
Summary
About the role
We’re looking for a Senior Software Quality Engineer to own test strategy end-to-end for backend services. You’ll build scalable automation and performance frameworks, integrate them into CI/CD, and validate resilience and operational readiness across AWS/Azure environments. You’ll partner closely with engineering, SRE, and product to enable fast, reliable releases.
Key responsibilities
Strategy and planning
Own test strategy, planning, and estimation for services and programs
Define quality gates, risk-based coverage, and release-readiness criteria
Automation and quality engineering
Design and maintain unified automation frameworks (Java, Cucumber, Robot Framework)
Build API and integration tests (Postman), reduce flakiness, and improve maintainability
Standardize builds (Gradle) and containerize test tooling (Docker)
Performance engineering
Design, execute, and analyze load/stress/soak tests with Gatling
Model realistic workloads, establish SLOs, and provide tuning recommendations
Track throughput, latency (P95/P99), error budgets, and capacity signals
Resilience and operational readiness
Run chaos tests with Litmus; validate failure handling, timeouts, and fallbacks
Verify backup/restore and disaster recovery objectives (RTO/RPO)
Lead game-days and resilience drills; document runbooks and playbooks
Observability and feedback loops
Instrument and monitor with Prometheus, Grafana, and New Relic
Wire test results and service telemetry into dashboards and alerts
Enable data-driven go/no-go decisions with objective quality signals
CI/CD and DevOps integration
Integrate tests into pipelines (Git/GitHub), enforce quality gates, and parallelize execution
Support trunk-based development, shift-left checks, and stable environments
Collaboration and enablement
Partner with developers, SRE, and product to triage, root-cause, and prevent defects
Mentor engineers on testing best practices and reliability-first design
Contribute to documentation, standards, and continuous improvement
What we expect from the candidate (must-haves)
10+ years in Quality Engineering/SDET roles focused on backend or platform services
Strong coding with Java and hands-on automation using Cucumber and/or Robot Framework
Proven experience building CI/CD-integrated test frameworks (Git/GitHub, Gradle, Docker)
Performance testing expertise with Gatling (workload design, analysis, recommendations)
Chaos and resilience testing experience (Litmus) and operational readiness validation
Observability: Prometheus/Grafana/New Relic for metrics, dashboards, SLOs, and alerting
API testing experience (Postman), strong understanding of REST and common integration patterns
Cloud experience with AWS and/or Azure
Solid grasp of testing strategy: functional, integration, system, and non-functional
Excellent communication, critical thinking, and cross-functional collaboration
Nice to have
Hercules or similar performance harness tooling
Experience with Azure DevOps, GitHub Actions, or Jenkins (pipelines and environments)
Contract testing, service virtualization, or test containers
Kubernetes familiarity (Litmus typically runs on K8s), IaC basics (e.g., Terraform)
Domain knowledge in banking/fintech, compliance-minded testing
Success metrics you’ll influence
Reduced test cycle time and flakiness rate; improved pipeline pass rate
Meaningful automation coverage aligned to business risk
Measurable improvements in P95/P99 latency and error budgets
Fewer escaped defects and faster MTTD/MTTR via actionable telemetry
Consistent, auditable release-readiness signals
First 90 days
0–30: Onboard, baseline current coverage and performance; ship quick wins in CI gating
31–60: Deliver Gatling suites and dashboards (Prometheus/Grafana/New Relic); standardize framework patterns
61–90: Run first chaos game-day; validate backup/restore; publish reliability playbooks; measure impact
Tech stack you’ll use
Languages/Frameworks: Java, Cucumber, Robot Framework
Performance/Resilience: Gatling, Litmus, Hercules (nice to have)
API/Tools: Postman, Git/GitHub, Gradle, Docker
Observability: Prometheus, Grafana, New Relic
Cloud: AWS, Azure