Warning: This job advert has expired and applications have closed.

Senior Software Quality Engineer (Automation, Performance & Resilience)

Job details
Posting date: 07 October 2025
Hours: Full time
Closing date: 06 November 2025
Location: Coventry, West Midlands
Remote working: Hybrid - work remotely up to 4 days per week
Company: Vestir Sourcing Ltd
Job type: Permanent
Job reference: VSLJD-20251091

Summary

About the role
We’re looking for a Senior Software Quality Engineer to own test strategy end-to-end for backend services. You’ll build scalable automation and performance frameworks, integrate them into CI/CD, and validate resilience and operational readiness across AWS/Azure environments. You’ll partner closely with engineering, SRE, and product to enable fast, reliable releases.

Key responsibilities

Strategy and planning
Own test strategy, planning, and estimation for services and programs
Define quality gates, risk-based coverage, and release-readiness criteria
Automation and quality engineering
Design and maintain unified automation frameworks (Java, Cucumber, Robot Framework)
Build API and integration tests (Postman), reduce flakiness, and improve maintainability
Standardize builds (Gradle) and containerize test tooling (Docker)
Performance engineering
Design, execute, and analyze load/stress/soak tests with Gatling
Model realistic workloads, establish SLOs, and provide tuning recommendations
Track throughput, latency (P95/P99), error budgets, and capacity signals
Resilience and operational readiness
Run chaos tests with Litmus; validate failure handling, timeouts, and fallbacks
Verify backup/restore and disaster recovery objectives (RTO/RPO)
Lead game-days and resilience drills; document runbooks and playbooks
Observability and feedback loops
Instrument and monitor with Prometheus, Grafana, and New Relic
Wire test results and service telemetry into dashboards and alerts
Enable data-driven go/no-go decisions with objective quality signals
CI/CD and DevOps integration
Integrate tests into pipelines (Git/GitHub), enforce quality gates, and parallelize execution
Support trunk-based development, shift-left checks, and stable environments
Collaboration and enablement
Partner with developers, SRE, and product to triage, root-cause, and prevent defects
Mentor engineers on testing best practices and reliability-first design
Contribute to documentation, standards, and continuous improvement

What we expect from the candidate (must-haves)

10+ years in Quality Engineering/SDET roles focused on backend or platform services
Strong coding skills in Java and hands-on automation experience with Cucumber and/or Robot Framework
Proven experience building CI/CD-integrated test frameworks (Git/GitHub, Gradle, Docker)
Performance testing expertise with Gatling (workload design, analysis, recommendations)
Chaos and resilience testing experience (Litmus) and operational readiness validation
Observability: Prometheus/Grafana/New Relic for metrics, dashboards, SLOs, and alerting
API testing experience (Postman) and a strong understanding of REST and common integration patterns
Cloud experience with AWS and/or Azure
Solid grasp of testing strategy: functional, integration, system, and non-functional
Excellent communication, critical thinking, and cross-functional collaboration

Nice to have

Hercules or similar performance harness tooling
Experience with Azure DevOps, GitHub Actions, or Jenkins (pipelines and environments)
Contract testing, service virtualization, or test containers
Kubernetes familiarity (Litmus typically runs on K8s) and IaC basics (e.g., Terraform)
Domain knowledge in banking/fintech and a compliance-minded approach to testing

Success metrics you’ll influence

Reduced test cycle time and flakiness rate; improved pipeline pass rate
Meaningful automation coverage aligned to business risk
Measurable improvements in P95/P99 latency and error budgets
Fewer escaped defects and faster MTTD/MTTR via actionable telemetry
Consistent, auditable release-readiness signals

First 90 days

0–30: Onboard, baseline current coverage and performance; ship quick wins in CI gating
31–60: Deliver Gatling suites and dashboards (Prometheus/Grafana/New Relic); standardize framework patterns
61–90: Run first chaos game-day; validate backup/restore; publish reliability playbooks; measure impact

Tech stack you’ll use

Languages/Frameworks: Java, Cucumber, Robot Framework
Performance/Resilience: Gatling, Litmus, Hercules (nice to have)
API/Tools: Postman, Git/GitHub, Gradle, Docker
Observability: Prometheus, Grafana, New Relic
Cloud: AWS, Azure