Senior Specialist Engineer (SRE)
| Posting date: | 02 December 2025 |
|---|---|
| Salary: | £41,983.00 to £52,113.00 per year |
| Additional salary information: | £41983.00 - £52113.00 a year |
| Hours: | Full time |
| Closing date: | 05 January 2026 |
| Location: | Birmingham, Leeds, Liverpool, London (Canary Wharf), E14 4PU |
| Company: | NHS Jobs |
| Job type: | Permanent |
| Job reference: | K9919-25-0464 |
Summary
We are seeking a highly motivated and experienced SRE to join our HPC & SRE engineering team. As an SRE, you will play a critical role in ensuring the stability, scalability, and performance of our services. You will combine software engineering and systems engineering to build, improve and run reliable, scalable production systems.

Key Responsibilities

Service Reliability & Performance
- Ensure services are stable, scalable, and performant through engineering best practices and system design.
- Proactively identify and address system bottlenecks using advanced problem-solving and performance tuning techniques.
- Conduct capacity planning and implement solutions to ensure systems can support current and future workloads.

Incident Response & Troubleshooting
- Respond swiftly to production incidents, ensuring minimal downtime and quick restoration of services.
- Perform root cause analysis and postmortems, implementing lessons learned to prevent recurrence.

Monitoring, Alerting & Observability
- Contribute to the design and implementation of effective monitoring and alerting systems using tools and dashboards.
- Improve observability of services, ensuring issues are identified and addressed before impacting users.
- Continuously refine monitoring practices to reduce alert fatigue and improve response times.

Automation & Tooling
- Develop automation to eliminate manual, repetitive tasks and improve operational efficiency.
- Write clear, maintainable, and well-tested code to support automation efforts and system tooling.
- Drive initiatives to reduce operational toil and improve reliability through Infrastructure as Code (IaC).

Service Level Objectives & Operational Improvements
- Contribute to the definition, tracking, and continuous improvement of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
- Identify and prioritize operational improvements that align with business goals and user experience.

SRE Best Practices & Advocacy
- Help evangelize SRE principles across the organization.
- Collaborate with stakeholders to integrate reliability practices into the development lifecycle.

Collaboration & Knowledge Sharing
- Work closely with software engineering, DevOps, and infrastructure teams to streamline deployment and operational workflows.
- Improve cross-functional collaboration and promote a culture of shared responsibility for service reliability.

Documentation & Training
- Maintain accurate technical documentation, runbooks, and post-incident reports.
- Provide training and mentorship to engineering teams on best practices and tools.

Essential criteria:
- Experience as a Site Reliability Engineer, DevOps Engineer, Operations Engineer or similar role
- Coding skills in programming/scripting languages such as Python, PowerShell or Bash
- Understanding of Linux/Unix & Windows systems, networking, and distributed systems
- Experience with observability tools (e.g., Prometheus, Grafana, Datadog) and alerting systems
- Understanding of infrastructure automation (e.g., Terraform, Ansible, PowerShell, Helm)
- Excellent communication and collaboration skills
- Experience with security best practices
- Problem-solving skills and the ability to respond to sudden unexpected demands

Desirable criteria:
- Experience with CI/CD pipelines, cloud platforms (e.g., Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure) and container orchestration (e.g., Kubernetes)
- Experience with post-incident reviews
- Previous involvement in driving adoption of SRE practices across an organization
- Experience delivering training or mentoring junior engineers

Selection Process Detail
This vacancy is using Success Profiles and will assess your Behaviours, Experience and Technical Skills.

Stage 1: Application & Sift (Success Profiles)
You will be required to complete an application form. You will be assessed on the 8 listed essential criteria, and this will be in the form of:
- Application form (Employer/Activity history section on the application)
- 1,000-word Supporting Statement
The Supporting Statement should outline how your skills, experience, and knowledge provide evidence of your suitability for the role, with reference to the essential criteria. The Application form and Supporting Statement will be marked together.

Please note you will not be able to upload your CV. You must complete the application form in as much detail as possible. Please do not email us your CV.

Longlisting:
In the event of a large number of applications we will longlist into 3 piles:
- Meets all essential criteria
- Meets some essential criteria
- Meets no essential criteria
Those falling into the 'Meets all essential criteria' pile will progress to shortlisting. Feedback will not be provided at this stage.

Shortlisting:
In the event of a large number of applications we will shortlist on the following essential criteria:
- Experience as a Site Reliability Engineer, DevOps Engineer, Operations Engineer or similar role.
Desirable criteria may be used in the event of a large number of applications or a large number of successful candidates. If you are successful at this stage, you will progress to interview.

Please do not exceed 1,000 words. We will not consider any words over and above this number. Feedback will not be provided at this stage.

Stage 2: Interview (Success Profiles)
You will be invited to a remote interview. Candidates will be required to take a technical test, give a presentation and pass the interview process successfully. This allows us to set the rate of the Market Pay Supplement (MPS) awarded.

Behaviours and Technical Skills will be tested at interview. The Behaviours tested during the interview stage will be:
- Changing and Improving (lead behaviour)
- Making Effective Decisions
- Delivering at Pace
- Working Together

You will also be expected to prepare and deliver a 5-minute presentation during the interview. This will be based on either:
- Designing a highly available and scalable service, or
- Automating a complex operational process
This will be decided and confirmed ahead of interviews.
There will also be a technical test during the interview, where you will be asked technical questions to test your knowledge. This will be based on:
- SRE principles
- Troubleshooting/incident management
- System design
- Automation/coding
- Knowledge of Linux & networking
- Cloud technologies

Interview dates are yet to be confirmed.

Once this job has closed, the job advert will no longer be available. You may want to save a copy for your records.

Eligibility Criteria
Open to all external applicants (anyone) from outside the Civil Service (including by definition internal applicants).

Location
This role is being offered as hybrid working based at any of our Core HQs. We offer great flexible working opportunities at UKHSA and operate using a hybrid working model where business needs allow. This provides us with greater flexibility about how and where we work, to get the best from our workforce. As a hybrid worker, you will be expected to spend a minimum of 60% of your contractual working hours (approximately 3 days a week pro rata, averaged over a month) working at one of UKHSA's core HQs (Birmingham, Leeds, Liverpool, and London). Our core HQ offices are modern and newly refurbished with excellent city centre transport links and benefit from co-location with other government departments such as the Department of Health and Social Care (DHSC).

Salary bands
- National: £41,983 to £48,128
- Inner London: £46,310 to £52,113
This role attracts a Market Pay Supplement up to £5,000.

Please note: If you are successful at interview, and are moving from another government department, NHS, or Local Authority, the relevant starting salary principles for level transfers or promotions will apply.
Otherwise, roles are offered at the pay scale minimum for the grade, but in exceptional circumstances there may be flexibility if you are able to demonstrate you are already in receipt of an existing, higher salary. Pay increases are through the relevant annual pay award for the role and terms.

Security Clearance Level Requirement
Successful candidates must pass a disclosure and barring security check. Successful candidates must meet the security requirements before they can be appointed. The level of security needed is Basic Personnel Security Standard.