
Site Reliability Engineer / SRE / Systems Engineer

Job details
Posting date: 05 February 2026
Salary: £70,000 per year
Additional salary information: Up to £70,000 per year, plus benefits
Hours: Full time
Closing date: 07 March 2026
Location: UK
Remote working: Fully remote
Company: AWD online
Job type: Permanent
Job reference: AWDO-P14376


Summary


A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.

If you’ve worked in any of the following roles, we’d also like to hear from you: DevOps Engineer, Operations Engineer, Cloud Engineer, Platform Engineer, Systems Engineer, Infrastructure Engineer, Production Engineer.


SALARY: up to £70,000 per annum (depending on experience) + Benefits

LOCATION: Remote and hybrid working options available. You can work fully remotely or, if you prefer, split your time between home and the office in Altrincham, Greater Manchester, North West England.

JOB TYPE: Full-Time, Permanent


JOB OVERVIEW

We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services.

As a Site Reliability Engineer / Systems Engineer, you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.

This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.


APPLY TODAY

Ready to make your next career move? Apply now and our Recruitment Team will review your application.


DUTIES

Your duties as the Site Reliability Engineer / Systems Engineer include:

• Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover

• System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services

• Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues

• Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience

• Automation and Resilience: Supporting automation, incident response and continuous improvement practices

• New Service Support: Ensuring new products and features are operable, reliable and scalable from day one

• Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues

• Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports

• Incident Prioritisation: Balancing customer impact with long-term system health and stability

• Security and Compliance: Supporting compliance with security, availability and regulatory frameworks


CANDIDATE REQUIREMENTS

ESSENTIAL

• Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role

• Experience supporting production services at scale within a DevOps or SRE environment

• Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6

• Experience with observability tools such as Prometheus, Grafana, ELK or Splunk

• Hands-on experience with containerisation and orchestration using Docker and Kubernetes

• Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices

• Strong Linux administration skills with scripting capability in Bash, Python or similar

• Familiarity with CI/CD pipelines, such as GitHub Actions, and source control tools

• Understanding of security frameworks and operational resilience best practices

DESIRABLE

• Experience within ISP, MSP or telecommunications environments

• Familiarity with enterprise IT architectures, including OSS/BSS (operations and business support systems)

• Knowledge of information security and data protection frameworks such as ISO 27001, NIST or GDPR

• Experience with infrastructure automation tools such as Terraform or Ansible


BENEFITS

• Smart casual dress code

• Free access to gym facilities

• Access to a financial wellbeing platform (on successful completion of probationary period)

• Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)

• Access to cycle to work, childcare, and electric vehicle schemes after six months

• Brand new office with excellent transport links

• Supportive team culture, growth and career progression


HOW TO APPLY

To be considered for this job vacancy, please submit your CV to our Recruitment Team, who will review your details. CVs of applicants meeting the requirements will be submitted to our Client for consideration. By submitting your job application to us, you are hereby giving us your express consent to submit your details to our Client for this purpose.

JOB REF: AWDO-P14376

Full-Time, Permanent Jobs, Careers and Vacancies. Find a new job and work in Altrincham, Greater Manchester, North West England. Multi-Job Board Advertising and CV Sourcing Recruitment Services provided by AWD online.

AWD online specialise in sourcing candidates and advertising vacancies on multiple job boards for companies on a non-commission basis. AWD online operates as an employment agency.

awd online | http://www.awdo.co.uk

