Warning: This job advert has expired and applications have closed.

SITE RELIABILITY ENGINEER (SRE)

Job details
Posting date:	22 March 2024
Hours:	Full time
Closing date:	21 April 2024
Location:	Wakefield, West Yorkshire
Company:	OADIGITALS LTD
Job type:	Permanent
Job reference:	jobref06

Summary

About Us:
OADIGITALS is a leading IT consulting agency committed to delivering reliable and scalable digital solutions to our clients. We are passionate about ensuring the availability, performance, and security of mission-critical systems. Join our dynamic team and play a key role in shaping the reliability of our infrastructure and applications.

Position Overview:
We are seeking an experienced Site Reliability Engineer (SRE) to join our team. As an SRE at OADIGITALS, you will be responsible for designing, implementing, and maintaining resilient systems and infrastructure to support our clients' digital initiatives. You will work closely with cross-functional teams to improve system reliability, automate processes, and mitigate operational risks.

Key Responsibilities:
Design and implement resilient, scalable, and highly available infrastructure and applications.
Develop and maintain automation scripts and tools for deployment, monitoring, and incident response.
Collaborate with software engineers to design and implement reliable and efficient CI/CD pipelines.
Monitor system performance, analyze trends, and proactively identify areas for optimization and improvement.
Implement and maintain robust monitoring, alerting, and logging solutions to ensure system health and reliability.
Participate in incident response, troubleshooting, and resolution to minimize downtime and impact on operations.
Conduct post-incident reviews and implement corrective actions to prevent recurrence.

Requirements:
Proven experience as a Site Reliability Engineer or in a similar role.
Strong understanding of cloud computing platforms (e.g., AWS, Azure, Google Cloud).
Proficiency in scripting languages such as Python, Shell, or PowerShell.
Experience with configuration management and infrastructure as code (IaC) tools (e.g., Terraform, Ansible).
Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Excellent problem-solving skills and attention to detail.

Preferred Qualifications:
Experience with incident management and post-incident analysis frameworks (e.g., ITIL, SRE practices).
Knowledge of security best practices and compliance frameworks (e.g., SOC 2, GDPR).
Familiarity with chaos engineering principles and tools (e.g., Chaos Monkey, Gremlin).
Experience working in Agile or DevOps environments.
Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer).

Why Join Us:
Opportunity to work on challenging projects with leading clients across various industries.
Collaborative and supportive work environment that encourages continuous learning and growth.
Professional development opportunities through training, certifications, and conferences.
Competitive salary and benefits package.
Flexible work arrangements and a healthy work-life balance.

OADIGITALS is an equal opportunity employer and values diversity in the workplace. We appreciate all applications; however, only those selected for an interview will be contacted.