Warning: This job advert has expired and applications have closed.

Operations Resilience Manager

Job details
Posting date: 17 October 2024
Salary: £40,201 to £43,347 per year
Hours: Full time
Closing date: 03 November 2024
Location: FY4 5ES
Company: Government Recruitment Service
Job type: Permanent
Job reference: 373532/1

Summary

Do you have experience in identifying and addressing risks to delivery of IT Products and Services?

Have you a background in Disaster Recovery planning and IT resilience?

This is a fantastic opportunity to join DWP Digital as an Operations Resilience Manager (ORM), where you will be responsible for maintaining the performance, availability and stability of live service by preventing, or minimising the likelihood of, disruptions caused by loss of IT service or a disaster.

As part of a team, you will ensure mechanisms to adapt, recover and learn from operational disruption are put in place, and provide advice and guidance on capacity/throughput issues.

As owners of the IT Service Continuity Management (ITSCM) and Capacity Management & Performance Processes and Procedures (P&Ps), in line with technology roadmaps, ORMs use their knowledge and experience to ensure live service performance is maintained in line with business goals and requirements at all times.

The ORM has experience and end-to-end knowledge of an enterprise-class IT estate and the associated support methodologies for business-critical IT services and systems. They understand potential service-impacting factors, working across IT Operations and technical capabilities, and use their expertise to identify, mitigate and track risks from multiple digital and non-digital factors. Factors include, but are not limited to, recoverability (testing failover and data restore), technical resilience, IT capacity, the trend in user hours lost, unmitigated risks, organisational change, third-party providers and finance.

The ORM will own ITSCM and Capacity risks and provide guidance and support to key stakeholders and technical support teams to manage, mitigate and eradicate their specific risks. They manage plans to test Disaster Recovery and resilience, and forecast capacity trends, ensuring that threats to live services associated with the processes they own, and other potential service-impacting factors, have appropriate action plans. They will create the annual Capacity Plan and manage the On Premise Hosting Site Recovery Plan.

The ORM uses reporting tools and dashboards to report performance, and maintains historical data to identify trends and analyse emerging issues.

Key activities relevant to all Operations Resilience Managers are:

  • Manage the capacity of Digital Services and their environments, taking account of capacity and performance limits for all components of the service. Assess and forecast the impact of capacity issues affecting live service operation and the consequential effects they may have on the wider estate.
  • Assure and report on the Resilience and Disaster Recovery Capability of all DWP’s IT services, maintaining and managing the on-premise datacentre recovery plan. Engage with Technical and Delivery leads to drive performance improvement, identifying and managing opportunities and understanding their goals to ensure Service Continuity requirements support business need.
  • Co-ordinate relationships with a range of stakeholders to ensure they provide value for money services and manage relationships with both internal and external projects, technical teams, service management and contractors.
  • Outline the necessity for DR failover, data recovery testing and Capacity planning, exploring standard options and those that can be tailored to suit the needs of individual technical services.
  • Give advice and guidance, engaging throughout the delivery lifecycle to ensure policies and procedures are considered and that governance measures are understood. Provide consultancy to stakeholders during monthly reviews and on an ad hoc basis.
  • Review performance and Operational Resilience Factors with Business Service Owners, managing action plans and identifying improvements as the requirement arises.
  • Understand and focus on customer business objectives, quality, service excellence, customer satisfaction and zero tolerance to production outages.
  • Support the Major Incident Management process following a Disaster or Major Incident event, restoring live service as quickly and safely as possible with the smallest possible business impact and ensuring communications are issued as appropriate.
  • Assess the impact of current issues affecting live service operation and the consequential effects they may have, from a Capacity and Disaster Recovery perspective, on the wider estate. This would involve application of contracts, finance, technology and complexity, service support arrangements, current performance of the service, future change to the service and other aspects impacting on the service, including production operational service risks and issues.
  • Review the performance of the business services, associated risks and issues with Technical and Business Service Owners.
  • Facilitate and manage service-related incidents, and their communications, from a Capacity and Disaster Recovery perspective by tracking and prioritising problems and operating between the Command Centre and other Digital services (e.g. Operational Areas, Service Management, Service Desk), to ensure services are quickly and safely restored with the smallest possible business impact.

There may be a requirement to join a rota for 'on-call' support. Further details will be given to those candidates invited for interview.