Technology Services Analyst – Site Reliability

Position Overview
We’re looking for an experienced and adaptable Site Reliability Analyst to join a growing Technology Services team. This individual will play a key role in ensuring the operational integrity and long-term scalability of our platforms. The position combines traditional IT support responsibilities with modern reliability engineering methods to create a stable and resilient technology environment that aligns with business priorities.

Key Responsibilities

  • Partner with engineering and infrastructure teams to assess the performance, resilience, and availability of systems. Advise on design decisions that impact operational reliability.

  • Simulate potential failure scenarios when new features or architectural changes are deployed. Lead analysis sessions following service disruptions to drive improvements.

  • Design and coordinate controlled failure testing (chaos engineering) to validate system robustness. Help execute performance assessments to support product readiness.

  • Lead troubleshooting and provide expert-level support during system outages or client-affecting incidents.

  • Ensure system performance targets and reliability metrics are effectively defined and maintained.

  • Create and update recovery documentation (runbooks) for critical systems, and guide SRE tool and process adoption across teams.

  • Monitor usage patterns and plan for future capacity needs so that systems stay responsive as demand grows.

  • Keep infrastructure configurations consistent and up to date across various environments.

  • Support ad hoc projects and contribute to broader technology initiatives as needed.

Requirements

  • A bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent practical experience.

  • 5+ years of professional experience in technology operations, systems analysis, or site reliability engineering.

  • Proven ability to diagnose complex technical issues and communicate solutions clearly.

  • Familiarity with monitoring platforms, incident management practices, and vendor oversight.