Position Overview
We’re looking for an experienced and adaptable Site Reliability Analyst to join a growing Technology Services team. This individual will play a key role in ensuring the operational integrity and long-term scalability of our platforms. The position combines traditional IT support responsibilities with modern reliability engineering methods to create a stable and resilient technology environment that aligns with business priorities.
Key Responsibilities
- Partner with engineering and infrastructure teams to assess the performance, resilience, and availability of systems. Advise on design decisions that affect operational reliability.
- Simulate potential failure scenarios when new features or architectural changes are deployed. Lead analysis sessions following service disruptions to drive improvements.
- Design and coordinate controlled failure testing (chaos engineering) to validate system robustness. Help execute performance assessments to support product readiness.
- Lead troubleshooting and provide expert-level support during system outages and client-affecting incidents.
- Define and maintain system performance targets and reliability metrics.
- Create and update recovery documentation (runbooks) for critical systems, and guide the adoption of SRE tools and processes across teams.
- Monitor usage patterns and plan for future capacity needs so that systems stay responsive as demand grows.
- Keep infrastructure configurations consistent and up to date across environments.
- Support ad hoc projects and contribute to broader technology initiatives as needed.
Requirements
- A bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent practical experience.
- 5+ years of professional experience in technology operations, systems analysis, or site reliability engineering.
- Proven ability to diagnose complex technical issues and communicate solutions clearly.
- Familiarity with monitoring platforms, incident management practices, and vendor oversight.