About the Opportunity :
Conexiom is seeking a dedicated and experienced Site Reliability Engineering (SRE) Senior Manager to lead our SRE team. The role involves leading the Cloud SRE team in day-to-day operations, which include monitoring, support activities, ensuring customer satisfaction through reliable service, and building and designing cloud infrastructure.
You will collaborate with engineering and product teams to develop strategies that aim to achieve high service reliability, performance, scalability, and availability.
This role is critical in guiding Conexiom's cloud services, emphasizing operational excellence and the integration of SRE principles into our processes.
Your efforts will be key in maintaining our commitment to providing dependable and scalable cloud solutions. If you have a strong background in site reliability engineering and a track record of leading teams to success, we welcome your application to join Conexiom.
Responsibilities :
- Collaborate daily with cross-functional teams to identify and resolve issues related to performance, scalability, and security.
- Establish and track key performance indicators (KPIs) to measure the effectiveness of our SRE team.
- Establish service level objectives (SLOs) and monitor them to ensure they are met.
- Operate, monitor, and maintain high-availability applications running in the Azure cloud environment.
- Drive continuous improvement through automation, monitoring, and testing.
- Execute automation for cloud operations tasks and create new automation for new situations or issues as they arise; automate everything.
- Identify and remediate potential points of failure in infrastructure and applications.
- Lead and focus teams on root cause analysis, pattern identification and continuous improvement to optimize application performance, resiliency, and reliability.
- Facilitate blameless root cause analysis meetings after production incidents so that the team can learn from mistakes and improve systems.
- Help secure data and access policies to reduce risk.
- Look for opportunities to drive operational efficiencies while reducing costs.
- Prepare and present reports to all levels of leadership and staff.
- Stay abreast of industry-leading best practices and bring them to the attention of the leadership team for innovative application.
- Serve as a guide and mentor to members of the Cloud Platform SRE teams to aid in their growth and development.
- Allocate available resources to meet operating objectives.
- Ensure the ongoing training and development of direct reports.
- Manage a 24/7 on-call rotation schedule.
Qualifications :
- Experience managing teams (specifically SRE, Release and DevOps).
- Strong experience with Azure cloud platform.
- Experience working in an SRE environment and applying SRE principles.
- Experience with CI/CD tools (Azure DevOps, Jenkins, GitLab).
- Familiarity with Agile methodologies and DevOps best practices.
- Strong grasp of infrastructure as code (e.g., Terraform, CloudFormation) and automation tools (e.g., Ansible, Chef, Puppet).
- Experience with Kubernetes, AKS/GKE, Docker, containerization, microservices, and serverless architectures.
- Proven track record of designing, implementing, and supporting highly available and scalable infrastructure in a cloud environment.
- Experience administering and troubleshooting both Windows Server and Linux operating systems. Familiarity with Internet Information Services (IIS), Apache, and Nginx.
- Proficiency in managing and monitoring relational databases (e.g., PostgreSQL, MySQL) and experience with NoSQL databases such as MongoDB.
- Proficiency in at least one programming or scripting language (e.g., Python, Go, .NET, Bash).
- Skilled in developing monitoring strategies and frameworks that provide real-time insights into system health, performance bottlenecks, and security vulnerabilities.
- Expertise in automating alerting and troubleshooting processes to ensure rapid response to incidents and minimize downtime.
- Bachelor's degree in computer science or related field.
- At least 3 years of leadership experience, specifically managing SRE and DevOps teams.
- 8+ years of experience in SRE or DevOps roles.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.
- Passion for technology and continuous learning.
Compensation : The targeted salary range for this position is $130,000 to $160,000 CAD, depending on experience and qualifications.