Our client in the financial services sector is looking for a Technical Team Lead for a full-time, permanent role to own their application operations efforts.
Focus is on cloud platforms (Azure / AWS), automation, and Infrastructure-as-Code (IaC) within an Agile environment.
Location : Hybrid (1 day / week on site), Toronto
Responsibilities :
- Lead the operational stability and deployment of an e-commerce application, ensuring performance and scalability.
- Provide technical direction to the Application Operations team, mentoring team members to foster continuous improvement and collaboration.
- Drive optimization initiatives using monitoring tools and data-driven insights to preemptively resolve issues.
- Develop and manage cloud infrastructure using IaC tools (Terraform, Ansible, etc.) for scalable, automated operations.
- Oversee cloud-native tools and serverless architectures to support efficient application management.
- Implement automation solutions like self-healing and auto-scaling systems to reduce manual intervention.
- Collaborate with cross-functional teams to integrate security best practices across the application lifecycle.
- Manage application certificates and oversee vulnerability management to maintain security.
- Ensure smooth application deployments with minimal downtime, using strategies like canary and blue-green deployments.
- Establish and track application performance metrics (KPIs, SLIs, SLOs) to drive operational excellence.
- Lead 24 / 7 on-call support to ensure quick incident resolution.
Requirements :
- 8+ years of experience in AppOps, DevOps, or SRE roles, including 3-5+ years in a technical leadership capacity overseeing large-scale, cloud-based applications (Azure and AWS).
- Proficiency in programming and configuration languages (Java preferred; Python, Go, YAML, etc.).
- 5-8+ years of experience leading Agile and DevOps transformations within teams, driving continuous integration and continuous delivery (CI / CD) and Infrastructure-as-Code (IaC) practices.
- 5-8+ years of experience leading cross-functional teams (developers, SRE, QA, security) to deliver highly available, scalable, and resilient applications.
- 5-8+ years of experience with IaC and automation technologies (Terraform, Ansible, Chef, Pulumi) and the ability to guide teams in adopting best practices.
- 5-8+ years of experience with SRE principles and practices, including SLAs, SLOs, and error budgets, to drive operational excellence and reliability.
- Proven track record in designing and implementing monitoring, alerting, and observability frameworks using tools like Prometheus, Grafana, ELK, Dynatrace, and Splunk to ensure proactive issue detection and resolution.
- Ability to mentor and develop engineering talent, fostering a culture of continuous improvement, automation, and end-to-end ownership of services.
- Certifications in relevant cloud technologies and automation tools (e.g., Azure, AWS, Kubernetes, Terraform) are a plus.
Nice to have :
- Microsoft Certified : Azure Administrator Associate (or similar across AWS, GCP, etc.)
- Certified Kubernetes Administrator
- Site Reliability Engineering (SRE) Certification
- Terraform Associate Certification
- ITIL Foundation