
Site Reliability Engineer (SRE)

NetApp
Vancouver, British Columbia, CA, V5Y 3C9
Full-time

Title: Site Reliability Engineer (SRE)

Location: Bangalore, Karnataka, IN, 560071

Requisition ID: 127074

Job Summary

As a Site Reliability Engineer (SRE) with a specialization in storage, you'll manage and optimize a portfolio of customer-facing cloud services (SaaS / IaaS) on Google Cloud Platform (GCP), ensuring their overall availability, performance, and security.

You will collaborate closely with global teams from NetApp and GCP, with a primary focus on supporting Google Cloud NetApp Volumes.

This position includes rotational on-call work as part of a global team due to the critical nature of the services we support.

You will be working in a dynamic and fast-paced environment as an engineer on the Site Reliability Engineering (SRE) team.

This team is responsible for assisting customers of Google Cloud NetApp Volumes in resolving complex technical issues in production environments.

We are seeking an SRE with a deep understanding of storage systems, complex distributed systems, and cloud technologies, and the ability to articulate these concepts clearly to customers and fellow engineers.

You will work with your teammates and our customers to support innovative, cutting-edge technologies that address real-world challenges.

You will provide valuable feedback and guidance to our Product and Engineering teams while representing the voice of our customers.

You will have the opportunity to make a significant impact and take real ownership of your work.

Job Requirements

o Collaborate with external customers and partners to ensure their success with Google Cloud NetApp Volumes.

o Following SRE best practices, respond to, troubleshoot, and drive root cause analysis (RCA) of complex live production incidents, including cross-platform issues involving the OS, networking, and databases in cloud-based SaaS / IaaS environments.

o Continuously monitor, analyze, and measure system health, availability, and latency using tools such as Prometheus, Google Cloud Monitoring, Elasticsearch, Grafana, and SolarWinds.

o Develop and implement steps to improve system and application performance, availability, and reliability.

o Document system knowledge, create runbooks, and ensure critical system information is readily available.

o Stay up to date with security trends and proactively identify, diagnose, and resolve complex security issues.

o Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.

o Automate manual tasks and system components that would benefit from automation.

o Use Atlassian Jira to track issues through to resolution according to their priority.

o Engage in incident management processes and resolve issues within agreed SLAs / SLOs.

o Extensive experience in storage technologies and incident management processes.

o Advanced knowledge of Linux operating systems (e.g., Ubuntu, CentOS).

o Proficiency in container-based architecture (e.g., Kubernetes).

o Intermediate to advanced knowledge of automation tools and scripting languages such as Ansible, Python, Bash, Go, and PowerShell.

o Solid understanding of algorithms, data structures, and databases (SQL / NoSQL).

o Intermediate knowledge of networking concepts.

o Hands-on experience with cloud environments, particularly GCP.

o Exceptional debugging skills across various platforms and technologies.

o Familiarity with site reliability engineering principles and best practices.

Education

BE in Computer Science or a related field, or 6+ years of professional experience in a relevant role.

Job Segment: Cloud, Software Engineer, Database, Computer Science, Linux, Technology, Engineering

Related jobs
Electronic Arts
Vancouver, British Columbia

You will build and operate distributed, large-scale, cloud-based infrastructure using modern open-source software solutions. You will use automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection and resolution (MTTD & MTTR) and repair services. You will maintain C...

Behavox
Canada

As a Site Reliability Engineer, you will be responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of all production systems and services. You will work together with other DevOps, Product, and Engineering teams to...

Perlego
Vancouver, British Columbia
Remote

Site Reliability Engineer (SRE). Experience in Site Reliability Engineering, DevOps, or a similar field. In this role, you will ensure the availability and reliability of our services, especially during out-of-office hours, while most of the team is based in Europe and India. As an SRE, you will pro...

Taurus SA
Vancouver, British Columbia

We are seeking talented SREs to build our Solutions Engineering team in Vancouver. You’ll partner with clients to uncover requirements and work closely with our engineering teams, creating roadmaps, architecting solutions, and executing on them. Ensure operational excellence of Taurus' managed servi...

Royal Bank of Canada
Vancouver, British Columbia

The Application Support SRE will be responsible for the support, development, and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. Development of SRE solutions (monitoring and alerting, machine learning anomaly detection, ...

Royal Bank of Canada
Vancouver, British Columbia

The Lead Support SRE will be responsible for supporting and spearheading the development and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. Spearhead the development of SRE solutions (monitoring and alerting, machin...

Okta, Inc.
Canada

A proven track record of successful SRE engagements and collaborating closely with engineering teams. Triaging and troubleshooting complex production issues to ensure reliability and performance. Are passionate about encouraging the development of engineering peers and leading by example. ...

CLIO
Vancouver, British Columbia

As a Site Reliability Engineer, you will help build, improve, and maintain Clio's globally distributed network of service regions, which enables our clients worldwide to excel in their respective jurisdictions. ...