Site Reliability Engineer (Remote)

Perlego
Vancouver, British Columbia, Canada
CA$105K / year
Remote
Full-time

What we do

At Perlego, there are over 100 of us working hard to make education accessible to all. In this digital age, we believe that anyone should be able to learn anything at any time.

Knowledge should be more accessible, not locked behind sky-high price tags.

Over the past 5 years, our goal has been to help students across the UK and Europe access quality books. The next stage of Perlego is twofold: 1) expand our support to students globally, and 2) build a product that goes beyond the book, a platform that helps students study smarter and more effectively.

What we're looking for:

We are looking for an experienced Site Reliability Engineer (SRE) with a strong background in AWS services and monitoring tools.

In this role, you will ensure the availability and reliability of our services, especially during out-of-office hours, while most of the team is based in Europe and India.

You will be integral to addressing issues swiftly and resolving incidents independently, and you will thrive in a fast-paced environment.

How we collaborate:

Our organization operates across multiple time zones, with teams based across Europe. As an SRE, you will provide critical support during off-hours, working autonomously to resolve issues while collaborating closely with our teams to ensure continuous service availability.

You will be part of a global team, supporting cloud infrastructure and platform initiatives.

What you’ll do:

As a Site Reliability Engineer, your main focus will be to ensure our services remain highly available and performant. Key responsibilities include:

Monitoring & Incident Management:

  • Monitor and manage platform activity using tools like Datadog, Prometheus, Grafana, or AWS CloudWatch.
  • Respond quickly to alerts and incidents, independently resolving issues and ensuring service uptime during off-peak hours.
  • Conduct post-incident reviews and help improve system resiliency through automation and monitoring enhancements.

Cloud Infrastructure Management:

  • Manage and support AWS infrastructure, focusing on scalability, security, and reliability.
  • Handle deployments, managing CI/CD pipelines for both containerized (Docker / Kubernetes) and serverless (AWS Lambda) applications.
  • Ensure effective backup, recovery, and disaster recovery strategies to minimize downtime.

Collaboration & Communication:

  • Collaborate with cross-functional teams to implement platform improvements.
  • Work independently and make swift decisions when managing service incidents outside core business hours.
  • Assist in platform security, ensuring adherence to best practices for cloud security and compliance.

Continuous Improvement:

  • Automate manual processes to reduce human error and improve efficiency.
  • Continuously enhance monitoring systems, ensuring robust early detection and resolution capabilities.
  • Identify potential performance bottlenecks and contribute to overall platform optimization.

Requirements

This role is ideal for you if you possess:

  • Experience in Site Reliability Engineering, DevOps, or a similar field.
  • Strong experience with AWS services.
  • Expertise in using monitoring tools (Prometheus, Grafana, CloudWatch) for real-time platform performance insights.
  • Hands-on experience with CI/CD pipeline management for deploying containerized (Docker) and serverless applications.
  • Proficiency in Linux-based operating systems and shell scripting.
  • Familiarity with Infrastructure as Code tools (Terraform, CloudFormation).
  • Experience with incident management, troubleshooting, and platform recovery in high-pressure environments.
  • Strong communication skills with a proven ability to work both independently and collaboratively across time zones.

It’s a plus if you have:

  • Experience working in a global, distributed team providing off-hours support.
  • Knowledge of container orchestration tools.
  • Previous experience with SecOps and cloud security best practices.
  • Familiarity with scaling highly available systems in a fast-paced, growth-oriented environment.

Benefits

Compensation

The salary for this role is CA$105,000 plus share options.

Why should you work at Perlego?

Apart from our mission, we foster a unique company culture championing self-empowerment, personal development, direct communication and mutual support.

We’re proud of our Glassdoor reviews and the fact that 97% of our team would recommend Perlego as a place to work.

Want to learn more about how we’re making learning accessible? Check out our latest impact report.

L&D Budget

We value continuous learning and you will have a personal L&D budget for online courses, subscriptions or books not on Perlego.

Unlimited Coaching Opportunities

Unlimited access to MoreHappi, an on-demand professional coaching platform that offers all employees unbiased, professional coaching.

Learning Time

All employees have dedicated Learning Time to focus on new skills, projects or interests that lie outside their day-to-day job.

Work-Life Balance

Everyone needs a break, so enjoy 30 days off (incl. bank holidays), plus 1 additional day of annual leave for every year of service, up to 35 days off (incl. bank holidays).

Flexi Bank Holidays

We understand that not everyone aligns with the same calendar; we offer the flexibility to take your local country's bank holiday allowance for other religious or cultural days.

For example, you could switch the UK Easter bank holidays for Eid celebrations.

Office Reset

All employees can also enjoy the days between Boxing Day and New Year off, to reset and refresh for the new year. This is in addition to your annual leave.

Sabbatical

After three years you can take a 1-month unpaid sabbatical, and after five years you can take a 1-month paid sabbatical.

Personal Days

Life happens and we want you to be able to use your annual leave for resting, relaxing or taking time out to do something you love!

We offer 1 additional day a year for life events (your wedding, relocation, moving house, or a child starting school).

Health & Wellbeing

We want everyone to feel healthy and happy, so you get private medical insurance.

Family time

We believe family is really important; we offer new parents a competitive matched parental leave as well as a phased return to work from extended leave.

Belonging at Perlego:

We are an equal opportunity employer and value diversity of thought and background.

We are actively building a diverse team, so we strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.

To enable an equitable experience for all and give you the best chance of success, if you have any specific requirements for any stage of the interview process,

Posted 24 days ago