Senior Site Reliability Engineer

3 weeks ago


Old Toronto, Canada MongoDB Full time
The MongoDB Cloud Team

The MongoDB Cloud Team is a diverse collection of individuals working together to provide MongoDB in the cloud at global scale. The Cloud Team is responsible for several services including:

  1. MongoDB Atlas - our database as a service offering and fastest growing product
  2. MongoDB Realm - our serverless platform offering that allows developers to build apps on MongoDB without managing any infrastructure
  3. Atlas Data Lake - our newest offering

The Cloud Site Reliability Engineering Team designs and builds the global infrastructure on which we deploy our services. As our customers grow and globalize, our services must satisfy demands for low-latency requests around the globe, and comply with various data sovereignty requirements. The SRE Team’s mission is to build this increasingly complex infrastructure, while continually lowering the operational burden associated with it, and increasing our internal visibility into the health of the system. We are strong believers in infrastructure-as-code and self-healing systems. The SRE Team is fully integrated with all the other Cloud teams, and the teams work closely together with a soft and traversable boundary between their areas of responsibility.


Location: This role can sit in our NYC HQ, Austin, or remote from any Eastern or Central Time location. When based in an office, we provide hybrid work accommodation.


Responsibilities

  • Design and build the infrastructure for a global cloud service that comprises hundreds of thousands of MongoDB clusters, processes a billion metrics per day, and replicates tens of billions of database writes to our backup service
  • Design, implement, and troubleshoot the automation and monitoring of services that seamlessly span the globe, including several cloud providers
  • Become an expert in infrastructure performance, helping us optimize from the application level all the way through the firmware
  • Build for resilience. Our goal is that nobody’s pager goes off, ever. Are we there yet? No. Are we really close? Very. While we work on that, participate in a weekly on-call rotation
  • Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability


Requirements

You have:

  • Experience running a mission critical service at scale
  • An understanding of information security issues
  • Prior experience running critical production systems in a Linux environment
  • Firm grasp of at least one modern programming language, beyond basic scripting
  • Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc)
  • Bachelor’s degree in Computer Science or equivalent experience
  • Experience writing automation tools & eagerness to "automate all the things"

Nice to haves:

  • Experience building large applications from scratch, complete with CI/CD infrastructure
  • Experience in networking, security, hardware or OS performance tuning
  • Experience with at least one of the major cloud providers (Amazon Web Services, Google Compute, Microsoft Azure)
  • Experience managing Kubernetes clusters or some other container orchestration infrastructure
  • Experience with observability of large scale distributed systems

What's in it for you

  • Generous compensation package (top-range salary: we pay in the 95th percentile, and our package includes equity and generous benefits)
  • Opportunities to learn on the job (time to upskill in new technologies)
  • High level of independence in your day-to-day work


  • Old Toronto, Canada Lloyds Banking Group Full time

    Job Description - Senior Site Reliability Engineer. JOB TITLE: Senior Site Reliability Engineer (SRE). LOCATION: Halifax, Leeds or Manchester. HOURS: Full-time. WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at one of our office sites. Who are Lloyds Banking Group and where does this role sit? If you...


  • Old Toronto, Canada Practice Better Full time

    About us: Practice Better is a leading all-in-one practice management software solution transforming how health & wellness professionals run their practices and support their clients. The company serves 15,000+ customers in 70+ countries across the globe, and processes hundreds of millions annually in payments on behalf of customers. Over 65% of growth...


  • Old Toronto, Canada Manulife Insurance Malaysia Full time

    Senior Site Reliability Engineer. Locations: Waterloo, Ontario; Toronto, Global Head Office (200 Bloor). Time type: Full time. Posted: yesterday. Job requisition ID: JR24020202. We are a financial services provider that works to facilitate...


  • Old Toronto, Canada Practice Better Full time

    About the Position: Job Title: Senior Site Reliability Engineer Location: The candidate must be located in Canada or the USA. Our office is in Toronto, ON, Canada, but the role is remote/hybrid/flexible. Reports to: VP, Technology Position Overview: We are on a mission to build an industry-leading product on a strong foundation built by a world-class...


  • Old Toronto, Canada Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Datadog. Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will...


  • Old Toronto, Canada Jobber Full time

    Jobber exists to help people in small businesses be successful. We work with small home service businesses, like your local plumbers, painters, and landscapers, to transform the way service is delivered through technology. With Jobber they can quote, schedule, invoice, and collect payments from their customers, while providing an easy and professional...


  • Old Toronto, Canada Autodesk Full time

    Position Overview Autodesk, the leading Design and Make Software Company, is looking for a Principal Site Reliability Engineer to join the Autodesk Platform Services Engineering team in Toronto, Canada. In this role, you will help build trusted services of APS (Autodesk Platform Services) measured by Service Level Objectives (SLOs) and Mean Time to Recovery...


  • Old Toronto, Ontario, Canada Akamai Full time

    Are you intrigued by planetary scale, distributed, intelligent systems? Do you like collaborating across teams to solve complex problems? Join our highly skilled Site Reliability Engineering team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We do this while maintaining Akamai's...


  • Old Toronto, Canada Sentry Full time

    Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 90,000 organizations that believe we’re on to something, we’re building performance and error monitoring tools that help companies like Disney,...


  • Old Toronto, Canada CB Canada Full time

    Site Reliability Engineer. On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description: Azure cloud; Jira and Confluence; CI/CD; experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...


  • Old Toronto, Ontario, Canada Akamai Full time

    Are you passionate about cutting edge technology? Does solving some of the Internet's most difficult content delivery challenges interest you? Join our Compute Site Reliability team. Our team is responsible for monitoring and measuring the reliability of our suite of Compute products and platform. In collaboration with Engineering and Product teams, we focus on...