Lead Site Reliability Engineer

4 weeks ago


Old Toronto, Ontario, Canada Magic Leap - Multiple Locations Full time

Transforming the Future of Computing

Magic Leap stands at the forefront of spatial computing, innovating advanced augmented reality solutions that integrate digital elements with the physical environment. As a leader in the next generation of computing platforms, our mixed reality devices open up new avenues for interaction and engagement with the world around us.

If you seek a collaborative environment where every team member is empowered to contribute meaningfully, Magic Leap is the ideal place for you. Here, you will have the opportunity to explore cutting-edge solutions and leverage your skills to address real-world challenges. Working alongside seasoned professionals, you will enhance your expertise and expand human capabilities at the convergence of the tangible and digital realms. We believe that diverse perspectives are essential for driving successful change and innovation. As we shape our future, every voice is valued. Together, we can redefine the workplace and, in partnership with our bold collaborators, achieve the extraordinary.

Your Role

The Product Development and Engineering division is integral to Magic Leap's groundbreaking AR innovations. These highly interactive teams are tasked with our organization's engineering and technical functions, engaging in advanced and intricate scientific research and development that evolves as we create new AR products, features, and marketplaces. Responsibilities include conducting research, design, development, and implementation efforts to enhance Magic Leap's AR offerings; designing, building, testing, and operating systems; ensuring adherence to quality, cost, safety, reliability, timeliness, and performance standards in production; interpreting technical plans and specifications; and collaborating across all teams to ensure our AR products and features deliver a superior, immersive experience that meets and exceeds customer expectations and business goals.

As a Lead Software Engineer within our Site Reliability team, your mission will be to guarantee the secure, efficient, and dependable delivery of services to our clientele. This position merges software and systems engineering to create highly scalable, distributed, and fault-tolerant systems.

Key Responsibilities

  • Design and implement solutions to enhance service reliability through automation and process optimization.
  • Develop and maintain tools and systems that facilitate software deployment across the enterprise.
  • Stay updated on the latest tools and methodologies; engage in educational opportunities; read industry publications; maintain professional networks; and participate in relevant organizations.
  • Work collaboratively with cross-functional teams and external partners to design and develop software features for the Magic Leap device.
  • Diagnose and resolve complex technical issues, then test and deploy the fixes.
  • Drive the enhancement of processes, systems, or products to improve the software development lifecycle.
  • Provide mentorship and guidance to junior software engineers, technicians, and interns, and conduct regular code evaluations.

Qualifications

  • 5+ years of professional experience in software development.
  • Bachelor's Degree in Computer Science, Software Engineering, or a related field. We value your overall experience and professional accomplishments.
  • Strong foundation in Linux-based systems, including proficiency with command-line tools such as ssh, grep, sed, awk, and find.
  • Comprehensive understanding of networking and core Internet protocols (e.g., TCP/IP, DNS, TLS, HTTP, gRPC).
  • Proficiency in a modern programming language such as Go or Python.
  • Solid grasp of shell scripting and the ability to write clean, maintainable, and efficient scripts.
  • Extensive experience with Kubernetes in production settings.
  • Familiarity with public cloud services (AWS, Google Cloud Platform, etc.).
  • Experience in building and optimizing container images.
  • Comfortable with frequent, incremental code testing and deployment.
  • Strong understanding of automation tools (Terraform, GitLab CI, ArgoCD, etc.).
  • Ability to maintain composure and focus during high-pressure incidents.
  • Effective communication skills for interacting with geographically distributed cross-functional teams.

Additional Information

  • Your information will be kept confidential in accordance with Equal Employment Opportunity guidelines.

Accommodations

If you require accommodations during the application, interview, or hiring process, please reach out to us. Magic Leap is committed to reasonably accommodating qualified individuals with disabilities as required by applicable law.



