Site Reliability Engineer
1 month ago
Job Description:
We are growing our team globally. This is a unique opportunity to work on leading-edge projects that leverage the latest technologies, such as cloud solutions and analytics. The team's primary objective is to ensure reliability across the production environment by developing a deep understanding of how our application code runs, is configured, and scales. This understanding allows us to resolve open incidents in the shortest possible time, develop monitors to detect future occurrences, and implement automation that enables the environment to self-heal. Our team manages all production entitlements and access for more than 35 systems, with users distributed around the world and access spanning trading, payment, and vendor applications.
Role and Responsibilities:
- Ensure that Production Management is closely aligned with, and embedded in, the Agile software development process, and that our code meets production standards
- Incorporate Site Reliability Engineering and DevOps practices into the day-to-day role by developing automated solutions to long-standing problems, ensuring minimal downtime and manual effort
- Configure application monitors using industry-standard monitoring tools, and develop customized monitoring solutions
- Build the extensive business and application knowledge required to support client-facing applications
- Revisit SRE metrics and confirm that they align with firm and department goals
- Implement tooling and create automation to eliminate toil (manual, repetitive work)
- Engage early in the SDLC with our development teams to take an active role in creating resilient and reliable solutions
- Prioritize project work based on critical incidents and the needs of key business stakeholders
- Interface with clients and other technology teams to provide governance and control around the production environment.
Qualifications:
You should apply for this requisition if you have, at minimum, the following profile:
- Bachelor’s degree in Computer Science or related field
- Experience with service-oriented architecture, distributed systems, business intelligence reporting (e.g., Power BI), scripting (e.g., Python or shell), front-end development (HTML, JavaScript, AngularJS), cloud computing (e.g., Microsoft Azure), and SaaS integrations
- Clear understanding of Logging, Monitoring, and Knowledge Management practices such as Docs as Code
- Ability, once trained, to manage an incident call and coordinate multiple teams toward the common goal of resolving a business-impacting outage
- Strong knowledge of DevOps and SRE principles, with a good grasp of the tools and approaches used to apply them
- Strong infrastructure knowledge of Linux/Unix administration, storage, networking, and web technologies
- Advanced Unix Shell / Python scripting experience
- Advanced SQL knowledge; experience with databases such as Sybase, DB2, and Snowflake, as well as MongoDB, preferred
-
Site Reliability Engineer
4 days ago
Montreal, Canada Lyft Full time
At Lyft, our mission is to improve people’s lives with the world’s best transportation. Imagine cities where streets are safe, communities thrive, and personal cars are a thing of the past. We envision a future where shared and active transportation modes are the norm, fostering vibrant, connected neighborhoods. As a leader in micromobility, Lyft powers...
-
Site Reliability Engineer
4 days ago
Montreal, Canada Lyft Full time
At Lyft, our mission is to improve people’s lives with the world’s best transportation. To create the best transportation experience for all, we start in our own community by creating an open, inclusive, and diverse organization where all team members are recognized for what they bring. We believe that trip by trip, we’re changing the way our world...
-
Junior Site Reliability Engineer
3 weeks ago
Montreal, Canada Plexia Full time
Job Description: As a Junior Site Reliability Engineer (SRE), you will play a crucial role within the R&D and Innovation department. You will be called upon to collaborate with the Plexia product-aligned and core architecture teams. Because of the highly sensitive nature of health and medical systems, the availability and reliability of our...
-
Principal Site Reliability Engineer
3 weeks ago
Montreal, Canada Lightspeed Commerce Full time
Hi there! Thanks for stopping by. Are you actively looking for a new opportunity, or just checking the market? Well… you might just be in the right place! We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and...
-
Site Reliability Engineering
7 days ago
Montreal, Canada Cisco Full time
Who We Are
As a part of Cisco, Accedian is a leader in performance analytics and end-user experience solutions for service providers and mid-to-large size enterprises. The Accedian Skylight service assurance platform offers granular end-to-end visibility within "the massive multi": multi-layer, multi-domain, and multi-vendor networks. Accedian's open and...
-
Site Reliability Engineering
1 day ago
Montreal, Canada Cisco Systems, Inc. Full time
Site Reliability Engineering - Technical Leader Location: Alternate Location Area of Interest Compensation Range 138300 CAD - 203700 CAD Job Type Professional Cloud and Data Center, Software Development Job Id 1421618
Who We Are
As a part of Cisco, Accedian is a leader in per
-
Site Reliability Engineer
1 month ago
Montreal, Canada LanceSoft, Inc. Full time
Job Title: Production Reliability & Support Expert (SRE)
Location: Montreal (office attendance from Day 1; hybrid mode, 3x per week)
Years of experience: 3 to 5 years
• Ensure that Production Management is closely aligned with, and embedded in, the Agile software development process, and that our code meets production standards
• Incorporate Site Reliability Engineering...
-
Site Reliability Engineer
1 week ago
Montreal, Canada LanceSoft, Inc. Full time
Responsibilities include:
• SRE duties for the relevant squad (Snowflake or Flexera), providing engineering support for observability and enhancements to the overall functionality of the ITSM platforms.
• A commitment to understanding ITSM’s range of products, with a view to specializing in one or two of them and contributing to their documentation.
•...