Specialist Site Reliability Engineer
1 day ago
(#11072)
The Specialist Site Reliability Engineer (SRE) performs RAM (Reliability, Availability, and Maintainability) analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions are highly robust, reliable, and available. This involves system and product analysis, modeling, and requirements assessment during the development phase, as well as analysis of field RAM data to determine solution RAM KPIs and to drive corrective action programs. With the advent of cloud computing, there is also a need for a RAM specialist who is well versed in cloud-based technologies and cloud solution architectures.
Separate specializations may exist for hardware and software RAM. The technologies used are primarily distributed digital control systems, communication networks, Global Navigation Satellite Systems (GNSS), embedded and virtualized computing, and cloud-based solutions.
Open position in Montreal, QC; Toronto, ON; Vancouver, BC; Calgary, AB; Edmonton, AB.
Main Responsibilities
Solution RAM Assessments
· Review and approve solution requirements for RAM
· Determine non-functional requirements and targets for RAM performance
· Perform analysis and modeling to predict RAM behaviour
· Adhere to the I&T Development Process
Solution RAM Field Performance
· Assign requirements to solutions and products to ensure they support the ability to measure RAM Key Performance Indicators (KPIs)
· Use the field performance measurement to identify key contributors and drive corrective action plans when necessary
Vendor Product RAM Assessments
· Review vendor specifications, test results, analysis artifacts
· Participate in failure review board for selected vendors
· Review corrective action plans from the vendors
· Drive vendor corrective action plans to completion
Requirements
Experience
· Minimum 5-10 years overall work experience
· Minimum 5 years of experience in RAM engineering for complex systems, or 7 years of experience in product development for high-reliability/high-availability or safety-critical systems with accountability for product field performance
Skills/Knowledge
· Knowledge of hardware and/or software design and development practices and processes, with a focus on high-reliability and high-availability applications
· Knowledge of RAM analysis techniques such as failure rate prediction, Reliability Block Diagrams (RBD), Markov models, Monte Carlo methods, Failure Modes and Effects Analysis (FMEA), and Fault Tree Analysis (FTA)
· Analysis of reliability and failure field data, statistical estimation, Root Cause Analysis (RCA)
· Critical thinking and judgement
· Ability to assimilate new information quickly and apply to the assignment
· Ability to deliver with autonomy
· Organizing work to support multiple projects in parallel
Knowledge and/or experience in the following areas:
· Multi-Cloud/Multi-Zone-Based designs with High Availability (HA)
· Containers: Docker
· Container orchestration: Google Kubernetes Engine (GKE)
· Compute Infrastructure: Google Compute Engine (GCE) (servers, databases, firewalls, load balancers, networking and storage)
· Services for Google Cloud Platform (GCP)
· Databases, including NoSQL databases, and Big Data technologies (Oracle, SQL Server, Postgres, Spark, Hadoop, cloud databases)
· Application development concepts and technologies (CI/CD, Java, Python)
Education/Certification/Designation
· Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Science, Computer Engineering, or equivalent degree and experience
Assets
· Knowledge of product design and standards for the rail industry
· Knowledge of rail industry or other transportation industry operations
Working Conditions
This role may require occasional business travel within North America, in accordance with company policy.
-
Site Reliability Engineer
1 day ago
Montreal, Quebec, Canada Axelon Services Corporation Full time $80,000 - $120,000 per year. Job Title: Site Reliability Engineer (SRE) - ServiceNow / Application Infrastructure. Experience Level: Level 4 (advanced): 7-15 years. Location: Montreal (Day 1 onboarding onsite / in office presence 3x week). Contract Duration: 12 Months Contract. Skills Required: At least one of: Software development skills in one or more programming languages, e.g. Python,...
-
Site Reliability Engineer
1 week ago
Montreal, Quebec, Canada 6a1ea0be-549a-479d-87f0-a411f11f4fda Full time $110,000 - $150,000 per year. Akur8 is a young, dynamic, fast growing Insurtech scale-up that is transforming insurance pricing and reserving with transparent machine learning. Our SaaS platform leverages the power of transparent machine learning and predictive analytics to inject game-changing speed, performance and reliability into insurers' pricing and reserving processes. Powered by...
-
Site Reliability Engineer, LUS
2 weeks ago
Montreal, Quebec, Canada Lyft Full time $88,000 - $110,000 per year. At Lyft, our purpose is to serve and connect. We aim to achieve this by cultivating a work environment where all team members belong and have the opportunity to thrive. As a leader in micromobility, Lyft powers millions of rides daily across over 200 cities with our cutting-edge ride-sharing, bike-sharing, and scooter-sharing technologies. Our Montreal office...
-
Site Reliability Engineer, LUS
6 days ago
Montreal, Quebec, Canada Lyft Full time $72,000 - $110,000 per year. At Lyft, our purpose is to serve and connect. We aim to achieve this by cultivating a work environment where all team members belong and have the opportunity to thrive. As a leader in micromobility, Lyft powers millions of rides daily across over 200 cities with our cutting-edge ride-sharing, bike-sharing, and scooter-sharing technologies. Our Montreal office...
-
Site Reliability Engineer/ServiceNow SaaS
7 days ago
Montreal, Quebec, Canada NTT DATA Full time $80,000 - $120,000 per year. NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Site Reliability Engineer/ServiceNow SaaS (Onsite Hybrid) to join our team in Montreal, Quebec (CA-QC), Canada (CA). Job...
-
Site Reliability Engineer w/Python
5 days ago
Montreal, Quebec, Canada NTT DATA, Inc. Full time $80,000 - $120,000 per year. NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Site Reliability Engineer w/Python (Onsite Hybrid) to join our team in Montreal, Quebec (CA-QC), Canada (CA). Job Responsibilities...
-
Field Service Specialist: Engineering
2 weeks ago
Montreal, Quebec, Canada Reivax North America inc. Full time $70,000 - $85,000 per year. OPEN POSITION: We are hiring a full-time Field Service Specialist. Reivax North America is a dynamic and innovative leader in the power generation control industry. With over 30 years of experience, we specialize in designing, manufacturing and commissioning tailored solutions for synchronous machines and we are committed to delivering high-performance excitation...
-
Montreal, Quebec, Canada National Bank of Canada Full time $120,000 - $180,000 per year. A career as a Senior Developer in Site Reliability Engineering (SRE) and Artificial Intelligence in the Cards and Credit risk API Platform team at National Bank means acting as an expert in systems resilience and generative artificial intelligence integration. This position allows you to have a concrete impact on the performance and availability of our it...
-
Help Desk Specialist – On-Site
14 hours ago
Montreal, Quebec, Canada IO Solutions Contact Center Morocco Full time $45,000 - $65,000 per year. Help Desk Specialist – On-site. Job Description: IO Solutions is looking for a Helpdesk Specialist to work on-site at our Montreal office. The person will work closely with the IT team to ensure the proper operation, maintenance, and optimization of the organization's infrastructure, including the management of computer equipment being returned by or shipped...
-
Hardware Reliability Engineer
24 hours ago
Montreal, Quebec, Canada Lyft Full time $72,000 - $90,000. At Lyft, our purpose is to serve and connect. We aim to achieve this by cultivating a work environment where all team members belong and have the opportunity to thrive. Responsibilities: Ensure product quality and reliability for sustaining products already deployed in the field as well as New Product Introduction (NPI) products. Prepare concise and detailed...