Specialist Site Reliability Engineer
2 days ago
About the job
Specialist Site Reliability Engineer (#11072)

The role of the Specialist Site Reliability Engineer (SRE) is to execute RAM (reliability, availability and maintainability) analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions are highly robust, reliable and available. This involves system and product analysis, modeling and requirements assessment during the development phase, as well as analysis of field RAM data to determine solution RAM KPIs and to drive corrective action programs. With the advent of cloud computing, there is also a need for a RAM specialist who is well versed in cloud-based technologies and in cloud solution architectures. Separate specializations may exist for hardware and software RAM. The technologies used are primarily distributed digital control systems, communication networks, Global Navigation Satellite Systems (GNSS), embedded and virtualized computing, and cloud-based solutions.

Main Responsibilities

Solution RAM Assessments
Review and approve solution requirements for RAM
Determine non-functional requirements and targets for RAM performance
Perform analysis and modeling to predict RAM behaviour
Adhere to the I&T Development Process

Solution RAM Field Performance
Assign requirements to solutions and products to ensure they support measurement of RAM Key Performance Indicators (KPIs)
Use field performance measurements to identify key contributors and drive corrective action plans when necessary
Review vendor specifications, test results and analysis artifacts
Participate in failure review boards for selected vendors
Review corrective action plans from vendors
Drive vendor corrective action plans to completion

Requirements

Experience
Minimum 5-10 years of overall work experience
Minimum 5 years of experience in RAM engineering for complex systems, or 7 years of experience in product development for high-reliability/high-availability or safety-critical systems with accountability for product field performance

Skills/Knowledge
Knowledge of hardware and/or software design and development practices and processes, with a focus on high-reliability and high-availability applications
Knowledge of RAM analysis techniques such as failure rate prediction, Reliability Block Diagrams (RBD), Markov models, Monte Carlo methods, Failure Modes and Effects Analysis (FMEA) and Fault Tree Analysis (FTA)
Analysis of reliability and failure field data, statistical estimation and Root Cause Analysis (RCA)
Critical thinking and judgement
Ability to assimilate new information quickly and apply it to the assignment
Ability to deliver with autonomy
Ability to organize work to support multiple projects in parallel

Knowledge and/or experience in the following areas
Multi-cloud/multi-zone designs with High Availability (HA)
Compute infrastructure: Google Compute Engine (GCE) (servers, databases, firewalls, load balancers, networking and storage)
Google Cloud Platform (GCP) services
Databases, including NoSQL databases and Big Data technologies (Oracle, SQL Server, Postgres, Spark, Hadoop, cloud databases)
Application development concepts and technologies (CI/CD, Java, Python)

Education/Certification/Designation
Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Science or Computer Engineering, or an equivalent combination of education and experience

Assets
Knowledge of product design and standards for the rail industry
Knowledge of rail or other transportation industry operations

Working Conditions
This role may require occasional business travel within North America in accordance with company policy
-
Specialist Site Reliability Engineer
2 days ago
Montreal (administrative region), Canada
Global Talent Alliance, Canada
Full time
About the job: Specialist Site Reliability Engineer (#11072). The role of the Specialist Site Reliability Engineer (SRE) is to execute RAM analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions have attributes of high robustness, reliability, and availability. This involves system and product analysis,...
-
Site Reliability Engineer
2 days ago
Montreal (administrative region), Canada
Canonical
Full time
Site Reliability Engineer
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and...
-
Site Reliability Engineer
4 days ago
Montreal (administrative region), Canada
Canonical
Full time
Site Reliability Engineer
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and...
-
Senior Site Reliability Engineering Specialist
3 weeks ago
Montreal (administrative region), Canada
PowerToFly
Full time
We're seeking someone to join our EC Modern Infra Platforms team as a Senior Site Reliability Engineering Specialist in Enterprise Computing, to lead the SRE optimization effort across multiple Modern Container Platforms infrastructure teams at Morgan Stanley, in both on-prem and public cloud environments, driving ongoing optimization and automation. In the Technology...
-
Site Reliability Engineer
4 days ago
Montreal (administrative region), Canada
High Tech Genesis
Full time
WE'RE HIRING! At HTG, you’ll push boundaries with the latest tech and collaborate with a team that loves what they do. Be part of a design services company that is among the world's leaders in technology and innovation. Your next chapter starts here. Responsibilities...
-
Specialist Site Reliability Engineer
2 weeks ago
Montreal, Quebec, Canada
Global Talent Alliance, Canada
Full time
(#11072) The role of the Specialist Site Reliability Engineer (SRE) is to execute RAM analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions have attributes of high robustness, reliability, and availability. This involves system and product analysis, modeling and requirements assessment during the...
-
Site Reliability Engineer
1 week ago
Montreal, Canada
Open Systems Technologies
Full time
Site Reliability Engineer (SRE), ServiceNow, Application Infrastructure
Location: Montreal – Hybrid – 3 days/week
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for the client’s ServiceNow SaaS implementation. Reporting to a Site...