Site Reliability System Admin
2 days ago
Do you have a passion for invention and self-challenge? Do you thrive on pushing the limits of what’s considered feasible? At Hewlett Packard Enterprise, you will have the power to make the most out of your career. Hewlett Packard Enterprise is one of the world’s largest and most successful IT companies. We are successful not just because of the technology solutions that we deliver, but also because of our core values and the amazing people that we have. We invest in our employees’ personal growth and development in an environment that will challenge and reward them. Hewlett Packard Enterprise is filled with energetic people, sparking technology revolutions and creating the future to help improve the lives of every customer.
HPE is seeking a System Administrator to design, test and administer systems in support of the Supercomputing as a Service (SCaaS) business. This is an exciting opportunity to have a significant impact on a key business with considerable growth potential. In this role, you will have a great deal of creative freedom to define and develop solutions that will support a scaling customer base.
**This role will be performed onsite at the data center in Quebec City, Canada.**
**There will be weekend/off-hours on-call rotation for this position.**
**Primary Responsibilities**
- Ensure continuous uptime of HPC systems at large scale
- Provide system administration for our groundbreaking Supercomputing-as-a-Service system
- Create scripts and infrastructure as code to automate the support of cloud infrastructures and HPC-as-a-Service clusters
- Bring technical thinking to break down complex data and to engineer new ideas and methods for solving, prototyping, designing, and implementing cloud-based solutions
- Help design and implement security aspects of the computing infrastructure
- Administer cloud-based HPC systems
- Collaborate with project managers and development partners to ensure effective and efficient delivery, deployment, operation, monitoring, and support of HPC engagements
**Experience and Skills**
- Experience in Linux systems administration, planning, and maintenance
- An understanding of high-speed networks
- An understanding of the security concerns in a cloud environment
- Hands-on experience with **Linux administration** at scale
- Good communication skills
- Hands-on experience with the tools and infrastructure to support **HPC systems** at scale, including networking and storage
- An understanding of **high-performance computing**
- Proficiency in the use and operation of **Linux-based environments**, including shells, system configuration, and administration
- Prior experience with large-scale clustered systems (preferably HPC experience with parallel compute systems)
- 5+ years of experience
- BS in Computer Science, IT Management, or equivalent
Join us and make your mark
**We offer**:
- A competitive salary and extensive social benefits
- Diverse and dynamic work environment
- Work-life balance and support for career development
- An amazing life inside the element
Want to know more about it? Then let’s stay connected.