Site Reliability Expert

1 week ago


Montreal, Quebec, Canada PlayStation Full time

Why PlayStation?

PlayStation isn't just the Best Place to Play; it's also the Best Place to Work. Today, we're recognized as a global leader in entertainment, producing the PlayStation family of products and services, including PlayStation 5, PlayStation 4, PlayStation VR, PlayStation Plus, acclaimed PlayStation software titles from PlayStation Studios, and more.

PlayStation also strives to create an inclusive environment that empowers employees and embraces diversity. We welcome and encourage everyone who has a passion and curiosity for innovation, technology, and play to explore our open positions and join our growing global team.

The PlayStation brand falls under Sony Interactive Entertainment, a wholly-owned subsidiary of Sony Corporation.

//FRENCH FOLLOWS//

LOCATION: QUEBEC

In May 2021, we embarked on a journey to start Haven Studios with a small team and big ambitions. Our goal was to build a studio where we could make the kind of games we've always wanted to create – and games we've longed to play.

We've made amazing progress in a short time thanks to our talented, passionate team and their exceptional contributions. We established a culture at Haven grounded in kindness, adaptability and courage that unlocks creativity. Our first new IP for PlayStation is on track to deliver a AAA multiplayer experience with a vision to build a systemic and evolving world focused on freedom, thrill, and playfulness that will keep players entertained and engaged for years.

Haven joined the PlayStation Studios family in 2022, and we are on track to build an exclusive new IP for PlayStation and grow the first Sony game development team in Canada.

We are looking for an experienced Site Reliability Expert (SRE) to join the Haven Entertainment Studios project team. The Site Reliability Expert will work closely with the online, ML, and telemetry teams to build and manage the AWS online infrastructure. The successful candidate will also help with our GCP infrastructure.

The Site Reliability Expert will report to the Senior Online Technical Producer.

Responsibilities and Duties

  • Define, implement, and deploy cloud infrastructure for online services and game servers in collaboration with stakeholders.
  • Participate in and contribute to sprints.
  • Assess and size the effort associated with the work backlog and participate in grooming.
  • Participate in incident response and the on-call rotation.
  • Secure systems and mitigate vulnerabilities in collaboration with our security team.
  • Integrate with internal and external observability systems.
  • Support complex multi-region workloads.
  • Define and improve automation and tools.
  • Implement scalable, secure, and easily maintainable solutions.
  • Communicate with other departments to understand their needs and requirements and to align on cloud best practices.
  • Work with systems and software engineers to develop and document requirements and functional specifications.
  • Define CI/CD pipelines.
  • Perform code deployments.
  • Communicate effectively with team members, production, and management to ensure that project goals and deadlines are met.

Minimum Qualifications

  • Professional experience with Amazon Web Services (AWS).
  • Understanding of public cloud infrastructure best practices.
  • Ability to accept feedback and adapt to change.
  • Resourcefulness in problem-solving.
  • Self-driven, dedicated to advancing your craft, and eager to learn new techniques and software.
  • Experience implementing CI/CD and observability toolchains.
  • Hands-on experience in different domains, such as observability, service mesh, networking, load balancers, database architecture, and advanced analytics.
  • Substantial knowledge of Linux administration.
  • Expertise with infrastructure as code and configuration management tooling and techniques (Terraform, Ansible).
  • Experience with Docker and Kubernetes.
  • Proficiency with version control systems (Perforce and Git).
  • Ability to quickly identify and resolve issues in distributed systems.

Bonus Qualifications

  • Understanding of Games as a Service technical requirements.
  • Expertise with Google Cloud Platform (GCP) is a plus.
  • Knowledge of the Rust and TypeScript programming languages is a plus.
  • Administration of GitHub CI/CD pipelines is a plus.
  • AWS and GCP certifications or any related additional professional certifications are a plus.
  • Bilingualism in French and English is a plus.

In May 2021, we set out to create Haven Studios with a small team and big ambitions. Our goal was to build a studio where we could make the kind of games we have always wanted to create, and have always dreamed of playing. We are now part of the PlayStation Studios family, and we are proud to have the opportunity to create an exclusive new intellectual property for PlayStation and to form the first Sony development team in Canada.

We have made amazing progress in a short time thanks to our talented, passionate team and its exceptional contributions. We have established a culture at Haven founded on kindness, adaptability, and courage that unlocks creativity. Our first new intellectual property for PlayStation is on track to deliver a AAA multiplayer experience, with a vision of building a systemic and evolving world focused on freedom, thrill, and play that will keep players entertained and engaged for years.

Haven joined the PlayStation Studios family in 2022, and we are on track to create an exclusive new intellectual property for PlayStation and to keep building the first Sony game development team in Canada.

We are looking for an experienced Site Reliability Expert to join our team. You will work closely with the online and telemetry teams to build and manage the AWS online infrastructure and lend a hand with GCP.

The Site Reliability Expert will report to the Senior Technical Producer.

Responsibilities and Duties

  • Define, implement, and deploy online services and game servers in the cloud infrastructure in collaboration with stakeholders.
  • Participate in and contribute to sprints.
  • Assess and size the effort associated with the work backlog and participate in grooming.
  • Secure systems and reduce vulnerabilities in collaboration with our security team.
  • Integrate internal and external observability systems.
  • Support complex multi-region workloads.
  • Take charge of automation and tools.
  • Implement scalable, secure, and easily maintainable solutions.
  • Communicate with other departments to understand their needs and requirements and to align on cloud best practices.
  • Work with systems and software engineers to develop and document requirements and functional specifications.
  • Define CI/CD pipelines.
  • Perform code deployments.
  • Communicate effectively with team members, production, and management to ensure that project goals and deadlines are met.

Required Qualifications

  • Professional experience with Amazon Web Services (AWS).
  • Knowledge of Google Cloud Platform (GCP).
  • Understanding of public cloud infrastructure best practices.
  • Ability to accept feedback and adapt to change.
  • Resourcefulness in problem-solving.
  • Self-driven, determined to advance your craft, and willing to learn new techniques and software.
  • Experience implementing CI/CD and observability toolchains.
  • Hands-on experience in different domains, such as observability, service mesh, networking, load balancers, database architecture, and advanced analytics.
  • In-depth knowledge of Linux administration.
  • Expertise with infrastructure as code and configuration management tools and techniques (Terraform, Ansible).
  • Experience with Docker and Kubernetes.
  • Proficiency with version control systems (Perforce and Git).
  • Ability to quickly identify and resolve issues in distributed systems.

Ideal Qualifications

  • Understanding of the technical requirements of games as a service.
  • Knowledge of the Rust and TypeScript programming languages is an asset.
  • Experience managing the GitHub platform is an asset.
  • AWS certifications or any other additional professional certification are an asset.
  • Bilingualism in French and English is an asset.
