Site Reliability Engineer

4 weeks ago


Toronto, ON, Canada | emagine Consulting | Full time

Job Description:

Work model: Remote.

Business trips: Occasional, to Copenhagen.

Assignment Type: B2B

Project Length: Long-term

Start Date: ASAP

Project Language: English

About the Role:

A unique opportunity to join a dynamic, ambitious, international company as a Site Reliability Engineer, working alongside many skilled colleagues. You will join a distributed team that develops a product platform helping other product teams deliver cloud-native functionality in a consistent manner.

Responsibilities:
  1. Define and maintain containers for Kubernetes (both in Azure and local developer environments).
  2. Create Helm charts used for deploying our product in Azure.
  3. Be responsible for our CI/CD processes on GitHub Actions, focusing on quality, efficiency, and automation.
  4. Develop and maintain our authentication and authorization functionality (OpenID Connect and OAuth2).
  5. Be responsible for logging, telemetry, and driving improvements in CI/CD and observability.
  6. Maintain internal deployments used by developers.
  7. Enhance the quality and cadence of release processes.
  8. Collaborate with the development team to improve the deployment platform.

Must have:
  • 5+ years of experience in a similar position working on a SaaS product.
  • Hands-on experience with cloud solutions in production, either as a cloud software developer who has worked on a SaaS solution, or as a cloud-ops engineer who has been responsible for operating a SaaS solution.
  • Hands-on experience with Kubernetes.
  • Experience with logging and tracing tools for effective troubleshooting and debugging.
  • Experience in optimizing system performance, scalability, and efficiency to handle growing workloads.
  • Expertise in incident management, including the ability to diagnose and resolve incidents quickly and efficiently.
  • Knowledge of:
    • Infrastructure as Code principles.
    • Monitoring tools like Prometheus, Grafana, or similar solutions to ensure visibility into system performance and health.
    • Security best practices and the ability to incorporate security considerations into the design and operation of systems.
    • Reliability engineering principles (e.g., Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets).
  • Strong communication skills to effectively collaborate with cross-functional teams, including developers, operations, and other stakeholders.
  • Ability to document processes, procedures, and system architecture comprehensively.
  • Strong analytical and problem-solving skills, with the ability to diagnose complex issues and implement effective solutions.
  • Willingness to adapt to evolving technologies and industry best practices, with a commitment to continuous learning.

We offer:
  • Long-term cooperation.
  • Transparent working relationships built on trust and fair play.
  • Medicover card, Multisport card on preferential conditions.
  • Internal referral bonus.
