Site Reliability Engineer

1 week ago


Canada Indeed Full time $119,000 - $173,000

Job Title: Site Reliability Engineer - AI Platform

About the Role:

We are seeking a highly skilled Site Reliability Engineer to join our team and help build and maintain the reliability of our AI platform. The selected candidate will be responsible for designing, developing, and deploying scalable software and systems to increase product reliability and organizational efficiency.

Key Responsibilities:

  • Design and develop scalable software and systems to increase product reliability and organizational efficiency
  • Guide reliability practices through the entire software development lifecycle
  • Maintain service health through monitoring and incident response
  • Collaborate with Engineering teams to balance cloud cost optimization with delivering a high degree of system reliability

Requirements:

  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience
  • 2+ years of software development experience
  • Proficiency with cloud platforms and tools, specifically Amazon Web Services (AWS) AI services
  • Experience efficiently scaling ML model training or LLM workloads, preferably in Amazon SageMaker

What We Offer:

  • Competitive salary: C$119,000 - C$173,000 (Montreal Metro Area), C$122,000 - C$178,000 (Toronto Metro Area), C$124,000 - C$180,000 (Vancouver Metro Area)
  • Quarterly bonuses, Restricted Stock Units (RSUs), a Paid Time Off policy, and many region-specific benefits

Location: Remote



  • Canada I Can Infotech Full time

    We're seeking a skilled Junior Site Reliability Engineer to contribute to the development and maintenance of highly reliable, scalable, and performant systems. Key Responsibilities: Implement and manage monitoring and observability tools to proactively identify and address potential issues. Analyze system metrics and logs to gain insights into system...


  • Canada I Can Infotech Full time

    Job Description: We're looking for a passionate Junior Site Reliability Engineer to help build and maintain highly reliable, scalable, and performant systems. As a Site Reliability Engineer, you will be responsible for ensuring the smooth operation of our infrastructure and applications. Key Responsibilities: Implement and manage monitoring and observability...


  • Canada I Can Infotech Full time

    Job Summary: We are seeking a highly skilled and motivated Site Reliability Engineer to join our team at I Can Infotech. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining scalable and reliable systems that meet the needs of our customers. Key Responsibilities: Infrastructure Management: Assist in...


  • Canada I Can Infotech Full time

    We're seeking a skilled Junior Site Reliability Engineer to contribute to the development and maintenance of highly reliable, scalable, and performant systems. Key Responsibilities: • Implement and manage monitoring and observability tools to proactively identify and address potential issues. • Analyze system metrics and logs to gain insights into system...


  • Canada Operant AI, Inc. Full time

    About Operant AI, Inc.: Operant AI, Inc. is a leading provider of cloud-native security solutions. We are passionate about bringing state-of-the-art technological innovations from Operating Systems/Distributed Systems/AI to the world of cloud-native security. Job Summary: We are seeking a highly skilled Staff Site Reliability Engineer to join our team. As our...


  • Canada I Can Infotech Full time

    We're seeking a skilled Junior Site Reliability Engineer to help build and maintain highly reliable, scalable, and performant systems. Key Responsibilities: Implement and manage monitoring and observability tools to proactively identify and address potential issues. Analyze system metrics and logs to gain insights into system performance and stability. Develop...


  • Canada I Can Infotech Full time

    We are seeking a skilled Junior Site Reliability Engineer to contribute to the development and maintenance of highly reliable, scalable, and performant systems. Key Responsibilities: Implement and manage monitoring and observability tools to proactively identify and address potential issues. Analyze system metrics and logs to gain insights into system...

