Site Reliability Engineer

3 weeks ago


Vancouver, British Columbia, Canada Sigmaways Inc Full time

If you are passionate about reliability, automation, and performance optimization, and you enjoy working in a fast-paced, collaborative environment where innovation is encouraged, this role is for you.

We are looking for a Site Reliability Engineer to optimize and maintain our production environment, ensuring a highly available and scalable platform for our customers. In this role, you will work closely with Engineering to troubleshoot production issues, enhance application performance, and develop automation tools that streamline service deployment.

Responsibilities:

  • Design, scale, and maintain high-availability Ubuntu Linux production and development environments in the public cloud.
  • Architect, deploy, and maintain Kubernetes clusters, demonstrating in-depth knowledge of their core components and the ability to build clusters from scratch.
  • Optimize load balancing, service mesh, and overall system availability to maximize uptime and performance.
  • Leverage tools such as Jenkins, Ansible, Argo CD, Terraform, CloudFormation, and Resource Manager to implement and manage Infrastructure as Code (IaC).
  • Strengthen security and availability monitoring across services, ensuring strict adherence to security policies.
  • Deploy and manage workloads across AWS, Azure, or GCP, with expertise in instance management, IAM configuration, databases, caching, and troubleshooting.
  • Maintain comprehensive documentation for all infrastructure components and configurations.
  • Utilize monitoring tools such as Prometheus to proactively detect and resolve system issues.
  • Assist Engineering teams in troubleshooting failures and performance bottlenecks, while participating in on-call rotations.
  • Develop automation scripts and tools using Go, Python, Rust, or Bash, following industry best practices.
  • Utilize strong networking fundamentals, including DNS, DHCP, and routing, to troubleshoot and optimize network performance.

Qualifications:

  • Bachelor's degree in Computer Science or equivalent experience.
  • 3+ years of experience with Linux/UNIX systems, including troubleshooting, memory management, performance tuning, I/O subsystems, RAID, and security.
  • Proficiency in provisioning tools such as Ansible, Chef, or Terraform.
  • Experience with CI/CD pipelines and tools like Jenkins.
  • Proficiency in Go, Python, and Bash for scripting and automation.
  • Good understanding of database systems such as MySQL or PostgreSQL.
  • Experience with containerization and orchestration technologies, including Kubernetes, Mesos, or Docker Swarm.
  • Practical experience with cloud platforms such as AWS, Azure, or GCP.
  • Familiarity with monitoring tools like Prometheus and other observability solutions.
  • Excellent collaboration, problem-solving, and communication skills in English.
  • A strong passion for automation, scalability, and continuous improvement in infrastructure management.

Seniority level

Mid-Senior level

Employment type

Contract

Job function

Information Technology

Industries

Banking


  • Vancouver, British Columbia, Canada Conexiom Full time

    Job Description: We are seeking a seasoned Site Reliability Engineering (SRE) Senior Manager to join our team at Conexiom. As an expert in SRE, you will be responsible for leading our SRE team in designing, implementing, and supporting highly available and scalable infrastructure in a cloud environment. Your strong background in site reliability engineering...


  • Vancouver, British Columbia, Canada Autodesk, Inc. Full time

    Principal Site Reliability Engineer - Database Administration. Job Requisition ID # 24WD81381. Vancouver, BC (Hybrid). Position Overview: We are looking for a Principal Site Reliability Engineer (SRE) who is passionate about cloud infrastructure and proficient in MySQL database administration. This pivotal...


  • Vancouver, British Columbia, Canada Regie Full time

    We're seeking a senior Site Reliability Engineer/DevOps who is passionate about building the best infrastructure and maintaining the health of the systems.
    - Design and maintain scalable, secure, and reliable infrastructure to support Regie.ai's SaaS platform and AI/data workloads.
    - Architect a unified monitoring and alerting system for engineering teams to...


  • Vancouver, British Columbia, Canada TEEMA Full time

    Job Title: DevOps Site Reliability Engineer. At TEEMA, we're seeking an experienced DevOps Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for the reliability and smooth operation of our services in both production and test environments. Responsibilities: Ensure service reliability and uptime in...


  • Vancouver, British Columbia, Canada Regie Full time

    Regie.ai is a Series B-funded, AI-native sales engagement automation platform focused on transforming business-critical prospecting (the top of the funnel) into a precise, scalable, and repeatable process. As the volume of sales activity required to book a meeting continues to grow exponentially, traditional tools have failed to keep pace, leaving critical...


  • Vancouver, British Columbia, Canada MasterCard Full time

    At Mastercard, we're committed to creating a more inclusive and connected world through the power of digital payments. As a Site Reliability Engineer, you'll play a critical role in shaping the future of commerce by building and maintaining the scalable infrastructure that supports our global payment ecosystem. The Cyber and Intelligence Solutions (C&I) team...


  • Vancouver, British Columbia, Canada Autodesk Full time

    Job Requisition ID #24WD81381. Principal Site Reliability Engineer - Database Administration. Vancouver, BC (Hybrid). Position Overview: We are looking for a Principal Site Reliability Engineer (SRE) who is passionate about cloud infrastructure and proficient in MySQL database administration. This pivotal role centers around ensuring the utmost reliability,...


  • Vancouver, British Columbia, Canada Regie Full time

    Company Overview: Regie.ai is a Series B-funded, AI-native sales engagement automation platform focused on transforming business-critical prospecting (the top of the funnel) into a precise, scalable, and repeatable process. As the volume of sales activity required to book a meeting continues to grow exponentially, traditional tools have failed to keep...


  • Vancouver, British Columbia, Canada Socotra, Inc. Full time

    About Us: Socotra, Inc. is a healthcare leader that has revolutionized the way patients receive care. By adopting a proactive approach, we aim to prevent illnesses and improve overall health outcomes. We're seeking a seasoned DevOps / Site Reliability Engineer to oversee our infrastructure architecture, drive innovation, and foster collaboration among teams. Key...


  • Vancouver, British Columbia, Canada Tyler Technologies, Inc. Full time

    Site Reliability Engineer, Courthouse Technology. This Site Reliability Engineer position is a technical role within the Technical and Cloud Services group that helps ensure the reliability, scalability, and performance of our infrastructure while driving automation and efficiency in our development process. This engineer provides technical guidance to team...


  • Vancouver, British Columbia, Canada Regie Full time

    Job Summary: We're looking for a Senior Site Reliability Engineer/DevOps to join our team at Regie.ai. The successful candidate will have extensive experience in designing and maintaining scalable, secure, and reliable infrastructure to support our SaaS platform and AI/data workloads. Responsibilities: Design and maintain production-grade infrastructure with...


  • Vancouver, British Columbia, Canada Regie Full time

    About Regie.ai: Regie.ai is a Series B-funded, AI-native sales engagement automation platform focused on transforming business-critical prospecting (the top of the funnel) into a precise, scalable, and repeatable process. As the volume of sales activity required to book a meeting continues to grow exponentially, traditional tools have failed to keep...


  • Vancouver, British Columbia, Canada Inworld Full time

    Why Join Inworld: Inworld is the best-funded startup in AI and games with a $500 million valuation and backing from top tier investors including Intel Capital, Microsoft's M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures. Inworld is the leading AI engine for games and...


  • Vancouver, British Columbia, Canada Inworld Full time

    Company Overview: Inworld is a leading AI engine for games and interactive media. Our suite of AI components enables developers to build interactive, responsive, and personalized AI gaming experiences. We power experiences built by top industry players such as Microsoft Xbox, Epic Games, and Unity. Job Description: We are looking for a Staff Cloud DevOps/Site...


  • Vancouver, British Columbia, Canada Regie Full time

    We are looking for a skilled Site Reliability Engineer/DevOps professional to join our team at Regie.ai. As a senior member of our engineering organization, you will be responsible for designing and maintaining our cloud-based infrastructure to ensure it meets the needs of our SaaS platform and AI/data workloads. Main Responsibilities: Design and Implement...


  • Vancouver, British Columbia, Canada Launchpad Technologies Inc. Full time

    Launchpad, a people-first technology company, is a leader in North America's rapidly growing tech sector. Through two solutions, Launchpad supports its clients with digital transformation: Paasport™, our iPaaS solution, streamlines software integration and automates workflows. Nearshore Staff Augmentation, our managed IT staffing service, connects top IT...


  • Vancouver, British Columbia, Canada Autodesk, Inc. Full time

    Company Overview: We're Autodesk, Inc., a leading technology company committed to innovation and excellence. Our mission is to empower creativity, collaboration, and productivity through cutting-edge software solutions. As a Principal Site Reliability Engineer, you'll play a crucial role in shaping our cloud infrastructure strategy and ensuring the...