Site Reliability Engineer
3 weeks ago
If you are passionate about reliability, automation, and performance optimization, and you enjoy working in a fast-paced, collaborative environment where innovation is encouraged, this role is for you.
We are looking for a Site Reliability Engineer to optimize and maintain our production environment, ensuring a highly available and scalable platform for our customers. In this role, you will work closely with Engineering to troubleshoot production issues, enhance application performance, and develop automation tools that streamline service deployment.
Responsibilities:
- Design, scale, and maintain high-availability Ubuntu Linux production and development environments in the public cloud.
- Architect, deploy, and maintain Kubernetes clusters, demonstrating in-depth knowledge of their core components and the ability to build clusters from scratch.
- Optimize load balancing, service mesh, and overall system availability to maximize uptime and performance.
- Leverage tools such as Jenkins, Ansible, Argo CD, Terraform, CloudFormation, and Resource Manager to implement and manage Infrastructure as Code (IaC).
- Strengthen security and availability monitoring across services, ensuring strict adherence to security policies.
- Deploy and manage workloads across AWS, Azure, or GCP, with expertise in instance management, IAM configuration, databases, caching, and troubleshooting.
- Maintain comprehensive documentation for all infrastructure components and configurations.
- Utilize monitoring tools such as Prometheus to proactively detect and resolve system issues.
- Assist Engineering teams in troubleshooting failures and performance bottlenecks, while participating in on-call rotations.
- Develop automation scripts and tools using Go, Python, Rust, or Bash, following industry best practices.
- Utilize strong networking fundamentals, including DNS, DHCP, and routing, to troubleshoot and optimize network performance.
Qualifications:
- Bachelor's degree in Computer Science or equivalent experience.
- 3+ years of experience with Linux/UNIX systems, including troubleshooting, memory management, performance tuning, I/O subsystems, RAID, and security.
- Proficiency in provisioning tools such as Ansible, Chef, or Terraform.
- Experience with CI/CD pipelines and tools like Jenkins.
- Proficiency in Go, Python, and Bash for scripting and automation.
- Good understanding of database systems such as MySQL or PostgreSQL.
- Experience with containerization and orchestration technologies, including Kubernetes, Mesos, or Docker Swarm.
- Practical experience with cloud platforms such as AWS, Azure, or GCP.
- Familiarity with monitoring tools like Prometheus and other observability solutions.
- Excellent collaboration, problem-solving, and communication skills in English.
- A strong passion for automation, scalability, and continuous improvement in infrastructure management.
Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: Banking
-
Site Reliability Engineering Director
2 weeks ago
Vancouver, British Columbia, Canada Conexiom Full time
Job Description: We are seeking a seasoned Site Reliability Engineering (SRE) Senior Manager to join our team at Conexiom. As an expert in SRE, you will be responsible for leading our SRE team in designing, implementing, and supporting highly available and scalable infrastructure in a cloud environment. Your strong background in site reliability engineering...
-
Principal Site Reliability Engineer
4 weeks ago
Vancouver, British Columbia, Canada Autodesk, Inc. Full time
Principal Site Reliability Engineer - Database Administration
Job Requisition ID # 24WD81381
Vancouver, BC (Hybrid)
Position Overview
We are looking for a Principal Site Reliability Engineer (SRE) who is passionate about cloud infrastructure and proficient in MySQL database administration. This pivotal...
-
Site Reliability Engineer
2 weeks ago
Vancouver, British Columbia, Canada Sigmaways Inc Full time
If you are passionate about reliability, automation, and performance optimization and working in a fast-paced, collaborative environment where innovation is encouraged, this role is for you. We are looking for a Site Reliability Engineer to optimize and maintain our production environment, ensuring a highly available and scalable platform for our customers....
-
Senior Site Reliability Engineer
2 weeks ago
Vancouver, British Columbia, Canada Regie Full time
We're seeking a senior Site Reliability Engineer/DevOps who is passionate about building the best infrastructure and maintaining the health of the systems.
- Design and maintain scalable, secure, and reliable infrastructure to support Regie.ai's SaaS platform and AI/data workloads.
- Architect a unified monitoring and alerting system for engineering teams to...
-
DevOps Site Reliability Engineer
4 days ago
Vancouver, British Columbia, Canada TEEMA Full time
Job Title: DevOps Site Reliability Engineer
At TEEMA, we're seeking an experienced DevOps Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for the reliability and smooth operation of our services in both production and test environments.
Responsibilities:
Ensure service reliability and up-time in...
-
Senior Site Reliability Engineer
2 weeks ago
Vancouver, British Columbia, Canada Regie Full time
Regie.ai is a Series B-funded, AI-native sales engagement automation platform focused on transforming business-critical prospecting (the top of the funnel) into a precise, scalable, and repeatable process. As the volume of sales activity required to book a meeting continues to grow exponentially, traditional tools have failed to keep pace, leaving critical...
-
Site Reliability Systems Engineer
2 weeks ago
Vancouver, British Columbia, Canada MasterCard Full time
At Mastercard, we're committed to creating a more inclusive and connected world through the power of digital payments. As a Site Reliability Engineer, you'll play a critical role in shaping the future of commerce by building and maintaining the scalable infrastructure that supports our global payment ecosystem.
The Cyber and Intelligence Solutions (C&I) team...
-
Principal Site Reliability Engineer
4 weeks ago
Vancouver, British Columbia, Canada Autodesk Full time
Job Requisition ID #24WD81381
Principal Site Reliability Engineer - Database Administration
Vancouver, BC (Hybrid)
Position Overview
We are looking for a Principal Site Reliability Engineer (SRE) who is passionate about cloud infrastructure and proficient in MySQL database administration. This pivotal role centers around ensuring the utmost reliability,...
-
Senior Site Reliability Engineer
2 days ago
Vancouver, British Columbia, Canada Regie Full time
Company Overview: Regie.ai is a Series B-funded, AI-native sales engagement automation platform focused on transforming business-critical prospecting (the top of the funnel) into a precise, scalable, and repeatable process. As the volume of sales activity required to book a meeting continues to grow exponentially, traditional tools have failed to keep...
-
Staff Site Reliability Engineer
1 week ago
Vancouver, British Columbia, Canada Socotra, Inc. Full time
About Us
Socotra, Inc. is a healthcare leader that has revolutionized the way patients receive care. By adopting a proactive approach, we aim to prevent illnesses and improve overall health outcomes.
We're seeking a seasoned DevOps / Site Reliability Engineer to oversee our infrastructure architecture, drive innovation, and foster collaboration among teams.
Key...
-
Vancouver, British Columbia, Canada Tyler Technologies, Inc. Full time
Site Reliability Engineer, Courthouse Technology
This Site Reliability Engineer position is a technical role within the Technical and Cloud Services group that helps ensure the reliability, scalability, and performance of our infrastructure while driving automation and efficiency in our development process. This engineer provides technical guidance to team...
-
Site Reliability Engineer
1 week ago
Vancouver, British Columbia, Canada Regie Full time
Job Summary
We're looking for a Senior Site Reliability Engineer/DevOps to join our team at Regie.ai. The successful candidate will have extensive experience in designing and maintaining scalable, secure, and reliable infrastructure to support our SaaS platform and AI/data workloads.
Responsibilities:
Design and maintain production-grade infrastructure with...
-
Site Reliability Engineer
6 days ago
Vancouver, British Columbia, Canada Regie Full time
About Regie.ai
Regie.ai is a Series B-funded, AI-native sales engagement automation platform focused on transforming business-critical prospecting (the top of the funnel) into a precise, scalable, and repeatable process. As the volume of sales activity required to book a meeting continues to grow exponentially, traditional tools have failed to keep...
-
Staff Cloud DevOps/Site Reliability Engineer
4 weeks ago
Vancouver, British Columbia, Canada Inworld Full time
Why Join Inworld
Inworld is the best-funded startup in AI and games with a $500 million valuation and backing from top tier investors including Intel Capital, Microsoft's M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures.
Inworld is the leading AI engine for games and...
-
Cloud DevOps Engineer
1 week ago
Vancouver, British Columbia, Canada Inworld Full time
Company Overview: Inworld is a leading AI engine for games and interactive media. Our suite of AI components enables developers to build interactive, responsive, and personalized AI gaming experiences. We power experiences built by top industry players such as Microsoft Xbox, Epic Games, and Unity.
Job Description: We are looking for a Staff Cloud DevOps/Site...
-
Site Reliability Lead
1 week ago
Vancouver, British Columbia, Canada Regie Full time
We are looking for a skilled Site Reliability Engineer/DevOps professional to join our team at Regie.ai. As a senior member of our engineering organization, you will be responsible for designing and maintaining our cloud-based infrastructure to ensure it meets the needs of our SaaS platform and AI/data workloads.
Main Responsibilities
Design and Implement...
-
Site Reliability Engineer Latam
6 days ago
Vancouver, British Columbia, Canada Launchpad Technologies Inc. Full time
Launchpad, a people-first technology company, is a leader in North America's rapidly growing tech sector. Through two solutions, Launchpad supports its clients with digital transformation:
- Paasport™, our iPaaS solution, streamlines software integration and automates workflows.
- Nearshore Staff Augmentation, our managed IT staffing service, connects top IT...
-
Senior Site Reliability Engineer
6 days ago
Vancouver, British Columbia, Canada Autodesk, Inc. Full time
Company Overview: We're Autodesk, Inc., a leading technology company committed to innovation and excellence. Our mission is to empower creativity, collaboration, and productivity through cutting-edge software solutions. As a Principal Site Reliability Engineer, you'll play a crucial role in shaping our cloud infrastructure strategy and ensuring the...