Site Reliability Engineer
2 days ago
Join us as a Senior Site Reliability Engineer to help us run an industry-scale GPU cluster via Kubernetes. Together with senior members of our team, you will combine your strong understanding of system scaling and security practices with your cloud-native expertise to stand up and maintain Kubernetes clusters from scratch. Your role will also be pivotal in supporting our other service offerings, from full-stack development to AI integration, ensuring they are robust, scalable, and secure.
We need engineers on our team to be versatile, display leadership qualities, and be enthusiastic about taking on new problems across the stack as we solve new and interesting technology problems. As a senior member of the team, you will be relied upon to design robust solutions that solve client problems, drive consensus around technical solutions, and ultimately own the success of projects. In return, you can expect latitude in the way you choose to run projects and design systems, while receiving direct support, guidance, and coaching from Bit Complete’s management team.
What you'll be doing
Develop and implement comprehensive infrastructure strategies that emphasize reliability, flexibility, and security.
Manage and scale our cloud-native environments, including Kubernetes clusters and container orchestration.
Oversee the deployment and maintenance of infrastructure tools.
Lead initiatives on stateless architectures to enhance scalability and maintainability of our systems.
Apply your expertise in distributed systems, using technologies like Kafka, Postgres, Redis, and Elasticsearch.
Design and monitor CI/CD pipelines to streamline deployment processes using tools like Spinnaker.
Implement and manage monitoring solutions using OpenTSDB, Prometheus, Grafana, and Envoy to ensure optimal performance and reliability.
Provide leadership and direction to the infrastructure team, fostering a culture of continuous learning and improvement.
Your Background
Relevant industry experience, specifically in Site Reliability Engineering or a similar role, with a proven track record in technical leadership and setting the direction for scalable systems.
Strong background in managing and deploying infrastructure in cloud-native environments (AWS and GCP).
Experience with container orchestration (Docker, Kubernetes) and infrastructure as code (Terraform, Pulumi).
Experience with monitoring and logging tools, and a solid understanding of network metrics.
Familiarity with Linux, and excellent problem-solving, debugging, and troubleshooting skills.
Proficiency in system design and a solid understanding of distributed systems and DevOps tools and practices, particularly in developing and maintaining CI/CD systems for fully automated deployment, testing, and monitoring of applications.
Familiarity with MLOps practices, including automation and orchestration of machine learning models.
Experience with database technologies and designing infrastructure to support both traditional and AI-driven applications.
Excellent communication skills with the ability to engage and influence both technical and non-technical stakeholders.
About Us
CAD $150,644 - $200,644 annually. Our ranges include base salary and a conservative bonus target.
Interested? We're excited about working with you, so get in touch. Submit your application here.
The world of work today is overflowing with systems, processes, tools, and assumptions that are flawed and that can push directly against our ability to express what is unique about each of us in the work we do every day. We believe people from diverse backgrounds, with different identities and experiences, make our company better. No matter your background, we'd love to hear from you. Alignment with our values is just as important as experience. Also, please let us know if there are ways we can make our interview process better for you; we're always happy to listen and accommodate where possible.
-
Site Reliability Engineer
3 weeks ago
Canada, Orion Innovation, Full time
Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher. Location: Canada - Remote (working EST hours). Job Type: Full-time. About the Role: Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...
-
Senior Site Reliability Engineer
2 days ago
Canada, Thinkific, Full time
Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...
-
Systems Reliability Engineer
1 week ago
Canada : Ontario : Toronto, Scotiabank Global Site, Full time, $120,000 - $180,000 per year
Requisition ID: 239640. Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. The Role: As a member of the Systems Reliability Engineering team, the System Reliability Engineer will collaborate closely with Engineering and development teams, peers, and business partners to continuously improve the stability,...
-
Director, Site Reliability Engineering
3 weeks ago
Canada, Icon, Full time
Helping SaaS companies scale Engineering teams. Director, Site Reliability Engineering. We are seeking an accomplished Director of Site Reliability Engineering (SRE) to lead the reliability, scalability, and performance initiatives across multiple enterprise technology domains, including AML, Risk, Finance, Corporate Treasury, and Human Resources systems....
-
Senior Site Reliability Engineer
3 weeks ago
Canada, Orion Innovation, Full time
Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher. Location: Canada - Remote (working EST hours). Job Type: Full-time. About the Role: Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...
-
Senior Site Reliability Engineer
3 weeks ago
Canada, Akamai Technologies, Full time
Senior Site Reliability Engineer. Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges. Job Description: Are you passionate about cutting‑edge technology and ready to tackle some of the Internet’s most difficult...
-
Senior Site Reliability Engineer
3 weeks ago
Canada, Targeted Talent, Full time
Overview: We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the...
-
Senior Site Reliability Engineer
2 days ago
Canada, DuckDuckGo, Full time
Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...
-
Senior Site Reliability Engineer
2 days ago
Canada, TextNow, Full time
This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...
-
Site Reliability Engineer
3 weeks ago
Canada, Telna, Full time
Site Reliability Engineer – Security Engineer