Senior Site Reliability Engineer II
4 weeks ago
We're transforming the grocery industry
At Instacart, we invite the world to share love through food because we believe everyone should have access to the food they love and more time to enjoy it together. Where others see a simple need for grocery delivery, we see exciting complexity and endless opportunity to serve the varied needs of our community. We work to deliver an essential service that customers rely on to get their groceries and household goods, while also offering safe and flexible earnings opportunities to Instacart Personal Shoppers.
Instacart has become a lifeline for millions of people, and we're building the team to help push our shopping cart forward. If you're ready to do the best work of your life, come join our table.

Instacart is a Flex First team
There's no one-size-fits-all approach to how we do our best work. Our employees have the flexibility to choose where they do their best work, whether it's from home, an office, or their favorite coffee shop, while staying connected and building community through regular in-person events. Learn more about our flexible approach to where we work.

Overview

About the Role
Join our team as a Senior Site Reliability Engineer II, where your expertise will play a crucial role in maintaining the backbone of our platform's operations. You'll take on challenges directly, ensuring optimal performance and growth while fostering a culture that prioritizes diligent and effective reliability practices. We're seeking someone eager to take ownership, skilled at addressing complex issues, and ready to explore innovative solutions to support the well-being of our teams and services.

About the Team
The Site Reliability Engineering (SRE) team combines software and systems engineering to design and manage large-scale, distributed, and fault-tolerant systems. This team is tasked with ensuring high reliability, optimal system performance, and continuous improvement for both Instacart's critical internal services and externally facing systems. SRE focuses on optimizing existing systems, building robust infrastructure, and automating processes to minimize manual effort. Joining the SRE team means facing unique scaling challenges while leveraging expertise in coding, algorithms, complexity analysis, and large-scale system design.
The team thrives within a culture of intellectual curiosity, problem-solving, and collaboration. With members from diverse backgrounds and experiences, SRE fosters a supportive and risk-tolerant environment where individuals are encouraged to think big, take on impactful projects, and grow with mentorship and guidance.

About the Job
Develop scalable infrastructure strategies that ensure high availability, align infrastructure planning with product roadmaps, and optimize cost, risk, and performance with cloud providers.
Establish and lead incident management protocols and response plans to coordinate rapid responses, investigate root causes, and prevent recurrence. Collaborate with security teams to test response readiness and address security risks.
Continuously monitor performance metrics and trends to proactively identify reliability risks. Regularly refine SLOs, SLIs, and error budgets to align with evolving standards, and use data insights to propose improvement plans and architectural updates that enhance system reliability.
Oversee regular system evaluations to pinpoint and correct process shortcomings, and lead cross-functional projects that promote system optimization and minimize technical debt. Collaborate with product and engineering teams to ensure system enhancements align with user requirements.
Design and deploy automation tools to streamline deployment and operations. Oversee the continuous enhancement of automation scripts and frameworks, and rigorously monitor automated systems for performance and reliability. Address issues in automated environments promptly to reduce disruptions.
Provide technical guidance to junior colleagues, fostering a collaborative culture for problem-solving and innovation. Organize and lead knowledge-sharing sessions and coordinate training in site reliability best practices to enhance team proficiency.

About You

Minimum Qualifications
Proven experience in programming
Robust knowledge of incident management processes and tools
Exemplary troubleshooting and problem-solving skills
Ability to work under pressure and prioritize tasks during high-stress situations
Expertise in scaling application infrastructure for high availability

Preferred Qualifications
Proficient in Ruby or Go
Experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization (e.g., Docker, Kubernetes)
Skill in risk assessment for foundational infrastructure changes
Experience in monitoring system performance and trend analysis

Instacart provides highly market-competitive compensation and benefits in each location where our employees work. This role is remote, and the base pay range for a successful candidate depends on their permanent work location. Please review our Flex First remote work policy. Currently, we are only hiring in the following provinces: Ontario, Alberta, British Columbia, and Nova Scotia.
Offers may vary based on many factors, such as candidate experience and skills required for the role. Additionally, this role is eligible for a new hire equity grant as well as annual refresh grants. Please read more about our benefits offerings.
For Canadian-based candidates, the base pay ranges for a successful candidate are listed below.
CAN: $183,000 to $203,000 CAD
-
Senior SRE II — Remote, Scale
4 weeks ago
NS, Canada Instacart Full time
A grocery delivery service provider is seeking a Senior Site Reliability Engineer II to maintain their platform's operations. The role involves developing scalable infrastructure, establishing incident management protocols, and monitoring performance metrics. Ideal candidates should have proven programming experience and the ability to work under pressure....
-
Senior Site Reliability Engineer II
4 weeks ago
Canada Instacart Full time
We're transforming the grocery industry. At Instacart, we invite the world to share love through food because we believe everyone should have access to the food they love and more time to enjoy it together. We deliver an essential service that customers rely on to get their groceries and household goods, while offering safe and flexible earnings...
-
Senior Site Reliability Engineer
3 days ago
Canada Thinkific Full time
Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...
-
Senior Site Reliability Engineer
3 weeks ago
Canada Akamai Technologies Full time
Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges. Job Description: Are you passionate about cutting-edge technology and ready to tackle some of the Internet’s most difficult...
-
Senior Site Reliability Engineer
3 days ago
Canada DuckDuckGo Full time
Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...
-
Senior Site Reliability Engineer
4 weeks ago
New Canada, NS Remotivate Full time
About the Company: Our client is one of the leading SMS providers for marketing teams in the US. Their advanced dashboard and queueing mechanisms help their clients scale campaigns to the next level. With a global team, they're in scale-up mode and looking for strong problem solvers who thrive on building reliable systems. About the Role: We are looking for a...
-
Senior Site Reliability Engineer
3 weeks ago
Canada Orion Innovation Full time
Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher. Location: Canada - Remote (Working EST hours). Job Type: Full-time. About the Role: Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...
-
Senior Site Reliability Engineer
4 weeks ago
Canada Targeted Talent Full time
Overview: We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the...
-
Senior Site Reliability Engineer
3 days ago
Canada TextNow Full time
This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...
-
Senior Site Reliability Engineer
3 weeks ago
Canada Orion Innovation Full time
We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands-on mastery of Linux systems, advanced networking concepts (specifically IPSec),...