Lead Site Reliability Engineer, Observability
2 weeks ago
Lead Observability Engineer (Remote, North America)

Vivun delivers Ava, the AI Sales Teammate for high-velocity sales teams. As Lead Observability Engineer, you'll rebuild and own our observability strategy across both agentic systems and SaaS infrastructure, creating frameworks and tooling that enable teams to ship confidently, measure performance, and maintain reliability as we scale.

Base Pay
$185,000 – $205,000 per year.

Position Summary
As the Observability Lead, you'll design and implement Vivun's observability patterns spanning infrastructure, applications, and agentic workloads. You'll work closely with engineering, QA, and product to establish unified visibility across the full stack, from LLM-driven agents to backend services. You won't just monitor systems; you'll define the patterns and tools that are a core part of empowering and driving Vivun's engineering culture.

Key Responsibilities
Own the end-to-end observability strategy for Ava, defining standards, tools, and patterns that ensure reliable visibility.
Design and implement correlation models linking agent behavior, LLM interactions, and SaaS telemetry into actionable insights.
Unify observability tooling across teams, ensuring metrics, logs, and traces flow into a central platform.
Collaborate with engineering and QA to embed observability best practices into workflows, CI/CD, and quality gates.
Establish enablement frameworks (documentation, dashboards, templates) that make observability self-serve.
Partner to align observability with infrastructure reliability, alerting, and incident response.
Contribute to performance and reliability strategy, defining agent quality, responsiveness, and scalability metrics.

Desired Skills & Experience
6+ years in SRE, DevOps, or Observability Engineering, with 2+ years leading observability initiatives.
Deep knowledge of OpenTelemetry, Prometheus, Grafana, Datadog, Honeycomb, Observe, etc.
Experience with Agentic/LLM-based systems (LangChain, Celery, OpenAI APIs, orchestration frameworks).
Strong understanding of instrumenting, tracing, and correlating AI/LLM workflows with infrastructure telemetry.
Proven ability to define cross-team standards, influence culture, and establish scalable monitoring patterns.
Strong collaboration and communication skills: enable, not dictate.

Nice to Have
Experience building observability into hybrid SaaS plus agent architectures.
Background in data pipelines or analytics observability.
Familiarity with Python- or Node.js-based SDKs.
Prior experience scaling observability in a startup or rapid-growth environment.

You Are
A believer in Vivun's core values: Set the Standard, Take Ownership, Stay Curious, Fast & Focused.
Builder at heart: eager to build observability foundations for a next-generation agentic platform.
Innovative problem solver: ready to tackle cutting-edge monitoring at the intersection of SaaS and AI.
Collaborative: thrive in a high-impact engineering culture that values enablement.
Experienced in high-growth startup environments; fast, adaptable, and goal-driven.

What You Will Have At Vivun
Competitive salary and full health benefits.
Stock options at a well-funded, pre-IPO company on a fast growth track.
Flexible work schedules; fully remote.
Unlimited PTO with two weeks of quiet period each year.
An experienced team that will fight beside you to achieve goals.

Seniority Level: Mid-Senior Level
Employment Type: Full-time
Job Function: Engineering and Information Technology
Industries: Technology, Information and Internet
-
Lead Site Reliability Engineer – Observability
4 weeks ago
Vancouver, Canada Cognizant Full time
A leading technology company in Vancouver seeks a Site Reliability Lead Engineer to develop high-performing applications. The role focuses on standardizing resiliency practices, defining observability metrics, and leading capacity planning. The ideal candidate will work closely with product teams and incorporate performance testing into CI/CD pipelines,...
-
Senior Site Reliability Engineer, Observability
2 weeks ago
Vancouver, Canada Chainlink Labs Full time
About Chainlink: Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). The Chainlink...
-
Staff, Site Reliability Engineer
2 weeks ago
Vancouver, Canada Royal Bank of Canada Full time
Job Description: What is the opportunity? We are seeking a Staff, Site Reliability Engineer - Observability (Global Security) to own the resilience and "see-ability" of our mission-critical Identity and Access Management (IAM) platform. Your primary mission will be to design, build, and scale an end-to-end observability stack that provides deep, actionable...
-
Site Reliability Engineer
5 days ago
Vancouver, British Columbia, Canada Blockscout Limited Full time
Blockscout is a leading provider of indexing and UI services for EVM chains. Our team hosts explorers for many of the largest chains in the industry. Reliability is vital to our company's success. We are looking for a Site Reliability Engineer to strengthen our DevOps and Support teams.
Key responsibilities
Monitor systems: Proactively watch production systems...
-
Site Reliability Lead Engineer
4 weeks ago
Vancouver, Canada Cognizant Full time
Cognizant's Digital Engineering practice is seeking a highly qualified Site Reliability Lead Engineer with experience developing and building high-performing, scalable, enterprise applications. You will be part of a digital software team that works on high-demand applications. Our engineers have a passion for high-quality, reliable and maintainable...
-
Lead Site Reliability Engineer - PeopleSoft
4 weeks ago
Vancouver, Canada Royal Bank of Canada Full time
Job Description: What is the opportunity? City National Bank (CNB), an RBC company, is seeking a Lead Site Reliability Engineer who will be responsible for supporting CNB Corporate applications along with the implementation of Site Reliability Engineering solutions. As a Lead SRE, you will play a critical role in ensuring the reliability, scalability, and...
-
Site Reliability Engineer
6 days ago
Vancouver, Canada ScalePad Full time
About ScalePad: ScalePad is a market-leading SaaS company headquartered in Vancouver, Toronto, Montreal and Phoenix, AZ. With a global employee reach, we serve over 12,000 MSPs worldwide, helping them increase client value through integrated, automated products. ScalePad has earned multiple industry awards, including MSP Today's Product of the Year and...
-
Site Reliability Engineer
4 days ago
Vancouver, Canada ScalePad Full time
About ScalePad: ScalePad is a market-leading SaaS company headquartered in Vancouver, Toronto, Montreal and Phoenix, AZ. With a global employee reach, we serve over 12,000 MSPs worldwide, helping them increase client value through integrated, automated products. ScalePad has earned multiple industry awards, including MSP Today's Product of the Year and...
-
Site Reliability Engineer
1 week ago
Vancouver, Canada BNB Chain Full time
LayerZero: The Future is Omnichain. Founded in 2021, LayerZero's vision is to create a community of cross-chain developers, building dApps that are no longer constrained by individual blockchain capabilities. With LayerZero's simple, generic messaging protocol, builders will develop cross-chain dApps designed to unify the power of individual blockchains. We...