Senior Site Reliability Engineer, Observability

7 days ago


Vancouver, Canada Chainlink Labs Full time

About Chainlink

Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). The Chainlink stack provides the essential data, interoperability, compliance, and privacy standards needed to power advanced blockchain use cases for institutional tokenized assets, lending, payments, stablecoins, and more. Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.

Many of the world’s largest financial services institutions have also adopted Chainlink’s standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, and ANZ, as have top protocols such as Aave, Lido, GMX, and many others. Chainlink leverages a novel fee model where offchain and onchain revenue from enterprise adoption is converted to LINK tokens and stored in a strategic Chainlink Reserve.

The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact in the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load. This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow.

We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don't match 100% of the job requirements: those describe people we've usually had a great time working with, but they're not a tick-box exercise.

Your Impact

• Build and orchestrate a modern OTEL-based observability platform.
• Support multiple telemetry types, such as metrics, logs, and traces.
• Define and support modern governance in observability and problems at scale.
• Ensure reliability, security, and performance exceed our defined SLAs.
• Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load.
• Lead the design and deployment of monitoring/observability services to detect and alert the team of needed action.
• Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline.
• Oversee the availability, performance, and supportability of our observability infrastructure.
• Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
• Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release.
• Champion reliability and security by taking the time to do your work right the first time.

Requirements

• 7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before.
• Ability to develop software outside of the scope of typical infrastructure requirements and configurations.
• Experience programming in C, C++, Java, Python, Go, Perl, or Ruby.
• Expert knowledge in all aspects of designing, developing, and managing large real-time systems.
• Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution like an ELK Stack, Splunk, or Grafana Stack.
• Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them.
• Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews.

Desired Qualifications

• Excitement for blockchain, Web 3.0, and similar decentralized technologies.
• Experience running any infrastructure in the blockchain/web3 space.
• Ability to scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Experience working remotely in a distributed team.
• A strong desire to grow and challenge yourself. We would expect you to constantly find ways to improve and automate services to reduce toil.

Tools & Services

AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer

We expect you to be comfortable with most of those tools and very proficient in several of them.

Commitment to Equal Opportunity

Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants

Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you choose to submit, is subject to our Privacy Policy. By submitting your application, you are agreeing to our use and processing of your data as required.





  • Vancouver, Canada Vivun Full time

    Lead Observability Engineer (Remote, North America) Vivun delivers Ava, the AI Sales Teammate for high‑velocity sales teams. As Lead Observability Engineer, you’ll rebuild and own our observability strategy across both agentic systems and SaaS infrastructure, creating frameworks and tooling that enable teams to ship confidently, measure performance, and...


  • Vancouver, Canada Cognizant Full time

    A leading technology company in Vancouver seeks a Site Reliability Lead Engineer to develop high-performing applications. The role focuses on standardizing resiliency practices, defining observability metrics, and leading capacity planning. The ideal candidate will work closely with product teams and incorporate performance testing into CI/CD pipelines,...


  • Vancouver, Canada Royal Bank of Canada Full time

    Job Description: What is the opportunity? We are seeking a Staff, Site Reliability Engineer - Observability (Global Security) to own the resilience and "see-ability" of our mission-critical Identity and Access Management (IAM) platform. Your primary mission will be to design, build, and scale an end-to-end observability stack that provides deep, actionable...


  • Vancouver, British Columbia, Canada Aequilibrium Full time

    Senior Site Reliability Engineer. About the Role: As a Senior Site Reliability Engineer (SRE) at Aequilibrium, you will be embedded within agile scrum teams supporting large-scale digital modernization initiatives for Canadian public-sector partners. You will play a leading role in designing, implementing, maintaining, and optimizing cloud-native infrastructure...




  • Vancouver, Canada Canonical Full time

    Senior Site Reliability / Gitops Engineer. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud,...




  • Vancouver, Canada Chainlink Labs Full time

    Senior Site Reliability Engineer. About Us: Chainlink Labs is the primary contributing developer of Chainlink, the decentralized computing platform powering the verifiable web....

