Site Reliability Engineer

3 weeks ago

Canada Kraken Full time

Site Reliability Engineer - Data Platform

Building the Future of Crypto

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Desktop, Wallet, and Kraken Futures.

Become a Krakenite and build the future of crypto.

Proof of work

The team

Join our Data Infrastructure team and play a pivotal role in upholding the reliability, scalability, and efficiency of our robust Data platform. As a Senior Site Reliability Engineer (SRE) specializing in Data Infrastructure, you will collaborate closely with diverse cross-functional teams to design, build, and operate the foundational data infrastructure that powers our applications and services.

As a key member of our Data Infrastructure team, you will:
Design the data governance mechanisms that keep our lakehouse easy to interact with, secure, and compliant with all applicable regulations.
Implement the infrastructure we use to ingest and store our data, catalog it with the right metadata, and capture its lineage.
Provide a state-of-the-art suite of BI tools for multiple teams within the company.
Guarantee the availability, high performance, scalability, and cost efficiency of our data platform.

Your proficiency in cloud technologies, infrastructure as code, automation, monitoring, logging, user and machine authentication and authorization (AuthNZ), and certificate management will be instrumental in upholding the exceptional operational standards we set for our services.

The opportunity

Implement self-service data infrastructure solutions that support the needs of 10+ business units and over 100 engineers and data analysts.
Apply Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform.
Develop and maintain bash/shell scripts to automate operational tasks and deployments.
Enhance and manage CI/CD pipelines to enable consistent software deployments across the data infrastructure.
Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues.
Implement and manage role-based access control (RBAC) and permissions for many user groups and machine workflows across different environments.
Manage and maintain real-time streaming data architecture built on technologies such as Kafka and Debezium Change Data Capture (CDC).
Ensure the timely and accurate processing of streaming data, so that data analysts and engineers can gain insights from up-to-date information.
Use Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration.
Implement effective incident response procedures and participate in on-call rotations.
Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions.
Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement.
Support AI/ML teams with their infrastructure requests.

Skills You Should HODL

Proven experience (5+ years) as a Site Reliability Engineer, Infrastructure Engineer, Data Infrastructure Engineer, or in a similar role, with a focus on data infrastructure and security.
Experience maintaining real-time data processing technologies, such as Kafka and Flink clusters and Debezium instances.
Experience managing hybrid, multi-tenant cloud systems, particularly on AWS.
Experience with Infrastructure as Code tools such as Terraform, Terragrunt, and Atlantis.
Experience with containerization and orchestration tools, particularly Kubernetes, Nomad, and Docker.
Solid understanding of bash/shell scripting and proficiency in at least one programming language (preferably Python or a JVM language).
Experience maintaining data-related technologies: Apache Airflow, Apache Spark, databases, and BI tooling.
Experience solving data access management issues in a large-scale data lake.
Familiarity with CI/CD deployment pipelines and related tools.
Strong problem-solving skills and the ability to troubleshoot complex systems.
Experience with data-related technologies (databases, data lakes, Airflow, Spark) is a plus.

This job is accepting ongoing applications and there is no application deadline.

Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.

We consider qualified applicants with criminal histories for employment on our team, assessing candidates in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

Kraken is powered by people from around the world, and we celebrate all Krakenites for their diverse talents, backgrounds, contributions, and unique perspectives. We hire strictly based on merit, meaning we seek out the candidates with the abilities, knowledge, and skills most suitable for the job. We encourage you to apply for roles where you don't fully meet the listed requirements, especially if you're passionate or knowledgeable about crypto.

As an equal opportunity employer, we don’t tolerate discrimination or harassment of any kind, whether based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status, or any other protected characteristic as outlined by federal, state, or local laws.

Site Reliability Engineer

1 week ago

Canada : Ontario : Toronto Scotiabank Global Site Full time

Requisition ID: 245210 Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. The Team: Global Banking and Markets Engineering (GBME) is the fast-moving, award-winning technology engine that powers Scotiabank's Corporate, Investment Banking and Capital Markets businesses. The Role: GBME is searching for a Site...
Site Reliability Engineer

2 weeks ago

Canada : Ontario : Toronto Scotiabank Global Site Full time

Requisition ID: 244026 Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
Site Reliability Engineer

1 day ago

Canada : Ontario : Toronto Scotiabank Global Site Full time

Requisition ID: 244027 Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
Site Reliability Engineer

2 weeks ago

Canada : Ontario : Toronto Scotiabank Global Site Full time

Requisition ID: 247129 Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. As an SRE, you will implement, measure, and gather insights from Operational Level Indicators, identifying areas for service improvements covering availability, performance, resilience, incidents, and chronic problems. You will implement...
Senior Site Reliability Engineer

19 hours ago

Canada Thinkific Full time

Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific...
Site Reliability Engineer

2 weeks ago

Canada Dayforce Full time

About the Opportunity: As a Site Reliability Engineer at Dayforce, you will be part of a pioneering team responsible for ensuring our industry-leading HCM platform delivers exceptional scalability, availability, and reliability. Dayforce is a global HCM technology company with operations across North America, EMEA, and APJ, and our award-winning cloud platform...
Senior Site Reliability Engineer

19 hours ago

Canada DuckDuckGo Full time

Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...
Senior Site Reliability Engineer

19 hours ago

Canada TextNow Full time

Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...
Manager, Site Reliability Engineer

4 weeks ago

Canada Command Alkon Incorporated. Full time

Title: Manager, Site Reliability Engineer (SRE). Summary of Role: The Site Reliability Engineer (SRE) Manager leads the teams responsible for ensuring the availability, performance, and reliability of mission-critical systems. This role bridges the gap between software engineering and operations by implementing automation, observability, and scalability...
Site Reliability Engineer

3 weeks ago

Canada Dayforce Full time

Base pay range: CA$67,700.00/yr - CA$120,900.00/yr. Dayforce is a global human capital management (HCM) company headquartered in Toronto, Ontario, and Minneapolis, Minnesota, with operations across North America, Europe, Middle East, Africa (EMEA), and the Asia Pacific Japan (APJ) region. Our award-winning Cloud HCM platform offers a unified solution...
