Site Reliability Engineer

2 weeks ago


Vancouver, Canada NetApp Full time

Title: Site Reliability Engineer (SRE)

Location:

Bangalore, Karnataka, IN, 560071

Requisition ID: 127074

Job Summary

As a Site Reliability Engineer (SRE) with a specialization in storage, you'll manage and optimize a portfolio of customer-facing cloud services (SaaS/IaaS) on Google Cloud Platform (GCP), ensuring their overall availability, performance, and security. You will collaborate closely with global teams from NetApp and GCP, with a primary focus on supporting Google Cloud NetApp Volumes. This position includes rotational on-call work as part of a global team due to the critical nature of the services we support.

You will be working in a dynamic and fast-paced environment as an engineer on the Site Reliability Engineering (SRE) team. This team is responsible for assisting customers of Google Cloud NetApp Volumes in resolving complex technical issues in production environments. We are seeking an SRE with a deep understanding of storage systems, complex distributed systems, and cloud technologies, and the ability to articulate these concepts clearly to customers and fellow engineers.

You will work with your teammates and our customers to support innovative, cutting-edge technologies that address real-world challenges. You will provide valuable feedback and guidance to our Product and Engineering teams while representing the voice of our customers. You have the opportunity to make a significant impact and take real ownership of your work.

Job Requirements

o Collaborate with external customers and partners to ensure their success with Google Cloud NetApp Volumes.
o Respond to, troubleshoot, and drive root cause analysis (RCA) of complex live production incidents, including cross-platform issues involving OS, networking, and databases in cloud-based SaaS/IaaS environments by following and implementing SRE best practices.
o Continuously monitor, analyze, and measure system health, availability, and latency using tools like Prometheus, Google Cloud Monitoring, Elasticsearch, Grafana, and SolarWinds. Develop and implement measures to improve system and application performance, availability, and reliability.
o Document system knowledge, create runbooks, and ensure critical system information is readily available.
o Stay up to date with security trends and proactively identify, diagnose, and resolve complex security issues.
o Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.
o Automate manual tasks and system components that would benefit from automation.
o Use Atlassian Jira to track issues through to resolution according to their priority.
o Engage in incident management processes and resolve issues within agreed SLAs/SLOs.

o Extensive experience in storage technologies and incident management processes.
o Advanced knowledge of Linux operating systems (e.g., Ubuntu, CentOS).
o Proficiency in container-based architecture (e.g., Kubernetes).
o Intermediate to advanced knowledge of automation tools and scripting languages such as Ansible, Python, Bash, Go, and PowerShell.
o Solid understanding of algorithms, data structures, and databases (SQL/NoSQL).
o Intermediate knowledge of networking concepts.
o Hands-on experience with cloud environments, particularly GCP.
o Exceptional debugging skills across various platforms and technologies.
o Familiarity with site reliability engineering principles and best practices.

Education

BE in Computer Science or a related field, or 6+ years of professional experience in a relevant role. 


Job Segment: Cloud, Software Engineer, Database, Computer Science, Linux, Technology, Engineering



  • Vancouver, British Columbia, Canada Electronic Arts Full time

    Responsibilities: We are seeking a skilled Site Reliability Engineer to join our team at Electronic Arts. As a Site Reliability Engineer, you will work closely with our development teams to address build issues and improve our systems. Key Responsibilities: Collaborate with development teams to identify and resolve build issues. Create and maintain pipelines and...


  • Vancouver, British Columbia, Canada Royal Bank of Canada Full time

    Job Summary: The Royal Bank of Canada is seeking a skilled Site Reliability Engineering Specialist to join its team. This role will be responsible for the support, development, and implementation of Site Reliability Engineering solutions for all applications within the bank's technology infrastructure. Key Responsibilities: Support and Development of Site...


  • Vancouver, Canada Themis Solutions Inc. Full time

    We are currently seeking a new Site Reliability Engineer, Co-op, to join our Engineering team in Burnaby, Calgary or Toronto. Applicants should be available for an 8-month co-op period from January 2025 to August 2025. What your team does: As a Site Reliability Engineer, you will help build, improve, and maintain Clio’s globally distributed network of...


  • Vancouver, Canada Royal Bank of Canada Full time

    Job Summary: The Lead Support SRE will be responsible for supporting and spearheading the development and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. This team will work collaboratively with teams across several li


  • Vancouver, Canada Royal Bank of Canada Full time

    Job Summary: The Application Support SRE will be responsible for the support, development, and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. This team will work collaboratively with teams across several lines of business


  • Vancouver, Canada Microsoft Full time

    Overview: Are you an individual who loves to work on large-scale projects at one of the most exciting and diverse divisions within Microsoft? Are you looking for big, creative challenges that show immediate results since your customers are the product engineers for Office and M365? Do you want to be at the core of it all, acting as a force multiplier...


  • Vancouver, British Columbia, Canada Perlego Full time

    About the Role: We are currently seeking a highly skilled Site Reliability Engineer to join our team at Perlego. As a Site Reliability Engineer, you will play a critical role in ensuring the availability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...


  • Vancouver, Canada Microsoft Canada Full time

    Microsoft is a company where passionate innovators come to collaborate, envision what can be and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world. Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of...


  • Vancouver, Canada Microsoft Canada Full time

    Microsoft is a company where passionate innovators come to collaborate, envision what can be and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world. Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of data...


  • Vancouver, Canada Microsoft Canada Full time

    Are you interested in working for one of the most exciting teams at Microsoft? Then look no further than the Microsoft Teams SRE team. You will be building solutions that leverage state-of-the-art technologies to deliver the next evolution in collaboration and teamwork. What is a Site Reliability Engineer (SRE)? SRE is what you get when you treat operations as...


  • Vancouver, Canada Arista Full time

    Site Reliability Engineer (SRE) - Cloudvision (Full-time). Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. What sets us apart is our relentless pursuit of innovation. We leverage the latest advancements in cloud computing, artificial intelligence, and software-defined...


  • Vancouver, Canada RBC Full time

    Job Summary: The Application Support SRE will be responsible for the support, development, and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. This team will work collaboratively with teams across several lines of business and other Technology and Operations partners as a...


  • Vancouver, Canada TrustFlight Full time

    TrustFlight is at the forefront of digitizing the aviation industry with the creation of intelligent workflow applications that automate operating and maintenance processes, enabling our customers to focus on the data and insights that matter. We continue to build an amazing group of people who are all here to make our products, services and culture the...


  • Vancouver, British Columbia, Canada Royal Bank of Canada Full time

    Job Summary: The Royal Bank of Canada seeks a skilled Site Reliability Engineer to lead the development and implementation of SRE solutions for all applications within the organization. This role requires collaboration with cross-functional teams to ensure successful delivery of technology solutions. Key Responsibilities: Develop and maintain production support...


  • Vancouver, British Columbia, Canada Royal Bank of Canada Full time

    Company Overview: The Royal Bank of Canada (RBC) is a leading financial institution that prides itself on providing exceptional banking services to its clients. With a strong presence in the Canadian market, RBC has a reputation for innovation and customer satisfaction. Salary: We are offering a highly competitive salary range of $120,000 - $180,000 per year,...


  • Vancouver, British Columbia, Canada S.i. Systems Full time

    Job Description: We are seeking a Senior Site Reliability Engineer to develop robust observability solutions using Dynatrace and automate key monitoring processes through Terraform and PowerShell. Key Responsibilities: develop and implement observability solutions using Dynatrace; automate key monitoring processes through Terraform and PowerShell. About the...

  • DevOps Engineer

    2 months ago


    Vancouver, Canada Azad Technology Partners Full time

    AZAD Technology Partners is seeking a Site Reliability Engineer/DevOps Engineer for a full-time, W2 Contract position based in Chicago, IL. Schedule: Full-time, 40 hours/week, Hybrid. Assignment Duration: 10 Months. AZAD Technology Partners is committed to Diversity, Equity & Inclusion and is striving to build an even more diverse, inclusive team that...


  • Vancouver, Canada RBC Full time

    Job Summary: The Lead Support SRE will be responsible for supporting and spearheading the development and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. This team will work collaboratively with teams across several lines of business and other Technology and Operations...