Site Reliability Engineer

1 month ago


Old Toronto, Ontario, CA TD Bank Full time

Work Location: Canada

Hours: 37.5

Line of Business: Technology Solutions

Pay Details: We’re committed to providing fair and equitable compensation to all our colleagues. We encourage candidates to have an open dialogue with a member of our HR Team and ask compensation-related questions, including pay details for this role.

Job Description:

CUSTOMER

  • Provide technical leadership to improve the design and operation of systems in alignment with reliability engineering best practices and overall Technology and Bank strategies, applying the practices of computer science and software engineering to the design and development of large, complex systems.
  • Drive and influence integrated DevOps solutions across business, product, platform, infrastructure, development, support/DevOps teams that improve the design and operation of systems, making them scalable, reliable, and efficient while ensuring performance and high availability of products/services.
  • Ensure availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of products/services, including enterprise systems that may serve multiple services and applications/segments.
  • Influence and partner with key technology and product team members in the design and development of solutions that promote automation and the elimination of toil; identify optimal ways to make systems more scalable, reliable, and efficient, and implement the required changes.
  • Define and prioritize problems to solve with applications/products/services and respective systems and drive the resolution/remediation with technology teams across design, implementation, and support.
  • Develop deep relationships with Product Owners, Tech Leads and Ops to build transparency and help foster end-to-end accountability of products and services.
  • Work in close partnership with technology teams to support TD's business objectives and operational support goals, providing domain expertise on strategic infrastructure and business project-related activities.
  • Review technical deliverables throughout the design and development phase to ensure systems adhere to SRE best practices.

SHAREHOLDER

  • Ensure adherence to Operational (Production) Readiness practices for respective products and services.
  • Set service-level objectives (SLOs) that define the availability of a particular product or service, and exercise the key decision rights of the SRE role (e.g., supporting release to production, rejecting software that is operationally substandard, and directing developers to improve the code).
  • Implement the observability requirements to monitor and assure that our systems meet the expected service levels and perform with the appropriate operational characteristics.
  • Focus on reliability, scalability, and the development of the production computing infrastructure, including highly complex and scalable systems.
  • Develop observability standards to ensure that production systems operate under known conditions and transparently provide these measurements to anticipate when errors or failures may arise.
  • Engineer solutions through problem post-mortem reviews to ensure that problem resolutions are complete and that errors do not recur.
  • Anticipate internal and external business challenges, helping teams find solutions through continuous improvement of processes and technologies.
  • Lead interaction with governance and control groups (e.g., regulatory/operational risk, compliance and audit) to provide subject matter expertise and consult on risk issues related to engineering technology and tools.
  • Lead or contribute to cross-functional/enterprise initiatives as an organizational or subject matter expert helping to identify risk/provide guidance for significant and complex situations.
  • Proactively identify emerging technologies and innovative solutions for building more robust platform domains; keep abreast of emerging issues, trends, and evolving regulatory requirements and assess potential impacts.
  • Protect the interests of the organization – identify and manage risks, and escalate non-standard, high-risk transactions/activities as necessary.
  • Maintain a culture of risk management and control, supported by effective processes in alignment with risk appetite.

EMPLOYEE / TEAM

  • Participate fully as a member of the team, support a positive work environment that promotes service to the business, quality, innovation, and teamwork, and ensure timely communication of issues/points of interest.
  • Support the team by continuously enhancing knowledge/expertise in own area and participate in knowledge transfer within the team and business unit.
  • Keep current on emerging trends/developments and grow knowledge of the business, related tools, and techniques.
  • Participate in personal performance management and development activities, including cross-training within own team.
  • Keep others informed and up to date about the status/progress of projects and/or all relevant or useful information related to day-to-day activities.
  • Contribute to the success of the team by willingly assisting others in the completion and performance of work activities; provide training, coaching and/or guidance as appropriate.
  • Contribute to a fair, positive and equitable environment that supports a diverse workforce.
  • Act as a brand ambassador for your business area/function and the bank, both internally and externally.

BREADTH & DEPTH

  • Expert Site Reliability Engineering role with comprehensive expertise in leading-edge theories and engineering practices, and extensive coding and scripting experience.
  • Advanced and highly specialized knowledge of applications, systems, networks, innovation models, design activities, best practices, the business/organization, and Bank standards; may fulfill a governance role.
  • Engineering specialist assigned to work autonomously on high profile, complex and/or high-risk technology initiatives with significant impact to the organization.
  • Provides technical leadership/consulting/direction to multiple businesses and product teams, growing capability across the organization.
  • Resolves unique and complex problems that have a broad impact on the business.
  • Authoritative expert on site reliability issues within area of specialization.
  • Understands the journey of an enterprise transformation where there is a hybrid cloud/non-cloud operating model.
  • Drives end-to-end accountability of products and services across the enterprise through collaboration and transparency.
  • Primarily works at the product umbrella, segment, LOB or Product Family level.
  • Typically reports to the Site Reliability Practice Area Lead.

EXPERIENCE AND / OR EDUCATION

  • University degree in Computer Science or related technical field involving systems engineering or equivalent practical experience.
  • 10+ years of engineering experience (e.g., software or platform engineering).

Who We Are:

TD is one of the world's leading global financial institutions and is the fifth-largest bank in North America by branches/stores. Every day, we deliver legendary customer experiences to over 27 million households and businesses in Canada, the United States and around the world. More than 95,000 TD colleagues bring their skills, talent, and creativity to the Bank, those we serve, and the economies we support. We are guided by our vision to Be the Better Bank and our purpose to enrich the lives of our customers, communities and colleagues.

Our Total Rewards Package:
Our Total Rewards package reflects the investments we make in our colleagues to help them and their families achieve their financial, physical, and mental well-being goals. Total Rewards at TD includes a base salary, variable compensation, and several other key plans such as health and well-being benefits, savings and retirement programs, paid time off, banking benefits and discounts, career development, and reward and recognition programs.

Additional Information:
We’re delighted that you’re considering building a career with TD. Through regular development conversations, training programs, and a competitive benefits plan, we’re committed to providing the support our colleagues need to thrive both at work and at home.

Colleague Development:
If you’re interested in a specific career path or are looking to build certain skills, we want to help you succeed. You’ll have regular career, development, and performance conversations with your manager, as well as access to an online learning platform and a variety of mentoring programs to help you unlock future opportunities.

Training & Onboarding:
We will provide training and onboarding sessions to ensure that you’ve got everything you need to succeed in your new role.

Interview Process:
We’ll reach out to candidates of interest to schedule an interview. We do our best to communicate outcomes to all applicants by email or phone call.

Accommodation:
Your accessibility is important to us. Please let us know if you’d like accommodations (including accessible meeting rooms, captioning for virtual interviews, etc.) to help us remove barriers so that you can participate throughout the interview process.

Language Requirement: N/A.

Our Values:
At TD we’re guided by our purpose to enrich the lives of our customers, communities and colleagues, and share a set of values that shape our culture and guide our behavior.

Our Commitment to Diversity, Equity, and Inclusion:
At TD, we’re committed to fostering an environment where all colleagues are encouraged to bring their authentic selves to work, experience equitable opportunities, and feel respected and supported.

Helping to Make an Impact in Communities – TD Ready Commitment:
TD has a long-standing commitment to help drive progress towards a more inclusive and sustainable future.

  • Old Toronto, Ontario, CA Reperio Human Capital Full time

    Site Reliability Engineer 100421 Location: Ireland/UK Salary: €70K+ Type: Permanent, Full-time We're seeking experienced Site Reliability Engineers who excel at ensuring the reliability and scalability of production systems, and possess extensive experience with monitoring and automation tools. Responsibilities: Ensure the reliability,...


  • Old Toronto, Ontario, CA CB Canada Full time

    Site Reliability Engineer On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description Azure cloud Jira and Confluence CICD Experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...


  • Old Toronto, Ontario, CA Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract) Contract (9 months 4 days) Published 3 days ago New Relic Data Dog Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze...


  • Old Toronto, Ontario, CA eTeam Full time

    Remote Work Duration 4 months - Preference is to find candidates who are willing to be converted to full-time employees. The conversion decision will be made based on performance. Job Description Role Description: Defining and measuring reliability goals (SLIs, SLOs, and error budgets) for user journeys. Designing for and implementing observability (ELK,...


  • Old Toronto, Ontario, CA Rogers Part time

    Site Reliability Engineer Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of...


  • Old Toronto, Ontario, CA Rogers Communications, Inc. Part time

    Site Reliability Engineer Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of sports,...


  • Old Toronto, Ontario, CA Vaco Full time

    About the Company Our client operates global markets and builds digital communities and analytic solutions, and is looking to hire a Site Reliability Engineer. About the Opportunity Stephen manages the infra group team (Windows, virtualization, IT infrastructure, etc.) and works closely with Jeremy, the hiring manager, who is away on paternity leave. They are currently...


  • Old Toronto, Ontario, CA Lightspeed Full time

    Job Opportunity: Principal Site Reliability Engineer Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place! We’re looking for a Principal Site Reliability Engineer to join our NuOrder by Lightspeed team in North America. NuOrder by Lightspeed...


  • Old Toronto, Ontario, CA The Voleon Group Full time

    Voleon is a technology company that applies state-of-the-art machine learning techniques to real-world problems in finance. For more than 15 years, we have led our industry and worked at the frontier of applying machine learning to investment management. We have become a multi-billion-dollar asset manager, and we have ambitious goals for the future. Your...


  • Old Toronto, Ontario, CA Rogers Communications Full time

    Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of sports, news, e-commerce, and...


  • Old Toronto, Ontario, CA Rogers Communications Full time

    Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of sports, news,...


  • Old Toronto, Ontario, CA Tecsys Inc. Full time

    Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...


  • Old Toronto, Ontario, CA Lightspeed Full time

    Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place! We’re looking for a Staff Site Reliability Engineer to join our NuOrder by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and the...


  • Old Toronto, Ontario, CA Nityo Infotech Full time

    Job Responsibilities: Objectives of this Role: Run the IKP clusters by monitoring availability and taking a holistic view of system health. Build tools and automation to manage platform infrastructure and services. Improve reliability, quality, and time to upgrade cluster and service versions. Measure and optimize system performance and resource...


  • Old Toronto, Ontario, CA PharmaLex Full time

    Your Job SRE at Pharmalex is the software engineering approach to production operations. Half of your time will be spent building software to automate manual work; the other half will be spent providing operational support to the products you cover. SRE operates critical products 24/7/365 within agreed SLOs. Out-of-hours support via...


  • Old Toronto, Ontario, CA United Software Group Inc. - Canada Full time

    Position: Site Reliability Engineer Location: Toronto, Canada Duration: Contract Job Description: 3+ years of experience Advanced knowledge of the following SRE practices and technologies Python, YAML, Shell scripting Azure, Linux Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure monitor Chaos Engineering MQ, Kafka Perform production support...


  • Old Toronto, Ontario, CA Scotiabank Full time

    Requisition ID: 197089. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. The Team We are looking for a developer to join our Digital Engineering Operations. The ideal candidate is passionate about designing and...


  • Old Toronto, Ontario, CA Guidewire Full time

    ESSENTIAL DUTIES AND RESPONSIBILITIES Take a purist SRE approach to shared multi-tenant infrastructure for resilient SaaS microservice-based containerized systems, in addition to customer-centric application environments. Oversee and automate the team’s growing presence in AWS. Contribute to core infrastructure systems development with...


  • Old Toronto, Ontario, CA Ascend Fundraising Solutions Full time

    Founded in 2010, Ascend Fundraising Solutions provides online and in-venue fundraising platforms and solutions. Our innovative approach has been embraced by renowned non-profit organizations worldwide, including United Way, Vancouver Canucks Foundation, Canadian Olympic Foundation, Canadian Institute for the Blind, Kansas City Chiefs Foundation, Boston Red...


  • Old Toronto, Ontario, CA Snaphunt Full time

    The Offer Great Opportunity The Job You will be responsible for: Gathering and evaluating user feedback. Providing code documentation and other inputs to technical documents. Supporting continuous improvement by investigating alternatives and new technologies and presenting these for architectural review. Troubleshooting and debugging to optimise...