Senior IAM Resiliency Engineer
1 day ago
Job Description
What is the opportunity?
We are seeking an expert Senior Observability Engineer to own the resilience and "see-ability" of our mission-critical Identity and Access Management (IAM) platform. Your primary mission will be to design, build, and scale an end-to-end observability stack that provides deep, actionable insights into our distributed systems.
You will be the team's subject matter expert on monitoring, logging, tracing, and detection. Using a diverse toolset that includes Elastic Stack, Dynatrace, Prometheus, Grafana, Splunk, and Catchpoint, you will directly strengthen our detection capabilities and aggressively reduce our Mean Time to Detect (MTTD). This isn't just about collecting data; it's about transforming data into automated intelligence that proactively identifies and mitigates failures before they impact our users.
What will you do?
- Architect & Build: Design and implement a unified, multi-layered observability framework that provides a "single pane of glass" for our IAM services.
- Strengthen Detection: Develop sophisticated, high-signal/low-noise alerting strategies. This includes building anomaly detection models, predictive monitoring, and critical integrity checks for unexpected configuration drift, potential privilege-escalation events, and expiring certificates and keys to prevent security-related outages.
- Reduce MTTD: Be the primary driver for initiatives, tooling, and process improvements focused on minimizing Mean Time to Detect and Mean Time to Resolution (MTTR).
- Tool Integration & Management: Master and integrate our full stack of observability tools:
  - Metrics & Dashboards: Prometheus & Grafana for time-series metrics and visualization.
  - Logging: Elastic Stack and/or Splunk for centralized logging, query optimization, and trend analysis.
  - APM & Tracing: Dynatrace for deep application performance monitoring and distributed tracing across microservices.
  - Synthetic & RUM: Catchpoint for proactive, outside-in monitoring of critical IAM user journeys (like login, token issuance, and password reset).
- Define "Normal": Establish and evangelize key Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for the IAM platform and build the dashboards that track them.
- Champion Resiliency: Partner with Infrastructure and Engineering teams to use observability data to inform chaos engineering tests, performance tuning, and capacity planning.
- Evangelize Best Practices: Train and mentor system engineers on observability best practices, instrumentation (e.g., OpenTelemetry), and building "observable-by-default" applications.
What do you need to succeed?
Must Have:
- Experience: 7+ years in a senior Observability, SRE, or DevOps role with a focus on monitoring highly available, distributed systems.
- Metrics & Dashboards: Deep, hands-on expertise with Prometheus (incl. PromQL) and building complex, actionable dashboards in Grafana.
- Logging Expertise: Proven experience managing and extracting value from large-scale logging platforms like ELK (Elasticsearch, Logstash, Kibana) or Splunk.
- APM Mastery: Demonstrable experience using an APM tool like Dynatrace, New Relic, or AppDynamics to trace, debug, and optimize application performance.
- Synthetic Monitoring: Experience with synthetic monitoring tools like Catchpoint to model and validate critical user flows.
- Core Concepts: A strong "three pillars" foundation (metrics, logs, traces) and a passion for data-driven reliability.
- Automation: Strong scripting skills (e.g., Python, Go, Bash) and experience with Infrastructure as Code (Terraform, Ansible) for managing your monitoring stack.
- Communication: Excellent ability to communicate complex technical concepts to diverse audiences, from junior engineers to senior leadership.
Nice to Have:
- IAM Context: Experience monitoring IAM-specific protocols and services (e.g., OAuth2, OIDC, SAML, LDAP, SCIM).
- Trust & Integrity Monitoring: Experience building monitors for configuration drift, anomalous privilege escalation, and certificate lifecycle management.
- Anomaly Detection: Practical experience implementing or using AIOps and machine-learning-based anomaly detection systems.
- Cloud Native: Deep experience with observability in a Kubernetes and/or public cloud (AWS, GCP, Azure) environment.
- Distributed Tracing: Experience with OpenTelemetry, Jaeger, or Zipkin.
- Chaos Engineering: Familiarity with chaos engineering principles and tools (e.g., Chaos Toolkit, Gremlin).
What's in It for You?
We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable.
- Leaders who support your development through coaching and managing opportunities.
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Opportunities to do challenging work
- Opportunities to take on progressively greater accountabilities
- Access to a variety of job opportunities across business and geographies
Job Skills
Agile Working, Application Security, Automation Tools, Bash (Programming Language), Cloud Platform, Cyber Security Management, Decision Making, Dynatrace APM, Elastic Logstash, ElasticSearch, Grafana, High Reliability, Identity Access Management (IAM), Information Security Management, Information Technology Security, Infrastructure Penetration Testing, Interpersonal Communication, IT Security Architecture, IT Systems Integration, Kubernetes, Prometheus (Software), Python (Programming Language), Red Hat Ansible, Security Information and Event Management (SIEM)
Additional Job Details
Address: 16 York St, Toronto
City: Toronto
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: Technology and Operations
Job Type: Regular
Pay Type: Salaried
Posted Date:
Application Deadline:
Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.
Inclusion and Equal Opportunity Employment
At RBC, we believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.
Join our Talent Community
Stay in the know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips, and recruitment events that matter to you.
Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities.