Observability Engineer
6 months ago
Venture outside the ordinary - TMX Careers
The TMX group of companies includes leading global exchanges such as the Toronto Stock Exchange, Montreal Exchange, and numerous innovative organizations enhancing capital markets. United as a global team, we’re connecting cross-functionally, traversing industries and geographies, moving opportunity into action, advancing global economic growth, and propelling progress. Through a rich exchange of ideas, meaningful collaboration, and a nimble operating model, we're powering some of the nation's most critical systems, fueling capital formation and innovation, bringing increased opportunity to business visionaries, product ingenuity to consumers, and career exploration to our team.
Ready to be part of the action?
Department Overview:
Global Technology Services (GTS) is one of the foundational divisions of the TMX Group. It supports internal TMX business lines with their technology needs, operations, and digital innovation. As a client-centric organization, GTS focuses on building technology capabilities, enabling its clients with the best technology solutions, and providing effective technology financial and resource management processes. Cost-effective operation is a key attribute of GTS execution. GTS is responsible for the delivery of all technology initiatives and services across TMX.
Role Summary:
As an Observability Engineer, you will play a crucial role in maintaining and improving the operational health of our applications and infrastructure. You will be responsible for setting up, configuring, and maintaining our monitoring and observability stack to ensure optimal system performance and reliability. You will apply GitOps/DevOps principles to manage the platform and help drive functionality and adoption through continuous improvement, simplification, and automation. You will work on the alignment, optimization, and strategy of our observability tools and platform. You'll work within a team of fellow observability and systems engineers to make TMX reliability best of breed.
Key Accountabilities:
Develop and maintain robust monitoring solutions using Splunk, Splunk Observability, Grafana, AWS CloudWatch, and Prometheus.
Implement, maintain, and consult on the observability and monitoring framework that supports the needs of multiple internal stakeholders.
Create and manage dashboards and visualizations to provide actionable insights into system health, performance, and operational efficiency.
Help manage the Event, Incident, and Operations Escalation Management Policies.
Grow and evangelize the capabilities of our observability tools and platforms.
Collaborate with development and operations teams to integrate observability tools into the development lifecycle for continuous improvement.
Translate business requirements into technical solutions, applying best practices and standards that meet strategic business goals.
Conduct performance analysis, diagnose issues, and provide solutions to enhance system reliability and scalability.
Document observability best practices and maintain configuration documentation.
Provide 2nd- and 3rd-level systems support.
Liaise with vendors and other IT personnel for problem resolution.
Must haves:
Proven experience with key observability and monitoring tools such as Splunk, Splunk Observability, OpenTelemetry (OTel), Grafana, AWS CloudWatch, and Prometheus.
Strong understanding of cloud environments, preferably AWS, including deployment, management, and operations.
Proficient in creating and managing monitoring dashboards and setting up alerts to monitor all phases of the environment.
Solid background in scripting and automation using languages such as Python, Bash, or similar.
Excellent problem-solving skills, with the ability to handle complex troubleshooting and make critical system-related decisions.
Familiarity with configuration and infrastructure-as-code tools such as Ansible and Terraform.
Linux operating system knowledge (Red Hat Linux preferred).
Experience with source control systems (Git, GitLab, GitHub, Bitbucket) and familiarity with basic branching and merging strategies.
Strong communication skills, capable of effectively articulating technical challenges and solutions to stakeholders.
Nice to haves:
Experience with OS deployment systems (Red Hat Satellite).
Experience with virtualization (VMware, RHV).
Experience with container platforms and orchestration (Kubernetes, OpenShift, Swarm).
Preferred qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related technical field.
5+ years of experience in systems engineering/administration, platform/cloud/DevOps engineering, or a related field.
Relevant certifications in Splunk, AWS, or similar technologies.
Experience with additional observability and monitoring tools is a plus.
In the market for…
Excitement - Explore emerging technology and innovation, as well as ventures and digital finance that shape the future of global markets. Experience the movement of the market while grounded in the stability of close to 200 years of success.
Connection - With site hubs in some of the world’s most multicultural cities, we leverage our size and structure to create rich connections and belonging while experiencing powerful global impact through our work.
Impact - More than a platform, we use our talents to power mission-critical systems that drive global economic advancement, innovation, and growth. Our employee-led giving strategy also spreads social good.
Wellness - From empathetic leadership to a culture of flexibility and balance, we believe wellness at work creates the maximum yield and a stronger “we”. Plus, with a cloud-first and hybrid workstyle, as well as generous time off and leaves, we support a life well lived.
Growth - From a growth mindset in our work, to expansion in our business, TMX is home to action-takers energized by the achievement of ambitious growth.
Ready to enrich your career with impactful work, leaders who truly care, and the flexibility and programs to help you thrive as part of #TeamTMX? Apply now.
TMX is committed to creating and sustaining a collegial work environment in which all individuals are treated with dignity and respect, and one that reflects the diversity of the community in which we operate. We provide accommodations for applicants and employees who require them.