SRE - Python - Airflow - Dynatrace
5 days ago
Job Description
We're looking for an SRE to elevate the reliability, performance, and efficiency of mission-critical batch workloads across Capital Markets Technology. You'll be the technical lead for hands-on automation, application development, host systems engineering, and observability via Dynatrace, with a primary focus on optimizing batch runtimes. If you love shaving milliseconds off latency, removing toil with code, and building resilient systems that just don't fail, you'll thrive here.
This role is critical to our operational excellence strategy and will play a key role in maturing our reliability engineering practices across the Capital Markets domain.
Key Responsibilities:
Reliability & Performance: Ensure the stability of batch processing pipelines and optimize them; reduce runtimes and failure rates, and engineer for resiliency.
Observability: Implement and maintain monitoring with Dynatrace; create dashboards, alerts, and runbooks.
Systems Engineering: Manage and tune Linux and Windows systems for performance and resilience.
Automation & Orchestration: Create, modify, and optimize Airflow DAGs; build CI/CD pipelines for automation.
Incident Management: Lead incident response, root cause analysis, and postmortems; enforce SLOs and reliability practices.
Security & Compliance: Apply security best practices and ensure regulatory compliance in systems and automation.
Qualifications:
Expert-level Python: Advanced coding, performance tuning, concurrency (async/multiprocessing), testing, and packaging.
Linux Systems Expertise: Kernel/OS tuning, networking, filesystem optimization, process management, and troubleshooting.
Dynatrace Mastery: Custom dashboards, KPIs, anomaly detection, tagging strategy, and alerting configuration.
Airflow Expertise: DAG design best practices, SLA management, scheduler/executor tuning, and scaling strategies.
Proven experience optimizing batch workloads for performance, reliability, and cost.
Strong understanding of distributed systems concepts: retries, idempotency, backpressure, and data integrity.
Strong understanding of backend systems and database optimization.
Proficiency with CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins) and Infrastructure as Code (Terraform, Ansible).
Proven experience with containers and orchestration (Docker, Kubernetes).
Excellent incident management and root cause analysis skills.
Strong communication and collaboration abilities.