Site Reliability Engineer

3 weeks ago


Old Toronto, Canada Tecsys Full time

Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our conveniently located offices and collaborative workspaces, provide our team with the freedom and flexibility to work in the way that makes our employees most productive.

About us

Tecsys is a fast-growing innovator offering supply chain solutions to industry leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tecsys could be a good fit for you

About the Role

We are looking for a Site Reliability Engineer to work within our “Network and Security Operations Center” department. Our NOC team is aimed at improving the reliability and uptime of our platform and applications in a data-driven way to support internal and external customers' needs.

Your responsibilities

  1. Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
  2. Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  3. Develop tools & automation on top of Azure & AWS to continuously reduce the need for manual intervention.
  4. Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  5. Be on-call.
  6. Practice sustainable incident response and blameless postmortems.
  7. Implement automated solutions for continuous integration and delivery (CI / CD).
  8. Implement monitoring, logging, alerting, and SLA reporting.
  9. Implement service monitoring dashboards displaying key metrics.
  10. Create and maintain technical documentation.
  11. Apply SRE best practices.
  12. Take command of high-severity incidents and facilitate their resolution.
  13. Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth.
  14. Collaborate with members of the Platform Engineering team to implement and support far-reaching strategic efforts, provide constructive feedback, and foster a collaborative environment.
  15. Work cross-functionally with internal teams and vendors to manage our growth around the globe, with a strong focus on maintaining the high level of performance, availability, and reliability for our users.

Requirements:

  1. Bachelor's degree in computer science or related technical discipline.
  2. At least 5 years’ experience in systems engineering experience; demonstrable technical experience in new platform development, orchestration, product ownership, and iterative design and deployment.
  3. Experience designing and deploying large scale systems, multi-vendor platforms and globally distributed infrastructure.
  4. Strong knowledge of system design; high performance computing; file, block, and storage technologies; integration of compute, storage, and network technologies to deliver cohesive infrastructure solutions.
  5. High level of understanding and examples of executing projects with full stack automation; our scale is going to require a lot of it, we grow to use less manual intervention and work with both internal and open-source tools to automate day-to-day activities.
  6. Self-organize, collaborate, and manage efforts with peers and teams across responsibility areas, languages, geography, and time zones.
  7. Be a self-starter, curious, and not afraid to ask questions and challenge the way things are done today.
  8. See a problem or opportunity, take ownership and act on it independently.
  9. Knowledge of Datadog preferred (or at least, similar/equivalent product).
  10. Knowledge of Rapid7 Insight preferred (or at least, similar/equivalent product).
  11. Knowledge and experience of AWS or Azure required.
  12. Basic knowledge of Java- or .Net-based development required.
  13. Knowledge of GitLab (enterprise license) preferred (or at minimum, Jenkins required).
  14. Experience with SaaS company is a strong asset.
  15. Experience with Fedramp (The Federal Risk and Authorization Management Program) compliance is a strong asset.
  16. Strong English communication skills, both written and spoken, are essential for effective correspondence with customers, business partners and colleagues beyond the province of Quebec.

Additional requirements:

  1. Escalation on-call rotation.
  2. Occasional travel (quarterly offsites, conferences – less than 10%).

At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team.

Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.

NB: if you are applying to this position, you must be a Canadian Citizen or a Permanent Resident of Canada, OR , have a valid Canadian work permit.

#J-18808-Ljbffr

  • Old Toronto, Canada Lorien Full time

    Hybrid - Manchester We are currently working with a leading gambling company dedicated to providing exceptional gaming experiences. They are looking for an experienced Site Reliability Engineer with a strong skill set in system reliability to join its world-class technology team. This role is ideal for someone who has 4+ years of experience within the...


  • Old Toronto, Canada TD Bank Full time

    Site Reliability Engineer Site Reliability Engineer Work Location: Canada Hours: 37.5 Line of Business: Technology Solutions Pay Details: We’re committed to providing fair and equitable compensation to all our colleagues. As a candidate, we encourage you to have an open dialogue with a member of


  • Old Toronto, Canada Lorien Full time

    p>Hybrid - ManchesterWe are currently working with a leading gambling company dedicated to providing exceptional gaming experiences. They are looking for an experienced Site Reliability Engineer with a strong skill set in system reliability to join its world-class technology team. This role is ideal for someone who has 4+ years of experience within the...


  • Old Toronto, Canada https:www.energyjobline.comsitemap.xml Full time

    h3>Site Reliability Engineer 100421 Site Reliability Engineer, SRE, Cloud, Permanent, Remote Type: Permanent, Full-time We're seeking experienced Site Reliability Engineers who excel at ensuring the reliability and scalability of production systems and possess extensive experience with monitoring and automation tools.Ensure the reliability, availability,...


  • Old Toronto, Canada Sentry Full time

    About the role The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers.


  • Old Toronto, Canada https:www.energyjobline.comsitemap.xml Full time

    h3>Site Reliability Engineer - 100421 Type: Permanent, Full-time We're seeking experienced Site Reliability Engineers who excel at ensuring the reliability and scalability of production systems, and possess extensive experience with monitoring and automation tools.Ensure the reliability, availability, and performance of production systems Design, implement,...


  • Old Toronto, Canada Street Context Full time

    Are you a Site Reliability Engineer that has a passion for building reliable, resilient and performant systems that scale ? Do you command with a steady hand when incidents unfold? Are you motivated by team success ? If so, continue reading… We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street...


  • Old Toronto, Canada Street Context Full time

    p>Are you a Site Reliability Engineer that has a passion for building reliable, resilient and performant systems that scale? p>We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street Context. We provide a premium Email, Analytics and Broker Relationship platform, purpose-built for capital markets and...


  • Toronto, Canada CB Canada Full time

    Site Reliability Engineer On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description Azure cloud Jira and confluence CICD Experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...


  • Old Toronto, Canada Street Context Full time

    p>Are you a Site Reliability Engineer that has a passion for building reliable, resilient and performant systems that scale? p>We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street Context. We provide a premium Email, Analytics and Broker Relationship platform, purpose-built for capital markets and...


  • Old Toronto, Canada Soda Full time

    Job Description Job Title: Site Reliability Engineer Location: Poland - Fully Remote Salary: 324K PLN or 27.3K monthly Start: ASAP Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda. Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...


  • Old Toronto, Ontario, Canada Thomson Reuters Full time

    About the RoleWe are seeking a highly skilled Site Reliability Engineering Specialist to join our team at Thomson Reuters. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable systems and services that meet the needs of our customers.Key ResponsibilitiesDesign and implement scalable systems and...


  • Old Toronto, Canada Sentry Full time

    p>The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of...


  • Old Toronto, Canada Olx Full time

    p>Site Reliability EngineerRemote Poland, PolandOLX – Engineering / Full-time / Remote At OLX, we work together to build a more sustainable world through trade. We make it safe, smart, and convenient to buy and sell cars, find housing, get jobs, buy and sell household goods, and more. Our colleagues around the world help to serve millions of people around...


  • Old Toronto, Canada CEDENT Full time

    h3>Title: Site Reliability EngineerTerms of Hire: Full Time.Job Description:Seeking a highly qualified Site Reliability Engineer and Architect with experience developing and building high-performing, scalable, enterprise applications. Our engineers are agile and retrospective, and not afraid to identify what we’re doing wrong, so we can fix it, and what...


  • Old Toronto, Canada Thomson Reuters Full time

    h3>(Canada) Site Reliability Engineer (Contract)Contract (9 months 4 days)Published 3 days agoNew RelicData DogSite Reliability Engineer - in the Service Management OrganizationDo you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure?The Site Reliability Engineer will analyze chronic...


  • Old Toronto, Canada Mastech Inc. Full time

    Mastech Digital is an IT Staffing and Digital Transformation Services company.Mastech Digital provides digital and mainstream technology staff as well as Digital Transformation Services for all American Corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the Consulting domain. We value our professionals, providing...


  • Old Toronto, Canada Sentry Full time

    Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney,...


  • Old Toronto, Canada Thomson Reuters Full time

    h3>(Canada) Site Reliability Engineer (Contract)Contract (5 months 29 days)Published 8 months agoCLOSEDGCPSite Reliability Engineer - in the Service Management OrganizationDo you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure?The Site Reliability Engineer will analyze chronic and...


  • Old Toronto, Canada Infotree Global Solutions Full time

    About Infotree Global SolutionsInfotree Global Solutions is a leading provider of innovative solutions, and we're seeking an experienced Site Reliability Engineer to lead our team.Your RoleAs our Site Reliability Engineering Lead, you will be responsible for supervising a team of skilled engineers and ensuring the reliability and scalability of our global...