Site Reliability Engineer
4 weeks ago
Job Responsibilities:
Objectives of this Role:
- Run the IKP clusters by monitoring availability and taking a holistic view of system health
- Build tools and automation to manage platform infrastructure and services
- Improve reliability, quality, and time to upgrade cluster and service versions
- Measure and optimize system performance and resource utilization, and plan for future capacity
- Build dashboards and visualizations to graph system health
- Define system alerts and automate responses where possible
- Provide operational support and engineering for multiple software development teams
- Gather and analyze metrics from cluster components and services to assist in performance tuning and fault finding
- Partner with Core Engineering and Services Engineering teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
Typical candidates will have 5-10 years of experience in the technology field, preferably in software engineering.
Education: Bachelor’s degree in Computer Science or a related field.
Experience Required: 5-10 Years
Industry Type: IT
Employment Type: Contract
Location: China
-
Site Reliability Engineer
4 weeks ago
Toronto, Canada CB Canada Full time
Site Reliability Engineer On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description: Azure cloud; Jira and Confluence; CI/CD; experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
Site Reliability Engineer
7 days ago
Old Toronto, Canada CB Canada Full time
Site Reliability Engineer On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description: Azure cloud; Jira and Confluence; CI/CD; experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
Site Reliability Engineer
2 weeks ago
Toronto, Ontario, Canada Zortech Solutions Full time
Hi, hope you are doing great. This is Priya Rajput from Zortech Solutions, reaching out about an exciting job opening. Kindly have a look at the job description and reply with your feedback. My mail ID is or call me on .
Role: Site Reliability Engineer
Location: Toronto, ON (Onsite)
Duration: Full-time, Permanent
Skills and Responsibilities:...
-
Site Reliability Engineer
9 hours ago
Toronto, ON, Canada Tata Consultancy Services Full time
TCS is an equal opportunity employer and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our people stories across our workforce implemented through...
-
Site Reliability Engineer
4 hours ago
Toronto, ON, Canada Tata Consultancy Services Full time
TCS is an equal opportunity employer and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our people stories across our workforce implemented through...
-
Site Reliability Engineer
3 weeks ago
Toronto, Canada Autodesk Full time
Position Overview: Autodesk, the leading Design and Make Software Company, is looking for a Principal Site Reliability Engineer to join the Autodesk Platform Services Engineering team in Toronto, Canada. In this position, you will help build trusted services of APS (Autodesk Platform Services) as measured by Service Level Objectives (SLOs) and Mean...
-
(Canada) Site Reliability Engineer
4 weeks ago
Old Toronto, Canada Thomson Reuters Full time
(Canada) Site Reliability Engineer (Contract) Contract (9 months 4 days) Published 3 days ago New Relic Data Dog Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will...
-
Site Reliability Engineer
7 days ago
Old Toronto, Canada Autodesk Full time
Position Overview: Autodesk, the leading Design and Make Software Company, is looking for a Principal Site Reliability Engineer to join the Autodesk Platform Services Engineering team in Toronto, Canada. In this role, you will help build trusted services of APS (Autodesk Platform Services) measured by Service Level Objectives (SLOs) and Mean Time to Recovery...
-
Site Reliability Engineer
4 weeks ago
Toronto, Canada eTeam Full time
Remote work. Duration: 4 months. Preference is to find candidates who are willing to be converted to full-time employees. The conversion decision will be made based on performance. Job Description, Role Description: Defining and measuring reliability goals (SLIs, SLOs, and error budgets) for user journey. Designing for and implementing...
-
Site Reliability Engineer
2 weeks ago
Old Toronto, Canada eTeam Full time
Remote work. Duration: 4 months. Preference is to find candidates who are willing to be converted to full-time employees. The conversion decision will be made based on performance. Job Description, Role Description: Defining and measuring reliability goals (SLIs, SLOs, and error budgets) for user journey. Designing for and implementing observability (ELK,...
-
Site Reliability Engineer III
2 days ago
Toronto, Canada Rakuten Kobo Full time
The Role: At Rakuten Kobo, we develop software that covers a rich set of domains, including hardware devices, eCommerce, content rendering, and an expanding data ecosystem. Our SRE team provides the safety net that empowers our 50+ product developers to move fast. We are seeking an experienced Site Reliability Engineer III to help ensure the reliability...
-
Director Site Reliability Engineering
4 weeks ago
Toronto, Canada BMO Full time
Application Deadline: 04/29/2024
Address: 33 Dundas Street West
This role is Hybrid (1-2 days per week in the office).
The Director - Site Reliability Engineering will lead a team that will work with application teams, infrastructure teams, and business partners to continuously improve the stability, reliability and efficiency of Finance and Enterprise Risk...
-
Principal Site Reliability Engineer
2 weeks ago
Old Toronto, Canada Lightspeed Full time
Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place! We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and...
-
Site Reliability Engineer(Cloud, Automation)
3 weeks ago
Toronto, ON, Canada Behavox Full time
Behavox is shaping the future of how businesses harness their most important raw material: data. Organize enterprise data into actionable information that protects and promotes the business growth of multinational companies around the world. From managing enterprise risk and compliance to maximizing revenue and value, our data operating platform presents...
-
Site Reliability Engineer
4 weeks ago
Old Toronto, Canada Nityo Infotech Full time
Job Responsibilities: Objectives of this Role: Run the IKP clusters by monitoring availability and taking a holistic view of system health. Build tools and automation to manage platform infrastructure and services. Improve reliability, quality, and time to upgrade cluster and service versions. Measure and optimize system performance and resource utilization,...
-
Site Reliability Engineer
2 weeks ago
Toronto, Canada Tata Consultancy Services Full time
TCS is an equal opportunity employer and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our people stories across our workforce implemented through...
-
Senior Site Reliability Engineer- Remote
1 month ago
Old Toronto, Canada ClickHouse Full time
We are committed to providing our customers with reliable and secure services, so we are building out our newly formed Site Reliability Engineering team. As one of the first joiners to our Reliability Engineering Team at ClickHouse, you will be responsible for building and leading processes to ensure the reliability, availability, scalability, and performance...