Senior Site Reliability Engineer

4 weeks ago


New Canada NS Remotivate Full time

About the Company:
Our client is one of the leading SMS providers for marketing teams in the US. Their advanced dashboard and queueing mechanisms help their clients scale campaigns to the next level. With a global team, they're in scale-up mode and looking for strong problem solvers who thrive on building reliable systems.

About the Role:
We are looking for a Senior Site Reliability Engineer (SRE) with strong infrastructure experience to help ensure platform stability and optimize backend systems in Python. You will play a key role in keeping their SMS marketing platform fast, reliable, and scalable. This is a highly technical position at the intersection of backend engineering and infrastructure. You'll be working hands-on with Python/Flask applications, Linux servers, and the networking stack to make sure millions of SMS messages are delivered without delay or downtime. This is a full-time remote role.

We are looking for a Senior Site Reliability Engineer specifically with these requirements:

  • 5+ years of experience as a Site Reliability Engineer, System Engineer, Infrastructure Engineer, Platform Engineer, Backend Systems Engineer, or similar role, ideally as a Python Developer.
  • Experience running and maintaining Python/Flask applications in production.
  • Advanced Python development skills, particularly with Python libraries/frameworks.
  • In-depth knowledge of Linux server administration (Debian/Ubuntu).
  • Proficiency with network analysis tools: intercepting proxies, packet captures (Wireshark, mitmproxy, tcpdump, etc.).
  • Familiarity with distributed systems, scaling strategies, and performance tuning.
  • Strong understanding of monitoring and logging systems (e.g., Prometheus, Grafana, ELK, Datadog).
  • Experience with version control (Git) and CI/CD workflows.
  • Comfort with automation tools and scripting for infrastructure management.
  • Excellent troubleshooting and analytical skills.
  • Strong sense of ownership and accountability for uptime, stability, and performance.

Your responsibilities will include (but are not limited to):

  • Maintain and optimize infrastructure: Manage Linux-based (Debian/Ubuntu) servers running Python/Flask applications, ensuring stability and performance.
  • Ensure high uptime: Continuously monitor system health and proactively address bottlenecks or weak points to maximize reliability of SMS send-outs.
  • Troubleshoot complex issues: Use intercepting proxies, packet captures, and diagnostic tools to identify, analyze, and resolve traffic or delivery issues.
  • Optimize backend workflows: Work with Python/Flask async frameworks to streamline message queuing, delivery, and scaling mechanisms.
  • Implement monitoring and alerting: Set up dashboards, logs, and alerts that provide visibility into system health and performance.
  • Automate infrastructure tasks: Build tools/scripts to reduce manual work and ensure consistency in deployments and optimizations.
  • Own decision-making: Take initiative in addressing infrastructure needs and make competent technical decisions without requiring constant supervision.

Growth Opportunities/Perks:

  • Endless growth opportunities as they're in a scale-up phase.
  • Potential to move into a more elaborate R&D or leadership role.
  • Flexible working schedule as long as deadlines and quality are met.
  • Work alongside highly skilled developers in a unique and challenging industry.
  • Performance bonuses as the company grows.
  • Fully remote setup.

This Position Is Perfect For You If...

  • You're a fast learner. You won't be expected to know everything from the start, but you'll need to be motivated and quick to learn new tools, technologies, and patterns in a complex infrastructure environment.
  • You're detail-oriented. You notice flaws in systems before they become problems, and you enjoy digging into logs, metrics, or packet captures until you find the root cause.
  • You're reliable under pressure. When systems break, you don't panic; you troubleshoot calmly, take action, and make the right call to stabilize the platform.

Our hiring process is made up of four parts, so please be aware that you will need to dedicate time for a questionnaire, a video, and two 1-on-1 interviews. Thank you for taking the time to consider this position. I look forward to hearing from you soon.



  • Canada Thinkific Full time

    Join to apply for the Senior Site Reliability Engineer role at Thinkific. Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...


  • Canada Akamai Technologies Full time

    Senior Site Reliability Engineer Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges. Job Description Are you passionate about cutting‑edge technology and ready to tackle some of the Internet’s most difficult...


  • Canada DuckDuckGo Full time

    6 days ago. Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...


  • Canada Orion Innovation Full time

    Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...


  • NS, Canada Instacart Full time

    We're transforming the grocery industry At Instacart, we invite the world to share love through food because we believe everyone should have access to the food they love and more time to enjoy it together. Where others see a simple need for grocery delivery, we see exciting complexity and endless opportunity to serve the varied needs of our community. We...


  • Canada Targeted Talent Full time

    Overview: We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the...


  • Canada TextNow Full time

    This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...


  • Canada Orion Innovation Full time

    We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands‑on mastery of Linux systems, advanced networking concepts (specifically IPSec),...


  • Canada Orion Innovation Full time

    Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...


  • Canada TekRek Full time

    This range is provided by TekRek. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range CA$90.00/hr - CA$120.00/hr Senior Site Reliability Engineer – Distributed Systems, Kubernetes, AWS/GCP The Company TekRek has partnered with a fast‑scaling AI infrastructure company building one of the...