At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together. As a Sr. Manager in Schwab’s Retail Technologies SAvE organization, you will serve as a senior technical leader on the Production Operations team, responsible for ensuring the scalability, performance, and reliability of Schwab’s client-facing applications. In this role, you will drive the strategic adoption of Site Reliability Engineering (SRE) practices, leading automation efforts, reducing toil, and enhancing system observability. You will be instrumental in engineering a culture of operational excellence across teams.
What You’ll Do
Lead and Scale Reliability Engineering Practices – Drive the reliability roadmap for retail-facing systems and lead initiatives that ensure high availability, low latency, and optimal performance.
Automate and Eliminate Toil – Champion the development of automation frameworks and resilient systems that minimize manual intervention and streamline operational processes.
Advance SRE Maturity – Embed SRE principles such as SLIs, SLOs, error budgets, incident retrospectives, and capacity planning into engineering workflows, enabling proactive risk mitigation.
Evolve Observability and Monitoring – Enhance end-to-end monitoring, logging, and alerting systems to drive data-driven insights, shorten MTTD/MTTR, and continuously improve service health.
Foster Cross-Team Collaboration – Work closely with Engineering, DevOps, Product, and Infrastructure teams to integrate reliability, security, and scalability into the software lifecycle.
Provide Technical Leadership – Mentor engineers, promote engineering excellence, and build a high-performing culture of operational rigor and continuous improvement.
Participate in On-Call Support – Lead by example in an on-call rotation, ensuring the reliability of Schwab’s retail technology platforms and contributing to long-term solutions from post-incident learning.
What you have
Required Qualifications:
10+ years of experience in software engineering and/or site reliability engineering, with significant experience leading production-critical systems in cloud environments.
7+ years in DevOps or reliability-focused engineering roles, with a strong track record in automation, systems design, and production support.
Deep expertise in SRE methodologies, with a focus on eliminating toil, designing self-healing systems, and leveraging metrics-based operational models.
Hands-on experience with observability and infrastructure-as-code tools such as Prometheus, Grafana, Splunk, AppDynamics, Terraform, CloudFormation, Jenkins, and GitHub Actions.
Strong programming and scripting skills in Python, Java, Go, or similar, with a focus on building internal tools and automation pipelines.
Advanced understanding of distributed systems, cloud computing (AWS, Azure, or GCP), networking, load balancing, and application security best practices.
Proven ability to drive complex, cross-functional initiatives from concept to production in a fast-paced environment.
Excellent communication and leadership skills with the ability to influence engineering culture and align technical strategies with business objectives.
Why work for us?
Own Your Tomorrow embodies everything we do! We are committed to helping our employees ignite their potential and achieve their dreams. Our employees get to play a central role in reinventing a multi-trillion-dollar industry, creating a better, more modern way to build and manage wealth.
Benefits: We offer a competitive and flexible package designed to help you make the most of your life at work and at home, today and in the future.