Site Reliability Engineer
Intercontinental Exchange
Full Time Atlanta, Georgia, United States Posted 1 year ago
About Position
Skills
- Lead a geographically distributed team of Site Reliability Engineers
- Provide thought leadership; set the technical direction for the SRE Team
- Define and manage projects to meet Team objectives
- Set individual goals and manage personal growth of team members
- Work closely with development teams to promote resilience and observability of the IMT platform
- Manage and troubleshoot a diverse set of SaaS applications and internal services
- Serve as the face of a team responsible for the overall health, performance and capacity of our business applications
- Develop sustainable SRE practices around simplification and standardization
- Drive the cultural standard for SRE, including defining ways of working, runbooks and accountability across people, processes and technology
- Lead Incident Response and Root Cause Analysis
- Partner with other SRE teams and lead by example
Description
This position is for a hands-on technical manager to lead a team of Site Reliability Engineers focused on providing resilient, secure, scalable and supportable services for mortgage borrowers and lenders. You will contribute to the team's strategy and delivery and manage its day-to-day workload. This role requires building close relationships with our customer support, operations, engineering, database and product organizations.
Qualifications
- 3+ years of managing high-performance teams
- 10+ years of Application/Systems engineering in 24x7 production environments
- BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
- Experience in designing, deploying and operating SaaS applications and cloud infrastructure (AWS/GCP/Azure & on-prem virtualized environments)
- Experience leading Incident Response and root cause analysis (RCA) / post-mortems
- Strong systematic troubleshooting skills spanning systems, networks and code
- Experience with Infrastructure as Code solutions like Terraform, Ansible, Chef
- Proven track record decreasing MTTR (Mean-Time-To-Recovery) and improving overall service reliability
- Fluency in one or more current-generation scripting languages used by SRE/DevOps professionals (PowerShell, Python, Perl, PHP, Ruby)
- Strong communication skills