Site Reliability Engineer

AT&T Inc

Contract | Plano, Texas, United States | Posted 3 years ago

About Position

Site Reliability Engineer (Contract)

$90.00 / Hourly

Plano, Texas, United States

Skills
Build software to help operations and support teams - Proactively build and implement services to make operations more effective and reduce toil. This ranges from adjusting monitoring and alerting to automating scripts and code in production. The candidate may be tasked with building a homegrown tool from scratch to help with issues in software delivery or to resolve impacts from outages and incidents.

Fix support escalation issues.

Optimize on-call rotations and processes - Improve system reliability through the optimization of on-call processes. Add automation and context to alerts, leading to better real-time collaborative response from on-call responders. Additionally, update runbooks, tools, and documentation to help prepare on-call teams for future incidents.

Document “tribal” knowledge - Gain exposure to systems in both staging and production, and take part in work with software development, support, IT operations, and on-call duties to build up historical knowledge over time. Instead of siloing this knowledge, ensure constant upkeep of documentation and runbooks so that teams get the information they need right when they need it.

Conduct post-incident reviews - Run thorough and transparent post-incident reviews to keep teams honest and ensure that everyone is conducting post-incident reviews, documenting their findings, and taking action on their learnings. Take action items for building or optimizing parts of the SDLC or incident lifecycle to bolster reliability of the service.

Develop automation for mission-critical applications using scripts and programs.
Provide customer impact analysis and troubleshoot complex issues using domain knowledge of AT&T Sales & Ordering flows, applications, and downstream interfaces.
Support APIs in a K8s environment.
Contribute to the design and implementation of new system layers utilizing principles of high-complexity compute environments.
Provide on-call support for Production customer-facing issues.
Work with developers and environment teams to identify necessary resources and remove constraints to increase application availability.

Required Qualifications

Bachelor’s degree in Computer Science or related field
5+ years of experience in a Production Support / Operations / Development environment
3+ years of experience in Java, Python, Shell scripts
2+ years of experience using Docker, Kubernetes, and Cloud environments
2+ years of experience working in cloud (Azure preferred)
2+ years of strong Unix, Networking, and troubleshooting knowledge
3+ years of experience in Agile, Lean Agile, and/or Scaled Agile methodologies
2+ years of experience with Customer Experience Analytics tools like Quantum Metric and CatchPoint
Solid understanding of and experience with Application Performance Monitoring tools like Dynatrace, AppDynamics, Introscope, etc.
Experience with visualization tools like Kibana and Grafana. EFK stack experience preferred.
Excellent communication and collaboration skills

Description

Our Digital Operations team is looking for a Site Reliability Engineer (SRE) who is passionate about the customer experience and has the analytical and multi-tasking abilities to thrive in a fast-paced environment. The SRE is responsible for ensuring that, as new features and applications are introduced to production, essential aspects of reliability such as availability, resiliency, latency, efficiency, change management, monitoring, emergency response, and capacity planning are addressed alongside development of the new features and applications. The SRE will develop automation code and scripts to proactively address customer issues, reduce mean time to repair, and improve application availability. The position also includes collaborating closely with feature delivery teams as a bridge between development and operations by applying a software engineering mindset to system administration. This position will split time between operations/on-call duties and guiding the development of systems and software that help increase site reliability and performance to deliver business value. The SRE will need intimate knowledge of the current state of data-center and cloud infrastructure, CI/CD pipeline tools, Kubernetes, and Site Reliability Engineering practices, and the ability to implement the plan for the desired future state. Attention to detail and strong analytical skills are required, along with a “Customer-First” attitude!

AT&T Inc Vendors

IBM Corporation
1133 Westchester Avenue
White Plains
New York
www.ibm.com/us ( 130 vendors)

Comsys Information Technology Services Inc
4400 Post Oak Parkway #1800
Houston
Texas
www.comsys.com ( 112 vendors)

Kforce Professional Staffing Firm
1001 E Palm Avenue
Tampa
Florida
www.kforce.com ( 82 vendors)

Job Summary

$90.00 / Hourly

Contract

Plano, Texas, United States

Experience Required : 4 Years

Posted : 3 years ago

Deadline : December 9, 2021

Job ID : Job000003736

AT&T Inc

One AT&T Way

www.att.com