Site Reliability Engineer
The McGraw Hill Companies Inc
Full Time Columbus, Ohio, United States Posted 2 years ago
Skills
Being able to translate between the dialects of development, operations, security, product, and management is a highly sought skill. Being "conversational" in JavaScript/TypeScript, Python, PHP, Ruby, Golang, Java, Bash, Markdown, reStructuredText, HCL, JSON, YAML, and TOML would be valuable. Being fluent in 2-3 of them would be a huge plus.
Description
3+ years of experience as a software engineer, with practical experience developing, debugging, and deploying enterprise applications
Experience with infrastructure automation technologies (e.g., Terraform)
Expertise in container and container-fleet orchestration technologies such as ECS or Kubernetes
Versatility in troubleshooting diverse hosting technologies: web server platforms, application platforms, operating systems, network components, virtualization technologies, storage, and database platforms
Expertise with continuous-deployment-based software development lifecycles (e.g., CI/CD)
Cloud database operations and deployment experience (RDS MySQL/Postgres/Aurora)
Expertise with Lean/Agile deployment processes (blue/green, zero-downtime (ZDT), canary, and load-balancer/DNS strategies)
Demonstrated expertise building, automating, and managing highly scaled production infrastructure in the cloud
Responsibilities
- Cloud Engineering:
- Hands-on design, analysis, development, and troubleshooting of highly distributed, large-scale production systems and event-driven, cloud-based services
- Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
- Participate in continual learning of the AWS ecosystem, game day scenarios, and professional conferences
- Collaborate with development teams to design solutions for enterprise applications built on our software stack
- Actively monitor AWS costs and use cost-optimization tools to maximize ROI while maintaining Service Level Objectives (SLOs)
- Observability Engineering:
- Own the reliability, uptime, security, cost, operations, capacity, and resiliency of our systems, including analysis of their performance
- Define, monitor, and report on service level indicators (SLIs) for application workloads
- Support on-call rotations for operational duties not yet addressed by automation, with an eye toward correcting the issues that trigger on-call alarms
- Maintain telemetry that improves visibility into our applications' performance and business metrics and keeps operational workload in check
- Develop, communicate, collaborate on, and monitor standard processes that promote the long-term health and sustainability of operational development tasks
- DevSecOps:
- Support healthy software development practices, including following agile software development methodology and building standards for code reviews, work packaging, and continuous delivery
- Partner with CyberSecurity to develop plans and automation for responding to new risks and vulnerabilities
- Systems Engineering:
- Collaborate with system administrators to coordinate middleware, network, storage, database, Windows, Linux, and VMware maintenance
- Automate legacy on-prem system maintenance and migrate those systems to the cloud through thoughtful redesign
- Resiliency Engineering:
- Collaborate with dev teams to identify failure points and blast radius of systems
- Validate effectiveness of monitoring and observability configurations
- Coordinate failure injection testing
- Observe and document steady-state production levels and growth patterns
- Plan and forecast for seasonal growth, communicate trend lines to leadership, and enhance infrastructure scaling plans to accommodate 2x the planned load
- Coordinate improvements of existing software and infrastructure to meet resiliency goals