Our direct client is looking to hire a Site Reliability Engineer for a 100% remote role.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that the client's services, both internally critical and externally visible systems, have the reliability and uptime appropriate to users' needs and a fast rate of improvement. SREs also keep an ever-watchful eye on our systems' capacity and performance. Much of our engineering focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.
On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to our client, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
DESCRIPTION OF DUTIES
- Collaborate with software engineers on the development, test, and CI infrastructure teams to improve the team's continuous integration (CI) and continuous delivery (CD) services.
- Partner with development teams from the early phases of product development, providing infrastructure assistance and guidance, including building software and processes that support developers in infrastructure-related workflows (such as build, release, and deployment automation).
- Design and implement infrastructure for new and existing products, ensuring that all business policies for security, supportability, and cost are met while enabling efficient, automated product deployment.
- Participate in project planning discussions, including preparing and delivering cost and labor estimates and options for assigned projects.
- Design and implement solutions for continuous integration, automated deployment, and configuration management of internally or externally developed applications.
- Analyze new and existing products for performance and efficiency improvements, both as part of a structured release process and on an ongoing basis.
- Monitor and tune the performance, reliability, and security of the infrastructure; identify and correct system bottlenecks while working with development teams on optimization and best practices.
- Integrate internally developed products, externally developed products, and combinations of both to build working solutions from multiple disparate parts.
- Support and improve our tools for continuous integration, automated testing, general automation, and release management, making the entire software engineering process as efficient and effective as possible.
SUMMARY OF REQUIREMENTS
- Experience with cloud and hosting platforms such as Red Hat OpenShift, AWS, Azure, Google Cloud, and on-premises data centers.
- Strong programming and scripting skills, e.g., Groovy, Python, Ruby, PowerShell, and Bash, plus configuration management with Ansible.
- Experience with build automation and continuous integration tools (e.g., Jenkins, Bitbucket, Artifactory, SonarQube, Veracode, Jira, Confluence, Terraform).
- Strong knowledge and understanding of tools and practices such as HashiCorp Vault, Apigee, Dynatrace, Kubernetes, Docker, Kafka, Kinesis, Sysdig, CloudWatch, CloudTrail, Lambda, SQL/PostgreSQL, information security best practices, and software security.
- Experience implementing continuous delivery pipelines.
- Experience with monitoring and logging systems (e.g., Splunk, ELK).
- Understanding of best practices for source control, build engineering, continuous integration, and deployment.
- Proficiency in the setup, configuration, maintenance, and upgrading of one or more server operating system families (e.g., Linux, Windows).
- Proficiency with server prototyping and virtualization tools.
- Proficiency with version control tools (e.g., Git, Bitbucket).
- Experience with SDLC processes (e.g., code review, release management) and their automation (continuous integration, continuous delivery).
- Experience with networking protocols (e.g., TCP/IP, SSL).
- Soft skills: tenacity; communication; troubleshooting (with real-world examples of vexing problems); tolerance for frustration; experience working across disparate locations and time zones; and proactiveness (seeing something and saying something or, better yet, doing something).
- Bachelor of Science degree in Computer Science, Computer Engineering, Electrical Engineering, Information Technology, Information Systems, Industrial Engineering, or related field; or equivalent combination of education and experience.