

Site Reliability Engineer

Los Angeles, CA
The Direct Client of Atrilogy Solutions Group is looking to hire a Site Reliability Engineer for a 4-month contract position. The position is 100% remote.

Name: Site Reliability Engineer

Duration: 4+ months
Location: Los Angeles, CA (100% remote)


Description:
Duties: Design, develop and implement solutions that improve stability, security, scalability and availability of Client's software platforms.
Design mechanisms for alerts and responses to identify and address reliability risks.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, planning, and reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Design and run performance, capacity and monitoring tests.
Design and build automated processes to identify and resolve issues in the environment.
Lead educational training such as application performance monitoring workshops.
Create educational documentation on how-tos and best practices, and blog about use cases and architectures that relate to system reliability.
Liaise with the development teams to support the setup, management, and troubleshooting of applications.
Perform platform maintenance such as patching and software updates.
Develop plans to ensure platforms are updated in a timely and efficient manner.
Identify areas where operational efficiencies can be gained and develop plans to implement those efficiencies.
Perform system administration such as user provisioning and role-based access control (RBAC).
Develop and maintain platform architecture and design documents.
Liaise with other groups to ensure platforms are ready to support new and upcoming requirements.
Skills: 5+ years of experience developing applications with an active user base, deploying to production, and going through a change management process (the ideal candidate can discuss their change management process in detail, including its happy points and pain points)
5+ years of experience in an operational role: DevOps, SRE, or Software Engineering
5+ years of experience doing development in any of Java, Node.js, .NET Core, or Python
3+ years of experience with development or administration of application performance monitoring tools such as Dynatrace, AppDynamics, Datadog, New Relic, Application Insights, etc.
Experience with Azure Monitor and Azure Application Insights
Experience with monitoring tools such as Dynatrace, LogicMonitor, etc.
Experience with Log Monitoring tools such as Splunk, SumoLogic, etc.
Experience with automating manual processes and tests
Creativity, energy, and passion for leveraging technology to transform our industry; the belief that automation is the only way
A good understanding of modern, cloud-centric architectures and DevOps principles
Experience with the operational aspects of software systems such as monitoring, centralized logging, and alerting
Providing standardized offerings that facilitate and ensure the operational health of stacks throughout their lifecycle, including metrics collection, aggregation, and visualization; inventory; capacity; and billing/tag management

Keywords:

SITE RELIABILITY ENGINEER
RELIABILITY ENGINEER
APPLICATION PERFORMANCE
CHANGE MANAGEMENT

Additional Skills:
.NET
COLLECTION
DEVOPS
ENGINEER
INVENTORY
JAVA
METRICS
PYTHON
SOFTWARE ENGINEERING
VISUALIZATION
ARCHITECTURE
.NET CORE
B2B SOFTWARE
DEV OPS
DOCUMENTATION
GO LIVE
MAINTENANCE
MS .NET
PROVISIONING
SPLUNK
SYSTEM ADMINISTRATION
SYSTEMS ADMINISTRATION
TRAINING

Minimum Degree Required: Bachelor's Degree