Site Reliability Engineer (SRE) - Azure - 100% Remote

Los Angeles, CA

Our premier financial client is seeking a Site Reliability Engineer (SRE) to join their team as a full-time, direct-hire employee. The role is 100% remote and follows Pacific Time working hours.
The team is based in downtown Los Angeles, CA 90071. The position can be fully remote on Pacific Time, but a more junior candidate might need to come into the Los Angeles office 2-3 days per week.
 
Important:

  • We would like someone with a development background and solid DevOps skills.
  • We will consider someone with on-prem experience who is starting to learn off-prem (Azure). We have more on-prem work right now, so on-prem experience is needed.
  • Financial services experience preferred but NOT required.
  • Microsoft background; .NET preferred for development, plus general monitoring tools (Dynatrace, Splunk; these specific tools are preferred, not required).
  • Should be familiar with development.
  • The SRE will work closely with the DevOps team.
  • Will put monitoring in place and automate processes that are currently manual.
  • Azure is a big plus (the client will also get them certified).
  • Past experience doing both development and support.
  • Example of the mindset: a sys admin does something 10 times, and this person comes in and figures out a way to automate it.
  • Support and DevOps teams are currently mixed.
Job Description
As an SRE, you will utilize your software, systems engineering, and operations background to build and run large-scale, fault-tolerant systems. Your role is to ensure the reliability, scalability and maximum uptime of the Cloud Platform.
 
Responsibilities
Design, develop and implement solutions that improve stability, security, scalability and availability of the software platforms.
Design mechanisms for alerts and responses to identify and address reliability risks.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, planning, and reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Design and run performance, capacity and monitoring tests.
Create educational material such as cloud-native sample apps and starter code, and help run cloud-native educational events like hackathons and live coding sessions. Write documentation on how-tos and best practices, and blog about use cases and architectures that relate to cloud platforms.
Liaise with the team managing our public cloud environments, including setup, management, and troubleshooting.
 
Required
  • 5+ years of experience in an Operational role, DevOps, SRE, or Software Engineering
  • 5+ years of experience doing development in any of Java, Node.js, .NET Core, or Python
  • 3+ years of experience with development or administration on any cloud platform (Cloud Foundry, Heroku, AWS, Azure, Google Cloud, IBM Cloud, Bluemix, Kubernetes, and others). The ideal candidate has significant experience with a Platform as a Service (PaaS) cloud such as Cloud Foundry.
 
Additional Skills and Knowledge
  • 5+ years of experience developing applications with an active user base, deploying to production, and going through a change management process (the ideal candidate can discuss their change management process in detail, including what works well and its pain points)
  • Experience with Splunk / Elasticsearch and Kibana
  • Experience with monitoring tools such as Datadog and Dynatrace
  • Experience with automating manual processes and tests
  • Creativity, energy, and passion for leveraging technology to transform our industry; the belief that automation is the only way
  • A good understanding of modern, cloud-centric architectures and DevOps principles
  • Experience with the operational aspects of software systems such as monitoring, centralized logging, and alerting
  • Experience providing standardized offerings that ensure the operational health of stacks throughout their lifecycle, including metrics collection, aggregation, and visualization; inventory; capacity; and billing/tag management
  • Above-average performance. You are competitive and passionate. You thrive on challenge and have a proven ability to set ambitious but achievable goals and surpass them
 
For immediate consideration, please submit your resume in Word format, along with daytime contact information. Client is unable to provide H-1B visa sponsorship at this time. All submittals will be treated confidentially. Pursuant to the Fair Chance Initiative for Hiring Ordinance, we will consider for employment qualified applicants with arrest and conviction records. Principals only, no third parties please.
 

 
Atrilogy Solutions Group, Inc. (est. 2000), in partnership with Peak17 Consulting (est. 2008), provides organizations of all sizes with high-quality, cost-effective information technology (IT) staffing services.
 
Atrilogy has been recognized by Inc. magazine as one of the nation’s fastest-growing privately held companies. Headquartered in Irvine, California, Atrilogy also has offices in Denver, Phoenix, and Atlanta, with satellite offices in Boston, Jersey City, Las Vegas, and Delhi, India.
 
Clients turn to Atrilogy for expertise in:
 
  • IT staffing and placement such as Project Managers, Agile/Scrum Masters, Business Analysts, DBAs, Software Engineers, Mobile Developers (iOS, Android), DevOps, Automation, QA, Systems & Network Engineers, Cyber Security / Information Security Specialists, ERP, CRM, Business Intelligence, Data Warehousing, Big Data and Creative (UI/UX, Web Design)
 
 Clients turn to Peak17 for expertise in:
 
  • Operational staffing and placement of Accounting/Finance, Human Resources, and Marketing professionals, as well as Information Technology resources.

 
Atrilogy Solutions Group and Peak17 Consulting are Equal Opportunity Employers. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, gender expression, national origin, protected veteran status, or any other basis protected by applicable law, and will not be discriminated against on the basis of disability.
 
In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification document form upon hire.
