
Site Reliability Engineer

Job Description

W2 Only

Site Reliability Engineer

Responsibilities:

• Work with teams to understand the standards of Product Development and recommend changes towards increased stability of the products and applications.

• Build software to improve DevOps, ITOps, and support processes in line with the "everything as code" model, such as Infrastructure as Code and Platform as a Service.

• Perform safe, reliable deployments of all appropriate software artifacts across environments, from Development and Staging to Production.

• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

• Create and maintain the disaster recovery plan for the staging and production environments.

• Analyze system problems, including root cause determination, and manage any needed recovery process to ensure quick restoration of service without loss of data.

• Maintain a broad knowledge of state-of-the-art technology, equipment, and/or systems.

• Understand RESTful services and use APIs to advance automation goals.

• Maintain network and system security, including an understanding of security protocols and certificate management.

Experience/Skills:

• Experience working under a Scrum methodology

• Ability to analyze and resolve problems in systems, networks, software, and APIs, including knowing where to find the relevant sources of information.

• Understanding of source/version control tools such as Git or Bitbucket.

• DevOps processes and tools such as Azure DevOps or Jenkins

• Hands-on exposure to container technologies such as Docker or Kubernetes

• CI/CD implementation expertise

• Good knowledge of cloud applications (preferably AWS).

• Experience with IT automation in general, using tools like Ansible and scripting or coding in languages such as Python, Groovy, PowerShell, or Bash

• Windows and Linux OS knowledge.

• Use of monitoring and logging tools such as Splunk, Dynatrace or similar

• Advanced English proficiency

• Understanding of the Microsoft suite of development tools, including Visual Studio, IIS, MS SQL Server, and .NET, is a plus






Florham Park, NJ
Full time

Published on 07/06/2025
