Site Reliability Engineer in Holmdel
Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We reach an audience of over 7 million energy professionals, advertise 400,000+ global energy and engineering jobs each month, and work with the leading energy companies worldwide.
We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.
Job Description
Sr Engineer Observability
Holmdel, NJ or Bethlehem, PA
You are:
We are seeking a dedicated and detail-oriented Senior Observability Engineer with expertise in Splunk, AppDynamics, OpenTelemetry, and Zenoss to join our Enterprise Observability Engineering team. The ideal candidate will be responsible for the administration, configuration, and maintenance of our observability tools to ensure optimal performance and reliability of our IT systems.
You have:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Minimum of 5–7 years of experience in observability, monitoring, or site reliability engineering, with a focus on Splunk, AppDynamics, and Zenoss.
- Proven experience in implementing, managing, and maintaining observability tools.
Technical Skills:
- Proficiency in Splunk and AppDynamics (including configuration, administration, and implementation).
- Proficiency in Zenoss (including setup, configuration, and maintenance).
- Strong grasp of MELT (metrics, events, logs, and traces), with hands-on troubleshooting and support experience.
- OpenTelemetry: instrumentation patterns, context propagation, collectors, sampling, etc.
- Experience maintaining platform reliability, including upgrades, patching, and security hardening.
- Exposure to Kubernetes observability (cluster/workload metrics, events, service discovery).
- Strong knowledge of IT infrastructure, applications, and networking.
- Experience with scripting and automation tools (e.g., Python, Bash).
- Familiarity with cloud environments (e.g., AWS, Azure) is required.
Soft Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
- Ability to work independently and in a team-oriented environment.
Qualifications:
- Experience with other monitoring and observability tools (e.g., Prometheus, Grafana).
- Knowledge of DevOps practices and CI/CD pipelines.
- Hands-on experience with Infrastructure-as-Code (Terraform/Ansible) and Git-based workflows.
Key Responsibilities:
1. Administration and Implementation:
- Administer and configure the Splunk, AppDynamics, OpenTelemetry, and Zenoss platforms to meet organizational monitoring needs.
- Perform regular updates, patches, and upgrades to observability tools to ensure they are up-to-date and secure.
2. Monitoring and Maintenance:
- Continuously monitor the health and performance of the Splunk, AppDynamics, and Zenoss systems.
- Ensure data integrity and availability within the observability platforms.
3. User Support and Training:
- Provide support to internal users, assisting with troubleshooting and resolving issues.
- Develop and deliver training sessions for users to effectively utilize the monitoring tools.
4. Dashboard and Alert Management:
- Create and manage dashboards, reports, and alerts.
- Work with stakeholders to define monitoring requirements and implement appropriate alerting mechanisms.
5. Data Management and Optimization:
- Manage data onboarding and alert creation.
- Optimize system performance by tuning configurations and managing resource utilization.
6. Documentation and Best Practices:
- Maintain comprehensive documentation of configurations, processes, and procedures.
- Develop and enforce best practices for monitoring and observability within the organization.
7. Collaboration and Incident Response:
- Collaborate with IT and DevOps teams to ensure comprehensive monitoring coverage.
- Participate in incident response efforts, using observability data to assist in troubleshooting and resolution.
Location: This is a hybrid role based in either our Holmdel, NJ or Bethlehem, PA office. We will only consider local candidates for this position. Please include the candidate's current location on the resume.
Work Authorization: This is a contract-to-hire role. Applicants must be legally authorized to work in the United States now and in the future without the need for employer sponsorship. Only candidates authorized to work in the U.S. on a permanent basis will be considered.
Thanks & Regards,
Navin Singh
Recruiter – Talent Acquisition Group
Phone: 973-370-7023
Address: 200 Metroplex Drive, Suite 300, Edison, NJ 08817
Mail: Navin.singh@irissoftware.com
If you are interested in applying for this job, please press the Apply button and follow the application process. Energy Jobline wishes you the very best of luck in your next career move.