
Site Reliability Engineer - Azure

Job Description

No C2C for this role. This is a full-time, direct-hire opportunity.


Key Responsibilities

  • Oversee the operation and health of production environments using Azure and IIS.
  • Set up and configure monitoring tools like Azure Monitor and Application Insights.
  • Incident Response:
      • Serve as the first point of contact for production issues, ensuring rapid resolution and root cause analysis.
      • Create runbooks and automated processes for common incidents.
  • Performance and Scalability:
      • Analyze and optimize the performance of IIS applications.
      • Plan for capacity needs and ensure systems can handle growth.
  • Deployment and Automation:
      • Assist with production deployments, ensuring minimal downtime.
      • Collaborate with DevOps teams to automate repetitive operational tasks.
  • Monitoring and Alerting:
      • Design and configure proactive alert notifications to detect and respond to system anomalies.
      • Implement multi-channel alerting mechanisms (e.g., email, SMS, webhook).
      • Design and implement monitoring strategies using Azure Application Insights.
      • Configure application performance telemetry, logging, and diagnostics.
      • Develop custom telemetry to meet unique application monitoring needs.
      • Create and manage detailed Azure dashboards for real-time performance visualization.
      • Customize dashboards to provide actionable insights for stakeholders.
  • Documentation and Process Development:
      • Develop and maintain operational documentation and best practices.
      • Define and refine workflows and processes.
  • Integration and Reporting:
      • Integrate Application Insights with other Azure services (e.g., Azure Monitor, Log Analytics).
      • Generate periodic reports and deliver insights on system performance trends.


Qualifications

Required Skills and Experience:

  • Proven experience with Azure Application Insights and Azure Monitor.
  • Hands-on expertise in creating custom dashboards and performance reports.
  • Strong knowledge of alerting mechanisms and notification setup in Azure.
  • Familiarity with telemetry data, application performance monitoring (APM), and diagnostics.
  • Experience integrating Azure services (e.g., Log Analytics, Azure Monitor).
  • Hands-on experience with IIS (Internet Information Services) for web application hosting.
  • Familiarity with monitoring and logging tools.
  • Solid understanding of production system workflows and incident management.

Skills:

  • Experience with Azure DevOps pipelines and tools.
  • Knowledge of scripting (e.g., PowerShell, Python) for automation.
  • Familiarity with cloud cost optimization strategies.

Education and Certifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Relevant certifications are a plus, such as Microsoft Certified: Azure Administrator Associate.


Charlotte, NC
Full time

Published on 07/02/2025
