Site Reliability Engineer - Azure
Job DescriptionJob Description
**No C2C for this role. Full-time/direct hire opportunity
Key Responsibilities
- Oversee the operation and health of production environments using Azure and IIS.
- Set up and configure monitoring tools like Azure Monitor and Application Insights.
- Incident Response:
- Serve as the first point of contact for production issues, ensuring rapid resolution and root cause analysis.
- Create runbooks and automated processes for common incidents.
- Performance and Scalability:
- Analyze and optimize the performance of IIS applications.
- Plan for capacity needs and ensure systems can handle growth.
- Deployment and Automation:
- Assist with production deployments, ensuring minimal downtime.
- Collaborate with DevOps teams to automate repetitive operational tasks.
- Monitoring and Alerting:
- Design and configure proactive alert notifications to detect and respond to system anomalies.
- Implement multi-channel alerting mechanisms (e.g., email, SMS, webhook).
- Design and implement monitoring strategies using Azure Application Insights.
- Configure application performance telemetry, logging, and diagnostics.
- Develop custom telemetry to meet unique application monitoring needs.
- Create and manage detailed Azure dashboards for real-time performance visualization.
- Customize dashboards to provide actionable insights for stakeholders.
- Documentation and Process Development:
- Develop and maintain operational documentation and best practices.
- Define and refine workflows and processes.
- Integration and Reporting:
- Integrate Application Insights with other Azure services (e.g., Azure Monitor, Log Analytics).
- Generate periodic reports and deliver insights on system performance trends.
Qualifications
Required Skills and Experience:
- Proven experience with Azure Application Insights and Azure Monitor.
- Hands-on expertise in creating custom dashboards and performance reports.
- Strong knowledge of alerting mechanisms and notification setup in Azure.
- Familiarity with telemetry data, application performance monitoring (APM), and diagnostics.
- Experience integrating Azure services (e.g., Log Analytics, Azure Monitor).
- Hands-on experience with IIS (Internet Information Services) for web application hosting.
- Familiarity with monitoring and logging tools.
- Solid understanding of production system workflows and incident management.
Skills:
- Experience with Azure DevOps pipelines and tools.
- Knowledge of scripting (e.g., PowerShell, Python) for automation.
- Familiarity with cloud cost optimization strategies.
Education and Certifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Relevant certifications a plus, such as:
- Microsoft Certified: Azure Administrator Associate.