Engineering operations SRE Windows

Job Description


Job Title: Windows Server Site Reliability Engineer (SRE)

Are you a proactive problem-solver with a passion for optimizing Windows Server environments? We are seeking a skilled Windows Server Site Reliability Engineer (SRE) to join our dynamic team! In this role, you'll be at the forefront of ensuring our Windows Server infrastructure is stable, reliable, and running at peak performance. You'll work closely with development and operations teams to design, implement, and maintain robust infrastructure solutions that support our business objectives.

Key Responsibilities:

  • Infrastructure Management:
      • Design, deploy, and manage Windows Server environments, including critical services such as Active Directory, DNS, and DHCP.
      • Implement and maintain high-availability and disaster recovery solutions to ensure business continuity.
      • Monitor and analyze server performance, capacity, and availability to maintain optimal operation and minimize downtime.
  • Automation and Scripting:
      • Develop and maintain automation scripts using PowerShell and other scripting languages such as Python to streamline routine tasks.
      • Automate server provisioning, configuration, and deployment processes to improve efficiency and reduce manual effort.
  • Incident and Problem Management:
      • Respond to and resolve escalated incidents by swiftly troubleshooting and diagnosing server-related issues.
      • Identify root causes of recurring problems and implement solutions to prevent future incidents.
  • Performance Optimization:
      • Analyze system performance metrics and identify opportunities for improvement.
      • Implement performance tuning and optimization techniques to enhance server efficiency and reliability.
  • Collaboration and Communication:
      • Collaborate closely with development teams to support application deployments and ensure seamless integration with Windows Server environments.
      • Provide technical guidance and support to team members and stakeholders, fostering a collaborative work environment.
  • Documentation and Reporting:
      • Maintain comprehensive documentation of server configurations, processes, and procedures to ensure knowledge sharing and compliance.
      • Generate regular reports on system performance, incident resolutions, and other key metrics to inform decision-making.

Requirements:

  • Experience:
      • 7-10 years of relevant experience in Windows Server administration or site reliability engineering.
      • Proven track record as a Windows Server Administrator or Engineer, with hands-on experience managing Windows Server 2016, 2019, and 2022.
  • Technical Skills:
      • Strong knowledge of Active Directory, DNS, DHCP, Group Policy, and other essential Windows Server components.
      • Proficiency in PowerShell scripting and automation, with experience in other scripting languages such as Python.
      • Familiarity with monitoring and performance tuning tools to ensure optimal server performance.
  • Problem-Solving:
      • Excellent troubleshooting and diagnostic skills for resolving complex server issues efficiently.
      • Ability to analyze and interpret system logs and performance data to make informed decisions.
  • Collaboration:
      • Strong interpersonal and communication skills, with a proven ability to work effectively in cross-functional teams.
      • Experience collaborating with developers, network engineers, and IT support staff to deliver seamless IT services.
  • Education:
      • Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
      • Relevant certifications (e.g., Microsoft Certified: Windows Server Hybrid Administrator Associate, Microsoft Certified: Azure Administrator Associate) are a plus.

Join us and become an integral part of our team, where your skills and expertise will directly contribute to the reliability and success of our Windows Server infrastructure!


New York, NY
Full time

Published on 05/05/2025
