SRE Architect

Job Description

Site Reliability Engineer

100% remote

  • Good understanding of SRE principles.
  • Experience supporting Kafka from an SRE perspective (e.g., tuning, fault tolerance, operational readiness).
  • Has created SLO-based alerts or dashboards, which are essential for observability in this role.
  • Experience building infrastructure-level dashboards that provide meaningful insights into system health and performance.
  • Strong troubleshooting skills and/or a structured approach to diagnosing and resolving complex production issues.
  • Understanding of fault-tolerant system design, especially in the context of Kafka.
  • Ability to work with the SRE team and educate them on best practices and the best path forward.

 

The client is going through a modernization effort for their equity trading platform. They have asked us to bring in a senior SRE who ideally has experience supporting large-scale deployments.

 

Desired Qualifications

  • 10+ years' experience in information technology and/or professional services, with emphasis on subject matter expertise.
  • At least 4 years of experience as a Site Reliability Engineer or in an equivalent role.
  • Strong track record of delivering projects of demonstrable complexity and scale.
  • Experience with data visualization and monitoring tools such as Splunk, Grafana, Dynatrace, Datadog, New Relic, Oracle Enterprise Manager, etc.
  • Experience with telemetry frameworks and tools, including but not limited to OpenTelemetry, Prometheus, Loki, Tempo, Fluent, and Jaeger.
  • Prior success in automating real-world production environments using Chef, Puppet, Salt, Ansible, or cloud-native equivalents.
  • Ability to lead adoption and mentor team members on modern Site Reliability Engineering and architectural concepts.
  • Proficient in designing and building highly available, resilient large-scale distributed systems.
  • Demonstrated leadership abilities in an engineering environment in driving operational excellence and best practices.
  • Demonstrated ability to achieve stretch goals in a highly innovative and fast-paced environment.
  • Subject matter expertise with the following Cloud Service Providers: Amazon Web Services (AWS) and Microsoft Azure (experience with GCP is a plus).
  • Expertise in Site Reliability Engineering technologies including but not limited to CI/CD frameworks, Infrastructure as Code, and monitoring and logging frameworks.
  • Familiarity with container technologies like Docker, Kubernetes, and cloud-native frameworks.
  • Excellence in technical communication with peers and non-technical cohorts.
  • Sharp analytical abilities and proven design skills.
  • Strong sense of ownership, urgency, and drive.

Company Description

Please visit our site, nobletechies.com.

Myrtle Point, OR 97458
Full time

Published on 07/09/2025
