Senior Kubernetes Engineer in Dallas

Job Description

Senior Kubernetes Engineer
Location: Dallas, TX (Hybrid – 3/2) | Relocation available
Type: Direct Hire

• Competitive base salary + performance bonus
• 100% company-paid benefits

Overview
We are seeking a Senior Kubernetes Engineer to help design and scale a next-generation GPU-accelerated compute platform supporting AI, machine learning, and high-performance computing workloads. This role sits at the core of a rapidly expanding infrastructure environment and focuses on building high-throughput, highly efficient container platforms across on-prem and hybrid environments.

You will play a key role in architecting and operating large-scale Kubernetes clusters optimized for GPU workloads, working closely with platform, HPC, and ML engineering teams to deliver reliable, multi-tenant compute at scale. This is a hands-on engineering role with strong ownership across performance, automation, and platform evolution.

Key Responsibilities

Kubernetes Platform Engineering
• Design, deploy, and operate large-scale Kubernetes clusters optimized for GPU-intensive workloads
• Architect container platforms supporting AI/ML, LLM training, and HPC use cases
• Extend Kubernetes through custom operators, controllers, and CRDs to support infrastructure automation

GPU & Workload Optimization
• Integrate and optimize NVIDIA ecosystem components, including GPU Operator, DCGM, and device plugins
• Implement GPU scheduling strategies, including MIG, sharing, and workload placement optimization
• Enhance cluster efficiency using scheduler extensions such as kube-scheduler plugins, Slurm, or Volcano

Platform Performance & Reliability
• Drive performance tuning across compute, networking, and storage layers for high-throughput workloads
• Partner with HPC and ML teams to ensure scalability, reliability, and workload efficiency
• Participate in production readiness, incident response, and continuous improvement initiatives

Observability & Automation
• Implement monitoring and telemetry solutions using Prometheus, Grafana, DCGM Exporter, and OpenTelemetry
• Build and maintain CI/CD pipelines for infrastructure using GitOps tools such as ArgoCD and FluxCD
• Contribute to infrastructure-as-code using Terraform, Helm, and Kustomize

Security & Multi-Tenancy
• Design and enforce secure multi-tenant environments with namespace isolation, RBAC, and policy controls
• Implement governance frameworks using tools such as OPA or Gatekeeper
• Ensure compliance with platform security and operational standards

Required Experience
• Strong experience operating Kubernetes in large-scale, production environments
• Hands-on experience with NVIDIA GPU ecosystem, including GPU Operator, device plugins, MIG, and DCGM
• Proficiency in Go or Python for building Kubernetes operators and automation tooling
• Deep understanding of Kubernetes internals, including CRDs, controllers, RBAC, and scheduling
• Experience supporting GPU-intensive workloads such as AI/ML training, LLMs, or scientific computing
• Experience with GitOps, CI/CD pipelines, and infrastructure-as-code practices
• Familiarity with container networking, including CNI plugins such as NVIDIA CNI or Multus
• Experience with monitoring and observability tools for cluster and GPU performance

This is a high-impact opportunity to work at the forefront of AI infrastructure, helping build and scale the platforms that power next-generation compute.

Company Description
GTN is the leader in SOW management & technical staffing, leveraging innovation to drive next-generation recruiting to Fortune 200 companies.

Dallas, TX
Full time

Published on 05/06/2026