Senior MLOps Engineer

Benefits:

  • 401(k)
  • 401(k) matching
  • Competitive salary
  • Paid time off
  • Parental leave



Job Title: Senior MLOps Engineer

About XRI

XRI is an AI company specializing in research, data, and development for low-resource languages. We are dedicated to enabling speakers of low-resource languages to flourish through the development and deployment of advanced technology solutions.

About the role

We're looking for the engineer who will own our entire AI backend stack: design decisions, uptime, roadmap, and results. You'll inherit a production Kubernetes inference cluster (GPU-backed AKS), a Python/Flask API that handles authentication, Stripe billing, and API-key metering, and a growing ML R&D pipeline for new models. The mandate is broad: keep everything humming today while architecting what it should look like 12 months from now. We'll give you budget, support, and plenty of autonomy, then count on you to drive impact.

Responsibilities

  • Maintain and scale AI backend (AKS, PyTriton, Flask APIs).
  • Develop APIs with auth, billing, and usage tracking.
  • Enable distributed training on Azure ML with PyTorch DDP/FSDP.
  • Build tools for research experimentation and internal efficiency.
  • Implement CI/CD with GitHub Actions, Docker, and Terraform.
  • Lead observability with Prometheus, Grafana, and SLOs.
  • Drive architecture decisions and roadmap execution.


Qualifications

  • Master's degree in CS, ML, or related field.
  • 5+ years of experience developing backend infrastructure.
  • Proven expertise with backend infrastructure and Azure Kubernetes Service (AKS).
  • Demonstrated proficiency in PyTorch, Python, Flask, Stripe integration, and RESTful APIs.


Skills

  • Advanced Python (3.10+), with production experience using Flask or FastAPI.
  • Experience with CI/CD pipelines using GitHub Actions and Docker.
  • Familiarity with monitoring tools (Prometheus, Grafana).
  • Working knowledge of Terraform, API security best practices, and OpenAPI specifications.
  • Bonus: Experience with Redis, ReactJS, TypeScript, and Kafka.


Goals and Objectives

  • Short-term: Improve ML systems, set data standards, stabilize backend, and build CI/CD.
  • Long-term: Develop scalable infrastructure, crowd-sourced data systems, and support internal adoption.


Collaboration

  • Reports to the Head of ML; works with PMs and the broader engineering team.
  • Mentors colleagues and engages in cross-functional teams.


KPIs


  • Inference uptime, cost-efficiency, and latency.
  • Tool adoption, incident resolution, and observability metrics.


Languages & Travel

  • Fluent in English; additional languages a plus. Willing to travel internationally for field use, offsites, and events.


Culture and Values

  • XRI values innovation, compassion, problem-solving, and flexibility. Ideal candidates work autonomously, think strategically, and support team growth through knowledge-sharing.



Senior MLOps Engineer

Clarksville, TN
Full time

Published on 05/14/2025
