Staff ML Infrastructure Engineer in Santa Clara
Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We reach an audience of over 7 million energy professionals, advertise 400,000+ global energy and engineering jobs each month, and work with the leading energy companies worldwide.
We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.
Job Description
Staff / Lead ML Infrastructure Engineer
San Francisco, CA — Onsite
Salary: Above market average + equity
We are building one of the world’s leading generative video and multimodal AI platforms, and we’re looking for a senior infrastructure engineer to build and lead the backbone that makes it possible. This role is ideal for an engineer from a top-tier tech company who has built cloud-scale systems, high-performance compute platforms, and battle-tested CI/CD pipelines that support complex ML workloads.
What You’ll Own
- Core ML Platform Architecture: Design and evolve the infrastructure that supports large-scale generative video and multimodal model training, evaluation, and deployment.
- High-Throughput Compute Systems: Build and optimize GPU/TPU clusters, distributed training systems, and orchestration layers tailored for video-heavy pipelines.
- Production Reliability for Generative Models: Create the tooling and services needed to safely push frequent model updates while handling massive compute loads and long-running jobs.
- End-to-End CI/CD for ML: Lead the development of automated pipelines for model training, validation, artifact management, and production rollout.
- Multimodal Data Infrastructure: Build systems to ingest, version, transform, and serve large-scale video, audio, and text datasets with high reliability.
- Internal Developer Experience: Partner with research, product, and applied ML teams to build intuitive internal tooling for experiment tracking, model lineage, and resource scheduling.
- Technical Leadership: Mentor engineers, set platform standards, and influence long-term architectural direction.
What You’ve Done
- Experience architecting and operating large-scale infrastructure at a cloud provider, hyperscaler, or leading AI company.
- Built or owned mission-critical CI/CD systems, high-capacity compute platforms, or data infrastructure supporting ML teams.
- Deep experience with distributed compute across GPUs/accelerators, Kubernetes, and cloud infrastructure (AWS/GCP/Azure).
- Strong engineering fundamentals in Python, Go, or an equivalent language.
- Previous exposure to ML training pipelines—especially systems that handle heavy video, multimodal, or high-dimensional data.
- Demonstrated ability to lead complex cross-org initiatives and drive technical strategy.
Nice to Have
- Experience with video processing systems, large-scale media pipelines, or streaming architectures.
- Familiarity with modern multimodal or video frameworks and tooling (PyTorch, JAX, diffusers, custom accelerators).
- Experience with Ray, Triton, CUDA optimization, or specialized scheduling for ML workloads.
- Background working in high-growth AI startups or research-focused environments.
- Familiarity with security and compliance considerations for models that generate or process user content.
Why Join
- Shape the underlying platform powering one of the most advanced generative video systems in the world.
- Influence the future of multimodal AI by building infrastructure that directly accelerates research and product breakthroughs.
- Work closely with experienced founding engineers, researchers, and platform builders from leading tech companies.
- Highly competitive compensation, meaningful equity, and a strong in-person engineering culture in San Francisco.