C the Signs
DevOps
Senior MLOps Engineer
United States
Posted today
We’re hiring a Senior MLOps Engineer with deep machine learning engineering experience to build and operate the production platform powering ML/LLM-driven healthcare workflows. You’ll design reliable, secure, and compliant systems for model development, evaluation, deployment, monitoring, and continuous improvement, working closely with ML, data, security, and product teams.
Location: United States
Responsibilities
- Design and operate ML platforms that support end-to-end workflows: data ingestion, feature engineering, training, evaluation, deployment, and monitoring.
- Build and maintain CI/CD for ML (testing, packaging, versioning, reproducibility, automated rollbacks, approvals).
- Implement MLOps best practices: model registry, experiment tracking, lineage, governance, and reproducible training environments.
- Develop scalable training infrastructure (distributed training, GPU scheduling, cost controls, autoscaling).
- Create and maintain feature pipelines / feature stores, ensuring consistency between training and inference (training-serving skew prevention).
- Establish model monitoring and observability: performance, drift, bias/fairness signals (where relevant), latency, throughput, and data quality.
- Build and own end-to-end LLM delivery pipelines: prompt versioning, retrieval, orchestration, evaluation, deployment, monitoring, and iterative improvement.
- Create robust LLM evaluation harnesses (offline + online): golden datasets, automated regression testing, human-in-the-loop review workflows, and risk scoring.
- Build cost controls: token/cost budgeting, caching strategies, autoscaling, and performance tuning.
Requirements
- 6+ years in software/platform engineering, including 4+ years operating ML systems in production (or equivalent depth).
- Strong experience in ML engineering: training pipelines, evaluation, deployment patterns, monitoring, and iteration loops.
- Strong engineering skills in Python, plus production-grade experience building APIs/services.
- Demonstrated hands-on experience with LLM systems in production.
- Strong experience with GCP services and cloud-native patterns.
- Experience with Vertex AI (pipelines, endpoints, feature store, model registry, evaluation) and/or managed vector search on GCP.
- Experience with containerization and orchestration (Docker, Kubernetes/GKE and/or Cloud Run).
Benefits
- Competitive salary and benefits package.
- Flexible working arrangements (remote or hybrid options available).
- The opportunity to work on life-changing AI technology that directly impacts patient outcomes.
- Join a team that combines cutting-edge innovation with a mission to save lives and improve health equity.
- Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.