Pfizer · Engineering
Staff Platform Engineer, AI/ML Infrastructure
Remote · €65,250–€108,750 · Posted today
The Staff Platform Engineer, AI/ML Infrastructure provides technical leadership for the cloud platforms, deployment systems, and operational foundations that support enterprise-scale generative AI applications. The role defines and evolves infrastructure architecture across AWS, Kubernetes, serverless, and containerized environments, and leads platform standards for reliability, scalability, observability, CI/CD, security, and developer enablement.
Location: Remote
Salary: €65,250–€108,750
Responsibilities
- Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services.
- Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA.
- Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments.
- Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals.
- Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics.
- Build platform capabilities for GenAI workloads, including model availability monitoring.
- Partner with software engineering teams to improve deployment reliability, rollback strategies, health checks, autoscaling, load testing, and runtime performance.
- Define and enforce security and compliance practices for infrastructure, including IAM permission boundaries, Secrets Manager usage, secret scanning, audit logging, tagging standards, and change-management controls.
- Provide technical leadership for cost optimization, capacity planning, environment standardization, and operational resilience across development, test, production, and sandbox environments.
- Mentor engineers, review architecture and infrastructure designs, and influence platform engineering practices across teams.
Requirements
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience.
- 7+ years of experience in DevOps, platform engineering, cloud infrastructure, site reliability engineering, or software engineering roles.
- Strong hands-on experience with AWS/Azure/GCP infrastructure and services, including container, serverless, networking, storage, observability, and security services.
- Experience designing and operating production systems on Kubernetes, ECS/Fargate, or comparable container orchestration platforms.
- Proficiency with infrastructure-as-code, especially CloudFormation, Terraform, Helm, or similar tooling.
- Strong CI/CD experience with GitHub Actions or similar platforms, including reusable workflows, automated testing, deployment gates, and cloud authentication.
- Experience building and operating observability solutions using CloudWatch, Prometheus/Grafana, OpenSearch, or similar tools.
- Strong understanding of cloud security practices, IAM, secrets management, least-privilege access, audit logging, and compliance requirements.
- Experience supporting distributed systems, microservices, APIs, asynchronous workloads, and multi-environment deployments.
- Demonstrated ability to lead technical design, mentor engineers, and influence engineering practices across teams.