Wikimedia Foundation, Engineering
Senior Site Reliability Engineer, Infrastructure Foundations
Remote (US), $113k–$175k, Posted today
The Wikimedia Foundation is seeking a Senior Site Reliability Engineer to support and develop the platform for Wikipedia, ensuring its reliability and scalability. The role involves working with a globally distributed, open-source, and remote-first team to maintain Wikimedia’s top-10 website.
Location: Remote (US)
Salary: $113k–$175k
Responsibilities
- Performing day-to-day operational/DevOps tasks on Wikimedia’s public-facing infrastructure (deployment, maintenance, configuration, troubleshooting)
- Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
- Leading continuous improvement by automating the installation, configuration and maintenance of services on our platform
- Working closely with product teams to help them bring scalable functionality to our users, assisting in the architectural design of new services and making them operate at scale
- Participating in a 24/7 on-call rotation shared across the broader SRE team. This includes taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure.
- Collaborating with a global, cross-functional team in an asynchronous communication environment
- Mentoring peers in your areas of technical and operational strength
- Ability and willingness to travel 1-2 times a year for in-person events and team meetings
Requirements
- 6+ years of experience in an SRE/Operations/DevOps role as part of a team
- Experience with shell and any scripting languages used in an SRE context (Python, Go, Bash, Ruby; we primarily use Python) and configuration management tools (Puppet, Ansible; we use Puppet)
- Experience designing and managing infrastructure security for large fleets of diverse services
- Experience with technical response during security incidents
- Experience with package management on Linux systems (we use Debian)
- Strong Linux system-level troubleshooting skills
- History of automating tasks and processes, identifying process gaps, and finding automation opportunities
- Strong English language skills (verbal and written) and ability to work both independently and as an effective part of a globally distributed team working across multiple time zones
- Experience leading and participating in incident response and post-incident review rituals, with the goal of conducting root cause analysis and implementing preventive measures
Category: Engineering
Company: Wikimedia Foundation
Source: himalayas
Posted: today