QAD · DevOps

Senior Site Reliability Engineer

Remote · Posted today

The Senior Site Reliability Engineer is responsible for ensuring the reliability, scalability, and performance of mission-critical services, driving automation, and shaping SRE practices.

Location: Remote

Responsibilities

  • Design, implement, and maintain highly available, scalable, and resilient systems.
  • Define, implement, and enforce best practices for monitoring, alerting, logging, tracing, and synthetic testing within AWS using Datadog.
  • Develop robust, well-tested software and tooling for automation and reliability.
  • Contribute to incident management, post-mortems, and reliability metrics.
  • Leverage infrastructure as code with Terraform and GitHub Actions.
  • Provide expertise in system design reviews, architecture, and scalability.
  • Share knowledge through documentation, runbooks, and mentorship.

Requirements

  • Experience operating and improving production systems at scale.
  • Ability to understand complex distributed systems.
  • Troubleshooting skills and incident response experience.
  • Experience with SLIs, SLOs, and error budgets.
  • Strong communication skills.

Additional Information

  • This role is fully remote.
  • The role involves working with AWS, Kubernetes, Datadog, Terraform, and other observability and automation tools.
  • The company values diversity, equity, and inclusion.

Location

Remote

Category

DevOps

Company

QAD

Source

himalayas

Posted

today
