Devsu · Engineering

Site Reliability Engineer (SRE) - GCP

Remote (US) · Posted 19 days ago

Seeking a Site Reliability Engineer (SRE) with expertise in monitoring, observability, and reliability engineering to support systems running on-premises and on Google Cloud Platform (GCP). The role focuses on designing, operating, and improving monitoring and observability platforms, with secondary responsibility for backup support on application issues.

Location: Remote (US)

Responsibilities

  • Own and operate the monitoring and observability stack across on-prem and GCP environments
  • Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
  • Define, tune, and maintain alerts to ensure a high signal-to-noise ratio
  • Establish observability standards and best practices across teams
  • Improve visibility into system health, performance, and reliability
  • Apply SRE principles to improve availability, performance, and resilience
  • Define and track SLIs, SLOs, and error budgets
  • Participate in on-call rotations and SEV incident response
  • Lead or contribute to incident investigations and root cause analysis (RCA)
  • Drive preventative actions to reduce repeat incidents
  • Support and monitor Kubernetes environments (GKE and on-prem clusters)
  • Monitor cluster health, capacity, and resource utilization
  • Troubleshoot platform-level issues impacting application reliability
  • Collaborate with Platform and Engineering teams on reliability improvements
  • Provide L2/L3 application support during staffing shortages, high-severity incidents, and peak periods
  • Triage and troubleshoot application issues using runbooks and dashboards
  • Collaborate with Application Support and Engineering teams during incidents
  • Document actions, findings, and resolutions in ServiceNow

Requirements

  • Strong experience as a Site Reliability Engineer or Reliability Engineer
  • Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
  • Solid experience with monitoring and observability systems
  • Production experience operating Kubernetes environments
  • Experience supporting systems in GCP and on-prem environments (mandatory)
  • Strong Linux systems and troubleshooting skills
  • Fluent English (written and spoken)
  • Ability to work in the Pacific (PST) time zone
  • Ability to participate in an on-call rotation including weekend coverage

Benefits

  • Stable, long-term contract with career growth opportunities
  • Private health insurance
  • Remote-friendly culture promoting work-life balance
  • Continuous training, mentorship, and learning programs
  • Free access to AI training resources and tools
  • Flexible Paid Time Off (PTO) policy and paid holidays
  • Challenging software projects for clients in the US and LatAm
  • Collaboration with talented software engineers in Latin America and the US

Source

himalayas

