SITE RELIABILITY ENGINEER (SRE)

xAI
Full-time
Palo Alto, CA
$180,000 - $440,000
Posted 6 months ago

Job Description

The Site Reliability Engineer (SRE) will work on the team responsible for the backend services that power grok.com and the API, focusing on writing highly scalable and reliable services hosted on Kubernetes clusters. The role requires expertise in Kubernetes, continuous deployment systems, monitoring technologies, and infrastructure as code.

Responsibilities

  • Work on backend services for grok.com and the API
  • Write highly scalable and reliable services
  • Support services that process tens of thousands of queries per second
  • Manage services hosted on Kubernetes clusters

Requirements

  • Expert knowledge of Kubernetes
  • Expert knowledge of continuous deployment systems (Buildkite, ArgoCD)
  • Expert knowledge of monitoring technologies (Prometheus, Grafana, PagerDuty)
  • Expert knowledge of infrastructure as code technologies (Pulumi, Terraform)

Benefits

  • No benefits